Lobster

Source

https://github.com/matz-e/lobster.git

Goals

  • scalability
  • running in heterogeneous environments (Condor, SGE, etc.)
  • reduce overhead on opportunistic jobs
    • caching cvmfs
    • caching sandbox
    • unpacking only once
    • checkpointing
  • scale job size in feedback loop
    • adjust runtime, n(files)
  • retain functionality from CRAB: publishing, dashboard + general monitoring, dataset retrieval
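
The feedback loop for job size could work roughly as follows. This is only a sketch, assuming runtime scales linearly with the number of input files; the function name, the clamping limits, and the target runtime are hypothetical and not taken from the Lobster code.

```python
def next_files_per_job(current, observed_runtime, target_runtime,
                       min_files=1, max_files=100):
    """Suggest n(files) for the next job so its runtime approaches the target.

    Assumes runtime is proportional to the number of input files.
    Runtimes are in seconds; the limits are illustrative defaults.
    """
    if observed_runtime <= 0:
        # No usable measurement yet: keep the current size.
        return current
    scaled = round(current * target_runtime / observed_runtime)
    # Clamp so a single outlier cannot shrink or blow up the job size.
    return max(min_files, min(max_files, scaled))
```

A job that processed 10 files in half the target runtime would be followed by a job with 20 files. Averaging the observed runtime over several finished jobs before calling the function would make the loop less sensitive to slow workers.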

To Do (current iteration)

  • switch job creation to an SQL database of jobits
  • write an interface that takes jobits from the SQL database and feeds them to wq
  • write a basic loop that submits jobits and classifies each one at the end: succeeded, finished but needs to be re-run, did not finish, or lost
  • stage out: srmcp for now
  • publishing, adding output files, cleanup inputs
  • retrieve filenames from DBS [DONE]
  • sandboxing (minus git, CVS directories) [DONE]
  • touch a file [DONE]
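
A possible shape for the jobit database and the hand-off to wq, sketched with sqlite3. The table layout, status codes, and function names are assumptions made for illustration (one jobit per input file, with the outcome classes from the list above); the actual schema may differ.

```python
import sqlite3

# Hypothetical status codes, mirroring the outcome classes in the to-do list.
INITIAL, ASSIGNED, SUCCEEDED, RERUN, FAILED, LOST = range(6)


def create_db(path=":memory:"):
    """Open the database and create the jobit table if it is missing."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS jobits (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        dataset TEXT NOT NULL,
        input_file TEXT NOT NULL,
        status INTEGER NOT NULL DEFAULT 0,
        attempts INTEGER NOT NULL DEFAULT 0)""")
    return db


def add_jobits(db, dataset, files):
    """Register one jobit per input file of a dataset."""
    db.executemany(
        "INSERT INTO jobits (dataset, input_file) VALUES (?, ?)",
        [(dataset, f) for f in files])
    db.commit()


def pop_jobits(db, n):
    """Return up to n jobits that still need to run and mark them assigned."""
    rows = db.execute(
        "SELECT id, input_file FROM jobits "
        "WHERE status IN (?, ?, ?, ?) ORDER BY id LIMIT ?",
        (INITIAL, RERUN, FAILED, LOST, n)).fetchall()
    db.executemany(
        "UPDATE jobits SET status = ?, attempts = attempts + 1 WHERE id = ?",
        [(ASSIGNED, row[0]) for row in rows])
    db.commit()
    return rows


def record_result(db, jobit_id, status):
    """Store the outcome reported for a jobit once its job has returned."""
    db.execute("UPDATE jobits SET status = ? WHERE id = ?", (status, jobit_id))
    db.commit()
```

The submission loop would call pop_jobits to build each job, hand it to wq, and call record_result for every jobit when the job returns. Anything not marked as succeeded is picked up again by the next pop_jobits call.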

To Do (long term)

  • randomize datasets to distribute load
  • add job priorities
  • improve logging and error messages
  • JSON masking [DONE]
  • parrot wrapper (should be easy)
  • licensing
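
For reference, JSON masking can be sketched as below. This assumes the mask is a JSON object that maps run numbers to lists of inclusive [first, last] luminosity-section ranges; both that format and the function names are assumptions for illustration, not a description of the existing implementation.

```python
import json


def load_mask(text):
    """Parse a mask of the assumed form {"run": [[first_lumi, last_lumi], ...]}."""
    return {int(run): [tuple(r) for r in ranges]
            for run, ranges in json.loads(text).items()}


def is_selected(mask, run, lumi):
    """True if (run, lumi) falls inside one of the inclusive ranges for that run."""
    return any(first <= lumi <= last for first, last in mask.get(run, []))
```

Runs that do not appear in the mask are rejected entirely, so an empty mask selects nothing.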

Issues to solve

  • weaver is built around the concept of files being present or database queries
    • write a DBS/DAS provider?
    • might have to adjust/augment some methods in weaver
    • alternatively, write own makeflow generator (sounds potentially easier with more control over workflow)
  • to get a feedback loop/sandbox preservation, makeflow might have to be replaced by a direct workqueue implementation
    • is there a way to preserve the workqueue worker environment within makeflow?