NDCMS CERN

This page documents one particular usage of the ND T3. It consists of two stages: using crab to stage out files to the T3, and then running on those files at the T3. All user interaction can be performed from an lxplus server at CERN.


Stage 1: Skimming and stage-out to the T3

It is often useful to analyze a large sample of events hosted at a T2 and to write out to the ND T3 the subset of these events that pass some selection. This is called skimming. In order to later run over the selected events with an analysis job, you must first publish your output as a new, personalized dataset. To run a crab task that skims a dataset, copies the output to the T3, and publishes the new data, you must do the following:

Have a working crab.cfg template.

Your crab.cfg must have the usual elements, such as the datasetpath:

[CMSSW]
datasetpath=/YourFavoritePrimaryDataset/BlahBlah/

The best documentation for crab is the official CRAB documentation. From a CERN machine with AFS, you can view an annotated crab.cfg file at:

/afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_2_7_7_patch1/python/crab.cfg
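
For orientation, a minimal template might look like the sketch below. The scheduler and the job-splitting values are illustrative assumptions to adapt to your task; the [USER] stage-out block is described in a later step.

[CRAB]
jobtype = cmssw
scheduler = glite

[CMSSW]
datasetpath = /YourFavoritePrimaryDataset/BlahBlah/
pset = mySkim_cfg.py
total_number_of_events = -1
events_per_job = 10000
output_file = skim.root

Note that output_file must match the file name produced by your CMSSW configuration.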

Have a working CMSSW skim configuration file.

This file must be listed in the [CMSSW] block of your crab.cfg:

pset=mySkim_cfg.py

Always test your code locally before submitting a grid job with crab. Crab identifies all the code in your local release that is not part of the central CMSSW installation and ships it to the grid as part of the analysis code to be run, so whatever runs remotely is exactly what you have locally. A good option is to test on Release Validation (relval) samples on castor (/castor/cern.ch/cms/store/relval).
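
As an illustration, mySkim_cfg.py might look like the following sketch. The selection (a CandViewCountFilter requiring at least one reconstructed muon) and the output file name are assumptions; substitute your own filter and selection logic.

import FWCore.ParameterSet.Config as cms

process = cms.Process("SKIM")

# crab replaces the input file list with files from the dataset;
# for a local test, put a relval file here
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring()
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(-1))

# Example selection: keep events with at least one reconstructed muon
process.muonCountFilter = cms.EDFilter("CandViewCountFilter",
    src = cms.InputTag("muons"),
    minNumber = cms.uint32(1)
)
process.skimPath = cms.Path(process.muonCountFilter)

# Write out only the events that pass the skim path
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("skim.root"),
    SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring("skimPath"))
)
process.outpath = cms.EndPath(process.out)

Test it locally with cmsRun mySkim_cfg.py before creating the crab task.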


Set up your crab.cfg for stage-out to the T3 and publication.

[USER]
copy_data=1
storage_element=T3_US_NotreDame
publish_data_name=yourProcessingName
dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_01_writer/servlet/DBSServlet
publish_data=1
return_data=0

The publication DBS instance can be the one given in the dbs_url_for_publication line above.

Next, run your crab jobs. When the jobs have successfully run, use 'crab -publish' to publish the dataset to the DBS instance. The dataset name will have the form:

/YourFavoritePrimaryDataset/yourProcessingName-<hashedName>/USER

Publishing note: job output must be retrieved ('crab -get') before the corresponding files can be published as part of a dataset. You do not have to wait for all jobs to finish before publishing; only the successfully completed jobs will be published. Running 'crab -publish' again later will publish only the output of jobs that have finished since the last publication.
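
Putting the steps together, a typical command sequence (run from the directory containing your crab.cfg) is:

crab -create       # build the task from crab.cfg
crab -submit       # submit the jobs to the grid
crab -status       # monitor the jobs
crab -getoutput    # retrieve the output of finished jobs
crab -publish      # publish the retrieved output to DBS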


Stage 2: Running jobs at the T3

Now that you have a skimmed dataset at the ND T3, you can run analysis jobs on those data, using the CPUs at the T3. You will need to do the following:

Have a working analysis configuration

Always test this code locally first!
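
For a quick local test, you can point your configuration at a skim file you have retrieved and limit the number of events. In the sketch below the analyzer name and the input file path are placeholders:

import FWCore.ParameterSet.Config as cms

process = cms.Process("ANA")

# Run on a locally available skim file for testing (placeholder path)
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:skim.root")
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))

# Hypothetical analyzer module; replace with your own EDAnalyzer
process.demo = cms.EDAnalyzer("DemoAnalyzer")
process.p = cms.Path(process.demo)

Run it with cmsRun yourCMSSWConfig_cfg.py; once it works, submit with crab.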

Have an appropriate crab configuration file

Apart from the usual elements, you will need to set the following:

[CMSSW]
datasetpath=/YourFavoritePrimaryDataset/yourProcessingName-<hashedName>/USER
dbs_url=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_01_writer/servlet/DBSServlet
pset=yourCMSSWConfig_cfg.py

If you wish to write your output back to castor, do the following:

[USER]
return_data = 0
copy_data = 1
storage_element = srm-cms.cern.ch
storage_path = /srm/managerv2?SFN=/castor/cern.ch
user_remote_dir = /user/<yourInitial>/<yourUserName>/<yourPreferredDirectory>

Before running, you must create the directory (the full CASTOR path is the storage_path SFN prefix followed by user_remote_dir)

rfmkdir /castor/cern.ch/user/<yourInitial>/<yourUserName>/<yourPreferredDirectory>

and make it at least group-writable

rfchmod 775 /castor/cern.ch/user/<yourInitial>/<yourUserName>/<yourPreferredDirectory>
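
You can check that the directory exists and has the right permissions with rfdir, for example:

rfdir /castor/cern.ch/user/<yourInitial>/<yourUserName>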