WE2E package
Submodules
WE2E.WE2E_summary module
- setup_logging(debug: bool = False) None
Sets up logging to print high-priority (INFO and higher) messages to the console and to print all messages with detailed timing and routine info to the specified text file.
- Parameters:
debug (bool) – Set to True to print more verbose output to the console
- Returns:
None
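The split described above (INFO and higher on the console, everything in a text file) can be sketched with the standard logging module. This is a minimal sketch, not the WE2E_summary implementation: the function name setup_logging_sketch, the logfile parameter and its default, the logger name, and the format strings are all assumptions made for illustration, as is the choice to lower the console threshold to DEBUG when debug is True.

```python
import logging


def setup_logging_sketch(logfile: str = "log.sketch", debug: bool = False) -> logging.Logger:
    """Hypothetical helper: console gets INFO+ (DEBUG+ if debug), the file gets everything."""
    logger = logging.getLogger("we2e_sketch")  # assumed logger name
    logger.setLevel(logging.DEBUG)

    # Remove handlers left over from an earlier call so messages are not duplicated
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG if debug else logging.INFO)
    console.setFormatter(logging.Formatter("%(message)s"))

    to_file = logging.FileHandler(logfile, mode="w")
    to_file.setLevel(logging.DEBUG)
    # Assumed format: timestamp, routine name, and level in front of each message
    to_file.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(funcName)s %(levelname)s: %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(to_file)
    return logger
```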
WE2E.monitor_jobs module
- monitor_jobs(expts_dict: dict, monitor_file: str = '', procs: int = 1, mode: str = 'continuous', delay: int = 5, debug: bool = False) str
Monitors and runs jobs for the specified experiment using Rocoto
- Parameters:
expts_dict (dict) – A dictionary containing the information needed to run one or more experiments. See example file monitor_jobs.yaml.
monitor_file (str) – [optional] Name of the file used to monitor experiment results. Default is monitor_jobs.yaml.
procs (int) – [optional] The number of parallel processes to run
mode (str) – [optional] Mode of job monitoring. Options: (1) 'continuous' (default): monitor jobs continuously until complete, or (2) 'advance': increment jobs once, then quit.
delay (int) – [optional] Delay in seconds between calls to rocotorun. Very large experiments may result in system slowdowns if this value is too low.
debug (bool) – [optional] Enable extra output for debugging
- Returns:
monitor_file – The name of the file used for job monitoring (when script is finished, this contains results/summary)
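The difference between the two modes can be sketched as a loop that either repeats until every experiment has finished or stops after a single pass. This is a sketch under assumptions, not the monitor_jobs() code: monitor_sketch and the update callable are hypothetical stand-ins, and the set of finished states is taken from the status descriptions that say an experiment "will no longer be monitored" (DEAD, ERROR, COMPLETE).

```python
import time
from typing import Callable


def monitor_sketch(expts: dict, update: Callable[[dict], dict],
                   mode: str = "continuous", delay: float = 5) -> dict:
    """Hypothetical loop: update experiments until none remain active, or only once."""
    finished = ("DEAD", "ERROR", "COMPLETE")  # states that are no longer monitored
    running = dict(expts)
    while running:
        for name in list(running):
            expts[name] = update(running[name])
            if expts[name]["status"] in finished:
                running.pop(name)
            else:
                running[name] = expts[name]
        if mode == "advance":
            break  # increment jobs once, then quit
        if running:
            time.sleep(delay)  # pause between passes to avoid hammering the system
    return expts
```

In 'advance' mode the caller is expected to invoke the function again later, which suits cron-style use; 'continuous' mode blocks until every experiment reaches a finished state.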
WE2E.print_test_info module
Script for parsing all test files in the test_configs directory and printing a pipe-delimited summary file of the details of each test.
WE2E.run_we2e_tests module
Run and monitor WE2E tests.
- check_task_get_extrn_bcs(cfg: dict, mach: dict, dflt: dict, ics_or_lbcs: str = '') dict
Checks and updates various settings in the task_get_extrn_ics or task_get_extrn_lbcs section of the test’s configuration YAML file
- Parameters:
cfg (dict) – Contents loaded from test configuration file
mach (dict) – Contents loaded from machine settings file
dflt (dict) – Contents loaded from default configuration file (config_defaults.yaml)
ics_or_lbcs (str) – Perform checks for either the ICs task or the LBCs task. Valid values: "ics" | "lbcs"
- Returns:
cfg_bcs – Updated dictionary for the task_get_extrn_[ics|lbcs] section of the test configuration file
- check_test(test: str) str
Checks that a string corresponds to a valid test name
- Parameters:
test (str) – Potential test name
- Returns:
config – Name of the test configuration file (empty string if no test file is found)
- check_tests(tests: list) list
Checks that all tests in a provided list of tests are valid
- Parameters:
tests (list) – List of potentially valid test names
- Returns:
tests_to_run – List of configuration files corresponding to test names
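The name-to-file lookup that check_test() and check_tests() describe can be sketched as a recursive file search. This is a sketch under stated assumptions: the helper names, the configs_dir argument, and the config.<test>.yaml naming pattern are illustrative and are not confirmed by the text above, and raising ValueError for an unknown name is a choice of this sketch rather than documented behavior.

```python
import glob
import os


def find_test_config(test: str, configs_dir: str) -> str:
    """Return the path of the config file for `test`, or '' if none is found."""
    # Assumed naming convention: config.<test>.yaml anywhere below configs_dir
    pattern = os.path.join(configs_dir, "**", f"config.{glob.escape(test)}.yaml")
    matches = sorted(glob.glob(pattern, recursive=True))
    return matches[0] if matches else ""


def find_test_configs(tests: list, configs_dir: str) -> list:
    """Resolve every name, failing loudly on the first one with no config file."""
    found = []
    for test in tests:
        config = find_test_config(test, configs_dir)
        if not config:
            raise ValueError(f"Could not find a configuration file for test {test}")
        found.append(config)
    return found
```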
- run_we2e_tests(homedir, args) None
Runs the Workflow End-to-End (WE2E) tests selected by the user
- Parameters:
homedir (str) – The full path to the top-level application directory
args (argparse.Namespace) – Command-line arguments
- Returns:
None
WE2E.utils module
A collection of utilities used by the various WE2E scripts
- calculate_core_hours(expts_dict: dict) dict
Takes in an experiment dictionary, reads the var_defns.sh file for necessary information, and calculates the core hours used by each task, updating expts_dict with this information
- Parameters:
expts_dict (dict) – The information needed to run one or more experiments. See example file WE2E_tests.yaml
- Returns:
expts_dict – Experiment dictionary updated with core hours
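Core hours are the number of cores a task occupied multiplied by its wall-clock time in hours. The arithmetic can be sketched as below; the dictionary layout (a "tasks" mapping with "cores" and "walltime_s" keys) and the function name are hypothetical and do not reflect the real structure of expts_dict or the contents of var_defns.sh.

```python
def add_core_hours(expt: dict) -> dict:
    """Add a 'core_hours' entry to each task and an experiment-wide total."""
    total = 0.0
    for task in expt["tasks"].values():  # hypothetical layout
        # cores * seconds / 3600 = core hours
        task["core_hours"] = task["cores"] * task["walltime_s"] / 3600.0
        total += task["core_hours"]
    expt["core_hours"] = total
    return expt


example = {
    "tasks": {
        "make_grid": {"cores": 24, "walltime_s": 1800},
        "run_fcst": {"cores": 48, "walltime_s": 7200},
    }
}
add_core_hours(example)
# make_grid: 24 cores for 0.5 h = 12 core hours; run_fcst: 48 cores for 2 h = 96
```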
- compare_rocotostat(expt_dict, name)
Reads the dictionary showing the location of a given experiment, runs a rocotostat command to get the full set of tasks for the experiment, and compares the two to see if there are any unsubmitted tasks remaining.
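The comparison step amounts to a set difference between the full task list and the tasks already tracked for the experiment. A minimal sketch, assuming both sides have already been reduced to lists of task names (parsing rocotostat output is deliberately left out, and the function name is hypothetical):

```python
def unsubmitted_tasks(all_tasks: list, tracked_tasks: list) -> list:
    """Return tasks that exist in the workflow but have no tracked status yet."""
    tracked = set(tracked_tasks)
    # Preserve the workflow order of the full task list
    return [task for task in all_tasks if task not in tracked]
```

An empty result means every task in the workflow has been submitted at least once.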
- create_expts_dict(expt_dir: str, delay: int)
Takes in a directory, searches that directory for subdirectories containing experiments, and creates a skeleton dictionary that can be filled out by update_expt_status()
- print_WE2E_summary(expts_dict: dict, debug: bool = False)
Creates a summary of the specified experiment
- print_test_info(txtfile: str = 'WE2E_test_info.txt') None
Prints a pipe-delimited (|) text file containing summaries of each test with a configuration file in test_configs/*
- Parameters:
txtfile (str) – File name for test details file (default: WE2E_test_info.txt)
- Returns:
None
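A pipe-delimited file of this kind can be written with the standard csv module by changing the delimiter. This is a sketch only: the function name and the three column names are made up for illustration and are not the columns that print_test_info() actually writes.

```python
import csv


def write_test_info(rows: list, txtfile: str = "WE2E_test_info.txt") -> None:
    """Write one header line and one line per test, with fields separated by '|'."""
    fields = ["test_name", "description", "num_fcsts"]  # hypothetical columns
    with open(txtfile, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="|", lineterminator="\n")
        writer.writerow(fields)
        for row in rows:
            # Missing keys become empty fields so every line has the same width
            writer.writerow([row.get(key, "") for key in fields])
```

Using csv rather than "|".join() means a field that itself contains a pipe character is quoted instead of silently shifting the columns.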
- update_expt_status(expt: dict, name: str, refresh: bool = False, delay: int = 5, debug: bool = False, submit: bool = True) dict
This function reads the dictionary for a given experiment, runs the rocotorun command to update the experiment (by running new jobs and updating the status of previously submitted ones), and reads the Rocoto database (.db) file to update the status of each job in the experiment dictionary. The function then uses a simple set of rules to combine the statuses of every task into a useful summary status for the whole experiment and returns the updated experiment dictionary.
Experiment status levels explained:
CREATED: The experiments have been created, but the monitor script has not yet processed them. This is immediately overwritten at the beginning of the monitor_jobs() function.
SUBMITTING: All jobs are in status SUBMITTING or SUCCEEDED. This is a normal state; experiment monitoring will continue.
DYING: One or more tasks have died (status DEAD), so this experiment has an error. Experiment monitoring will continue until all previously submitted tasks are in either status DEAD or status SUCCEEDED (see next entry).
DEAD: One or more tasks are in status DEAD, and other previously submitted jobs are either DEAD or SUCCEEDED. This experiment will no longer be monitored.
ERROR: Could not read the Rocoto database (.db) file. This will require manual intervention to solve, so the experiment will no longer be monitored.
STALLED: All submitted jobs are SUCCEEDED but one or more jobs have not been submitted; if this state persists, it will become “STUCK”.
STUCK: All submitted jobs are SUCCEEDED but one or more jobs have not been submitted for multiple iterations; this can indicate system-level throttling or a problem with Rocoto dependencies.
RUNNING: One or more jobs are in status RUNNING, and other previously submitted jobs are in status QUEUED, SUBMITTED, or SUCCEEDED. This is a normal state; experiment monitoring will continue.
QUEUED: One or more jobs are in status QUEUED, and some others may be in status SUBMITTED or SUCCEEDED. This is a normal state; experiment monitoring will continue.
SUCCEEDED: All jobs are in status SUCCEEDED; experiment monitoring will continue for one more cycle in case there are unsubmitted jobs remaining.
COMPLETE: All jobs are in status SUCCEEDED, and the experiment has been monitored for an additional cycle to ensure that there are no unsubmitted jobs. This experiment will no longer be monitored.
- Parameters:
expt (dict) – A dictionary containing the information for an individual experiment, as described in the main monitor_jobs() function.
name (str) – Name of the experiment; used for logging only
refresh (bool) – If True, this flag will check an experiment status even if it is listed as DEAD, ERROR, or COMPLETE. Used for initial checks for experiments that may have been restarted.
delay (int) – [optional] Delay in seconds between calls to rocotorun.
debug (bool) – Will capture all output from rocotorun. This will allow information such as job cards and job submit messages to appear in the log files, but turning on this option can drastically slow down the testing process.
submit (bool) – In addition to reading the Rocoto database (.db) file, the script will advance the workflow by calling rocotorun. If simply generating a report, set this to False.
- Returns:
expt – The updated experiment dictionary
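The status rules listed for update_expt_status() can be sketched as a small function that folds per-task states into one experiment status. This is an illustrative reading of the descriptions, not the actual implementation: the function name, the unsubmitted flag, the precedence order of the checks, and the decision to promote STALLED to STUCK after a single repeat are all assumptions. ERROR is omitted because it arises from a failed database read rather than from task states.

```python
def summarize_status(task_states: list, previous: str = "CREATED",
                     unsubmitted: bool = False) -> str:
    """Combine task states into one experiment status (assumed precedence order)."""
    states = set(task_states)
    if "DEAD" in states:
        # DEAD once nothing else is still in flight, DYING until then
        return "DEAD" if states <= {"DEAD", "SUCCEEDED"} else "DYING"
    if "RUNNING" in states:
        return "RUNNING"
    if "QUEUED" in states:
        return "QUEUED"
    if "SUBMITTING" in states and states <= {"SUBMITTING", "SUCCEEDED"}:
        return "SUBMITTING"
    if states == {"SUCCEEDED"}:
        if unsubmitted:
            # Assumption: a second consecutive stalled check counts as STUCK
            return "STUCK" if previous in ("STALLED", "STUCK") else "STALLED"
        # One extra cycle in SUCCEEDED before the experiment is declared COMPLETE
        return "COMPLETE" if previous in ("SUCCEEDED", "COMPLETE") else "SUCCEEDED"
    return previous  # combination not covered by the documented rules
```

Passing the previous status in makes the two-step transitions (SUCCEEDED to COMPLETE, STALLED to STUCK) explicit without the function having to keep state of its own.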
- update_expt_status_parallel(expts_dict: dict, procs: int, refresh: bool = False, delay: int = 5, debug: bool = False) dict
This function updates an entire set of experiments in parallel, drastically speeding up the testing if given enough parallel processes. Given a dictionary of experiments, it will pass each individual experiment dictionary to update_expt_status(), making use of the Python multiprocessing starmap() functionality to update the experiments in parallel.
- Parameters:
expts_dict (dict) – A dictionary containing information for all experiments
procs (int) – The number of parallel processes
refresh (bool) – “Refresh” flag to pass to update_expt_status(). If True, this flag will check an experiment status even if it is listed as DEAD, ERROR, or COMPLETE. Used for initial checks for experiments that may have been restarted.
delay (int) – [optional] Delay in seconds between calls to rocotorun.
debug (bool) – Will capture all output from rocotorun. This will allow information such as job cards and job submit messages to appear in the log files, but can drastically slow down the testing process.
- Returns:
expts_dict – The updated dictionary of experiment dictionaries
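The starmap() pattern described above builds one argument tuple per experiment and lets a pool unpack each tuple into a call. The sketch below uses multiprocessing.pool.ThreadPool instead of a process pool so that it runs anywhere without a __main__ guard; multiprocessing.Pool exposes the same starmap() method and would give true process parallelism. The names update_one and update_all are hypothetical, and update_one is only a stand-in for update_expt_status().

```python
from multiprocessing.pool import ThreadPool


def update_one(expt: dict, name: str, refresh: bool = False) -> dict:
    """Stand-in for update_expt_status(); just marks the experiment as checked."""
    updated = dict(expt)
    updated["checked"] = True
    return updated


def update_all(expts_dict: dict, procs: int, refresh: bool = False) -> dict:
    """Update every experiment in parallel and return a new dictionary of results."""
    names = list(expts_dict)
    # One tuple per experiment; starmap unpacks each tuple into update_one(*args)
    args = [(expts_dict[name], name, refresh) for name in names]
    with ThreadPool(processes=procs) as pool:
        results = pool.starmap(update_one, args)
    # starmap returns results in input order, so names and results line up
    return dict(zip(names, results))
```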
- write_monitor_file(monitor_file: str, expts_dict: dict)
Writes status of tests to file
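- Parameters:
monitor_file (str) – Name of the file to write the experiment statuses to
expts_dict (dict) – A dictionary containing information for all experiments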
- Returns:
None
- Raises:
KeyboardInterrupt – If a user attempts to disrupt program execution (e.g., with Ctrl+C) while program is writing information to monitor_file.