WE2E package

Submodules

WE2E.WE2E_summary module

setup_logging(debug: bool = False) → None

Sets up logging to print high-priority (INFO and higher) messages to the console and to print all messages with detailed timing and routine info to the specified text file.

Parameters: debug (bool) – Set to True to print more verbose output to the console
Returns: None

WE2E.monitor_jobs module

monitor_jobs(expts_dict: dict, monitor_file: str = '', procs: int = 1, mode: str = 'continuous', delay: int = 5, debug: bool = False) → str

Monitors and runs jobs for the specified experiments using Rocoto.

Parameters:

expts_dict (dict) – A dictionary containing the information needed to run one or more experiments. See example file monitor_jobs.yaml.
monitor_file (str) – [optional] Name of the file used to monitor experiment results. Default is monitor_jobs.yaml.
procs (int) – [optional] The number of parallel processes to run
mode (str) – [optional] Job-monitoring mode. Options: 'continuous' (default), which monitors jobs until all are complete, or 'advance', which increments jobs once and then quits.
delay (int) – [optional] Delay in seconds between calls to rocotorun. Very large experiments may result in system slowdowns if this value is too low.
debug (bool) – [optional] Enable extra output for debugging

Returns:

monitor_file – The name of the file used for job monitoring (when script is finished, this contains results/summary)
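The difference between the two modes can be sketched as a small loop. This is only an illustration under stated assumptions, not the module's implementation: monitor_sketch, the update callback, and the "status" key are hypothetical names, and the set of statuses that end monitoring is taken from the status descriptions under update_expt_status().

```python
import time

# Statuses after which an experiment is no longer monitored
# (per the update_expt_status() status descriptions).
FINISHED = {"DEAD", "ERROR", "COMPLETE"}


def monitor_sketch(expts, update, mode="continuous", delay=5):
    """Hypothetical monitoring loop; `update` stands in for update_expt_status()."""
    active = [name for name, expt in expts.items() if expt["status"] not in FINISHED]
    while active:
        for name in active:
            expts[name] = update(expts[name], name)
        # Drop experiments that reached a final status
        active = [name for name in active if expts[name]["status"] not in FINISHED]
        if mode == "advance":
            break  # increment jobs once, then quit
        if active:
            time.sleep(delay)  # wait before the next round of updates
    return expts
```

In 'continuous' mode the loop only ends once every experiment has reached a final status; in 'advance' mode it always ends after one pass.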

setup_logging(logfile: str = 'log.run_WE2E_tests', debug: bool = False) → None

Sets up logging, printing high-priority (INFO and higher) messages to the screen and writing all messages, with detailed timing and routine info, to the specified text file.

Parameters:

logfile (str) – Name of log file for WE2E tests (default: log.run_WE2E_tests)
debug (bool) – Set to True to enable extra output for debugging

Returns:

None
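The two-destination behavior described here can be reproduced with the standard logging module. The sketch below is an approximation, not the module's code: the logger name "we2e_sketch" and the format string are assumptions, and unlike the documented function it returns the logger so that it is easy to inspect.

```python
import logging


def setup_logging_sketch(logfile="log.run_WE2E_tests", debug=False):
    """Hypothetical sketch: INFO+ to the console, everything to a file."""
    logger = logging.getLogger("we2e_sketch")  # assumed logger name
    logger.setLevel(logging.DEBUG)   # let the handlers do the filtering
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls

    # File handler: all messages, with timing and routine info
    file_handler = logging.FileHandler(logfile, mode="w")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s.%(funcName)s %(levelname)s: %(message)s"))

    # Console handler: INFO and higher, or everything when debug=True
    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG if debug else logging.INFO)

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```

With debug left at False, a call such as logger.debug("...") reaches only the log file, while logger.info("...") reaches both the file and the console.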

WE2E.print_test_info module

Script for parsing all test files in the test_configs directory and printing a pipe-delimited summary file of the details of each test.

WE2E.run_we2e_tests module

Run and monitor WE2E tests.

check_task_get_extrn_bcs(cfg: dict, mach: dict, dflt: dict, ics_or_lbcs: str = '') → dict

Checks and updates various settings in the task_get_extrn_ics or task_get_extrn_lbcs section of the test’s configuration YAML file

Parameters:

cfg (dict) – Contents loaded from test configuration file
mach (dict) – Contents loaded from machine settings file
dflt (dict) – Contents loaded from default configuration file (config_defaults.yaml)
ics_or_lbcs (str) – Perform checks for either the ICs task or the LBCs task. Valid values: "ics" | "lbcs"

Returns:

cfg_bcs – Updated dictionary for task_get_extrn_[ics|lbcs] section of test configuration file

check_test(test: str) → str

Checks that a string corresponds to a valid test name

Parameters: test (str) – Potential test name
Returns: config – Name of the test configuration file (empty string if no test file is found)
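One way such a lookup could work is sketched below. The function name find_test_config, the test_configs_dir argument, and the file-naming pattern config.<test>.yaml are all assumptions made for illustration; only the "empty string when nothing is found" behavior comes from the description above.

```python
from pathlib import Path


def find_test_config(test, test_configs_dir):
    """Hypothetical lookup: return the config file for `test`, or "" if none exists."""
    # Assumed naming pattern: config.<test>.yaml somewhere under test_configs_dir
    matches = sorted(Path(test_configs_dir).rglob(f"config.{test}.yaml"))
    return str(matches[0]) if matches else ""
```

Returning an empty string rather than raising lets a caller collect every invalid name before reporting a problem.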

check_tests(tests: list) → list

Checks that all tests in a provided list of tests are valid

Parameters: tests (list) – List of potentially valid test names
Returns: tests_to_run – List of configuration files corresponding to test names

run_we2e_tests(homedir, args) → None

Runs the Workflow End-to-End (WE2E) tests selected by the user

Parameters:

homedir (str) – The full path to the top-level application directory
args (argparse.Namespace) – Command-line arguments

Returns:

None

setup_logging(logfile: str = 'log.run_WE2E_tests', debug: bool = False) → None

Sets up logging, prints high-priority (INFO and higher) messages to screen, and prints all messages with detailed timing and routine info to the specified text file.

Parameters:

logfile (str) – Name of the test logging file (default: log.run_WE2E_tests)
debug (bool) – Set to True for more detailed output/information

Returns:

None

WE2E.utils module

A collection of utilities used by the various WE2E scripts

calculate_core_hours(expts_dict: dict) → dict

Takes in an experiment dictionary, reads the var_defns.sh file for necessary information, and calculates the core hours used by each task, updating expts_dict with this information

Parameters: expts_dict (dict) – The information needed to run one or more experiments. See example file WE2E_tests.yaml
Returns: expts_dict – Experiment dictionary updated with core hours
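As a worked illustration of the arithmetic, core hours are conventionally the number of cores multiplied by the wall-clock time in hours. The sketch below assumes that definition and a hypothetical per-task layout with "cores" and "walltime" (in seconds) keys; the values the real function reads from var_defns.sh are not shown here.

```python
def core_hours(cores, walltime_seconds):
    """Core hours = cores x wall-clock hours."""
    return cores * walltime_seconds / 3600.0


def add_core_hours(tasks):
    """Return a copy of a hypothetical task dictionary with core hours added."""
    return {
        name: dict(task, core_hours=core_hours(task["cores"], task["walltime"]))
        for name, task in tasks.items()
    }
```

For example, a task that used 48 cores for 1800 seconds (half an hour) accounts for 24 core hours.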

compare_rocotostat(expt_dict, name)

Reads the dictionary showing the location of a given experiment, runs a rocotostat command to get the full set of tasks for the experiment, and compares the two to see if there are any unsubmitted tasks remaining.

Parameters:

expt_dict (dict) – A dictionary containing the information for an individual experiment
name (str) – Name of the experiment

Returns:

expt_dict – A dictionary containing the information for an individual experiment

create_expts_dict(expt_dir: str, delay: int)

Takes in a directory, searches that directory for subdirectories containing experiments, and creates a skeleton dictionary that can be filled out by update_expt_status()

Parameters:

expt_dir (str) – Experiment directory name
delay (int) – [optional] Delay in seconds between calls to rocotorun.

Returns:

(summary_file, expts_dict) – A tuple including the name of the summary file (WE2E_tests_YYYYMMDDHHmmSS.yaml) and the experiment dictionary
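The directory scan and the timestamped summary-file name can be sketched as follows. This is a simplified illustration: it treats every subdirectory as an experiment, omits the delay argument, and uses hypothetical "expt_dir" and "status" keys; the real function's criteria for recognizing an experiment directory are not described here.

```python
import os
from datetime import datetime


def create_expts_dict_sketch(expt_dir, now=None):
    """Hypothetical sketch: build a skeleton dictionary from subdirectories."""
    now = now or datetime.now()
    # WE2E_tests_YYYYMMDDHHmmSS.yaml
    summary_file = f"WE2E_tests_{now.strftime('%Y%m%d%H%M%S')}.yaml"

    expts_dict = {}
    for entry in sorted(os.scandir(expt_dir), key=lambda e: e.name):
        if entry.is_dir():  # assumption: every subdirectory is an experiment
            expts_dict[entry.name] = {"expt_dir": entry.path, "status": "CREATED"}
    return summary_file, expts_dict
```

Each entry starts in status CREATED, which a later status update is expected to overwrite.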

print_WE2E_summary(expts_dict: dict, debug: bool = False)

Creates a summary of the specified experiments

Parameters:

expts_dict (dict) – A dictionary containing the information needed to run one or more experiments. See example file WE2E_tests.yaml.
debug (bool) – [optional] Enable extra output for debugging

Returns:

None

print_test_info(txtfile: str = 'WE2E_test_info.txt') → None

Prints a pipe-delimited ( | ) text file containing summaries of each test with a configuration file in test_configs/*

Parameters: txtfile (str) – File name for test details file (default: WE2E_test_info.txt)
Returns: None
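A pipe-delimited file of this kind can be produced with the standard csv module. The sketch below only illustrates the file format: the function name write_pipe_delimited and the two column names are hypothetical and do not reflect the columns the real function writes.

```python
import csv


def write_pipe_delimited(rows, txtfile="WE2E_test_info.txt"):
    """Hypothetical sketch: write one pipe-delimited line per test."""
    with open(txtfile, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow(["test_name", "description"])  # hypothetical columns
        writer.writerows(rows)
```

Using csv.writer rather than joining strings by hand means a field that itself contains a pipe character is quoted automatically.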

update_expt_status(expt: dict, name: str, refresh: bool = False, delay: int = 5, debug: bool = False, submit: bool = True) → dict

This function reads the dictionary for a given experiment, runs the rocotorun command to update the experiment (by running new jobs and updating the status of previously submitted ones), and reads the Rocoto database (.db) file to update the status of each job in the experiment dictionary. The function then uses a simple set of rules to combine the statuses of every task into a useful summary status for the whole experiment and returns the updated experiment dictionary.

Experiment status levels explained:

CREATED: The experiment has been created, but the monitor script has not yet processed it. This status is overwritten immediately at the beginning of the monitor_jobs() function.

SUBMITTING: All jobs are in status SUBMITTING or SUCCEEDED. This is a normal state; experiment monitoring will continue.

DYING: One or more tasks have died (status DEAD), so this experiment has an error. Experiment monitoring will continue until all previously submitted tasks are in either status DEAD or status SUCCEEDED (see next entry).

DEAD: One or more tasks are in status DEAD, and other previously submitted jobs are either DEAD or SUCCEEDED. This experiment will no longer be monitored.

ERROR: Could not read the Rocoto database (.db) file. This will require manual intervention to solve, so the experiment will no longer be monitored.

STALLED: All submitted jobs are SUCCEEDED but one or more jobs have not been submitted; if this state persists, it will become “STUCK”.

STUCK: All submitted jobs are SUCCEEDED but one or more jobs have not been submitted for multiple iterations; this can indicate system-level throttling or a problem with Rocoto dependencies.

RUNNING: One or more jobs are in status RUNNING, and other previously submitted jobs are in status QUEUED, SUBMITTED, or SUCCEEDED. This is a normal state; experiment monitoring will continue.

QUEUED: One or more jobs are in status QUEUED, and some others may be in status SUBMITTED or SUCCEEDED. This is a normal state; experiment monitoring will continue.

SUCCEEDED: All jobs are in status SUCCEEDED; experiment monitoring will continue for one more cycle in case there are unsubmitted jobs remaining.

COMPLETE: All jobs are in status SUCCEEDED, and the experiment has been monitored for an additional cycle to ensure that there are no unsubmitted jobs. This experiment will no longer be monitored.

Parameters:

expt (dict) – A dictionary containing the information for an individual experiment, as described in the main monitor_jobs() function.
name (str) – Name of the experiment; used for logging only
refresh (bool) – If True, checks the experiment's status even if it is listed as DEAD, ERROR, or COMPLETE. Used for initial checks of experiments that may have been restarted.
delay (int) – [optional] Delay in seconds between calls to rocotorun.
debug (bool) – If True, captures all output from rocotorun. This allows information such as job cards and job submit messages to appear in the log files, but it can drastically slow down the testing process.
submit (bool) – If True (default), the script advances the workflow by calling rocotorun in addition to reading the Rocoto database (.db) file. Set this to False when simply generating a report.

Returns:

expt – The updated experiment dictionary
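The way task statuses combine into an experiment status can be sketched from the status descriptions above. This is an interpretation, not the function's code: the order in which the rules are checked is an assumption, and the statuses that depend on state outside the task list (CREATED, ERROR, STALLED, STUCK, COMPLETE) are left out.

```python
def summarize_status(task_statuses):
    """Hypothetical sketch: reduce per-task statuses to one experiment status.

    Returns None when no rule listed above applies (an assumption; the
    documentation does not say what happens in that case).
    """
    statuses = set(task_statuses)
    if not statuses:
        return None
    if "DEAD" in statuses:
        # DEAD once everything submitted has finished, otherwise still DYING
        return "DEAD" if statuses <= {"DEAD", "SUCCEEDED"} else "DYING"
    if "RUNNING" in statuses:
        return "RUNNING"
    if "QUEUED" in statuses:
        return "QUEUED"
    if statuses == {"SUCCEEDED"}:
        return "SUCCEEDED"
    if statuses <= {"SUBMITTING", "SUCCEEDED"}:
        return "SUBMITTING"
    return None
```

For instance, one RUNNING task alongside one DEAD task gives DYING, and the experiment only becomes DEAD once the remaining tasks are DEAD or SUCCEEDED.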

update_expt_status_parallel(expts_dict: dict, procs: int, refresh: bool = False, delay: int = 5, debug: bool = False) → dict

This function updates an entire set of experiments in parallel, drastically speeding up the testing if given enough parallel processes. Given a dictionary of experiments, it will pass each individual experiment dictionary to update_expt_status(), making use of the Python multiprocessing starmap() functionality to update the experiments in parallel.

Parameters:

expts_dict (dict) – A dictionary containing information for all experiments
procs (int) – The number of parallel processes
refresh (bool) – “Refresh” flag to pass to update_expt_status(). If True, checks each experiment's status even if it is listed as DEAD, ERROR, or COMPLETE. Used for initial checks of experiments that may have been restarted.
delay (int) – [optional] Delay in seconds between calls to rocotorun.
debug (bool) – If True, captures all output from rocotorun. This allows information such as job cards and job submit messages to appear in the log files, but it can drastically slow down the testing process.

Returns:

expts_dict – The updated dictionary of experiment dictionaries
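The starmap() pattern mentioned above can be sketched as follows. This is a minimal illustration with stand-in names: update_one is a hypothetical placeholder for update_expt_status(), and the serial fallback for procs=1 is an assumption added to keep the sketch simple to run.

```python
from multiprocessing import Pool

FINISHED = {"DEAD", "ERROR", "COMPLETE"}


def update_one(expt, name, refresh=False):
    """Hypothetical stand-in for update_expt_status()."""
    updated = dict(expt)
    if refresh or updated["status"] not in FINISHED:
        updated["status"] = "RUNNING"  # placeholder for the real status update
    return updated


def update_all(expts_dict, procs, refresh=False):
    """Update every experiment, using a process pool when procs > 1."""
    names = list(expts_dict)
    # One argument tuple per experiment; starmap unpacks each tuple into a call
    args = [(expts_dict[name], name, refresh) for name in names]
    if procs > 1:
        with Pool(processes=procs) as pool:
            results = pool.starmap(update_one, args)
    else:
        results = [update_one(*arg) for arg in args]
    # starmap preserves input order, so results line up with names
    return dict(zip(names, results))
```

Because each worker receives a copy of its experiment dictionary, the updated dictionaries must be returned and reassembled rather than modified in place.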

write_monitor_file(monitor_file: str, expts_dict: dict)

Writes status of tests to file

Parameters:

monitor_file (str) – File name
expts_dict (dict) – Experiments being monitored

Returns:

None

Raises:

KeyboardInterrupt – If a user attempts to interrupt program execution (e.g., with Ctrl+C) while the program is writing information to monitor_file.