get_obs module

get_obs(config, obtype, yyyymmdd_task)

This script checks for the existence of obs files of the specified type at the locations specified by variables in the SRW App’s configuration file. If one or more of these files do not exist, it retrieves them from a data store using the retrieve_data.py script (as specified by that script’s configuration file, parm/data_locations.yml) and places them in the locations specified by the App’s configuration variables, renaming them if necessary.

Parameters:

config (dict) – The final configuration dictionary (obtained from var_defns.yaml).
obtype (str) – The observation type.
yyyymmdd_task (datetime.datetime) – The date for which obs may be needed.

Returns:

True (bool) – If all goes well.

Detailed Description:

In this script, the main (outer) loop for obtaining obs files runs over a sequence of archive hours. Each archive hour in the sequence represents one archive (tar) file in the data store and is measured relative to hour 0 of the day. The number of archive hours in this sequence depends on how the obs files for the given obs type are bundled into archives. For example, if the obs files for a given day are bundled into four archives, then the archive interval is 6 hours, and to get all the obs files for that day, the loop must iterate over a sequence of 4 hours, either [0, 6, 12, 18] or [6, 12, 18, 24] (which of these it will be depends on how the obs files are arranged in the archives).
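As a minimal sketch, assuming only the archive interval and the first archive hour of the day are known, the hypothetical helper below builds the full-day archive hour sequence described above. It is illustrative only and is not a function in get_obs.

```python
def full_archive_hour_sequence(arcv_intvl_hrs: int, first_arcv_hr: int) -> list[int]:
    """Return the archive hours needed to cover one full day of obs.

    arcv_intvl_hrs: hours between archives; must be 1-24 and divide 24 evenly.
    first_arcv_hr: 0 if the day's first needed archive is the one for hour 0,
        or arcv_intvl_hrs if it is the one at the end of the first interval
        (in which case the last archive hour is 24, i.e. hour 0 of the next day).
    """
    if not 1 <= arcv_intvl_hrs <= 24 or 24 % arcv_intvl_hrs != 0:
        raise ValueError("arcv_intvl_hrs must be between 1 and 24 and divide 24 evenly")
    if first_arcv_hr not in (0, arcv_intvl_hrs):
        raise ValueError("first_arcv_hr must be 0 or arcv_intvl_hrs")
    num_arcvs = 24 // arcv_intvl_hrs  # e.g. 4 archives for a 6-hour interval
    return [first_arcv_hr + i * arcv_intvl_hrs for i in range(num_arcvs)]

print(full_archive_hour_sequence(6, 0))  # [0, 6, 12, 18]
print(full_archive_hour_sequence(6, 6))  # [6, 12, 18, 24]
```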

Below, we describe the archive layout for each obs type and give the archive hours to loop over when all available obs for the current day are needed.

CCPA (Climatology-Calibrated Precipitation Analysis) precipitation accumulation obs

For CCPA, the archive interval is 6 hours, i.e. the obs files are bundled into 6-hourly archives. The archives are organized such that each one contains 6 files, so that the obs availability interval is

\[ \text{obs\_avail\_intvl\_hrs} = \frac{24\ \text{hrs}}{(4\ \text{archives}) \times (6\ \text{files/archive})} = 1\ \text{hr/file} \]

i.e. there is one obs file for each hour of the day, containing the accumulation over that hour. The archive corresponding to hour 0 of the current day contains 6 files representing accumulations during the last 6 hours of the previous day. The archive corresponding to hour 6 of the current day contains 6 files for the accumulations during the first 6 hours of the current day, and the archives corresponding to hours 12 and 18 of the current day each contain 6 files for accumulations during hours 6-12 and 12-18, respectively, of the current day. Thus, to obtain all the one-hour accumulations for the current day, we must extract all the obs files from the three archives corresponding to hours 6, 12, and 18 of the current day and from the archive corresponding to hour 0 of the next day (which covers hours 18-24 of the current day). This corresponds to an archive hour sequence of [6, 12, 18, 24]. In the simplest case, in which the observation retrieval times include all hours of the current task’s day at which obs files are available and none of the obs files for this day already exist on disk, the loop runs over this full sequence. In other cases, it runs over a subset of [6, 12, 18, 24].
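The availability-interval arithmetic above (and the analogous formulas for NOHRSC and NDAS) can be expressed as a one-line helper. This is a hypothetical illustration, not part of get_obs; Fraction is used only so the result stays exact if a layout ever yields a non-integer interval.

```python
from fractions import Fraction

def compute_obs_avail_intvl_hrs(num_arcvs_per_day: int, files_per_arcv: int) -> Fraction:
    """Hours between successive obs files: 24 hrs / (archives per day * files per archive)."""
    if num_arcvs_per_day < 1 or files_per_arcv < 1:
        raise ValueError("counts must be positive")
    return Fraction(24, num_arcvs_per_day * files_per_arcv)

print(compute_obs_avail_intvl_hrs(4, 6))  # CCPA: 4 archives/day x 6 files/archive -> 1
print(compute_obs_avail_intvl_hrs(1, 4))  # NOHRSC: 1 archive/day x 4 files/archive -> 6
```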

Note that the CCPA 1-hour accumulation files under the “00” directory (i.e. those for hours-of-day 19 through 00 of the next day) have incorrect metadata for dates from 20180718 to 20210504. This script corrects these errors when retrieving CCPA obs for those times.

NOHRSC (National Operational Hydrologic Remote Sensing Center) snow accumulation observations

For NOHRSC, the archive interval is 24 hours, i.e. the obs files are bundled into 24-hourly archives. The archives are organized such that each one contains 4 files, so that the obs availability interval is

\[ \text{obs\_avail\_intvl\_hrs} = \frac{24\ \text{hrs}}{(1\ \text{archive}) \times (4\ \text{files/archive})} = 6\ \text{hr/file} \]

i.e. there is one obs file for each 6-hour interval of the day, containing the accumulation over those 6 hours. The 4 obs files within each archive correspond to hours 0, 6, 12, and 18 of the current day. The obs file for hour 0 contains accumulations during the last 6 hours of the previous day, while those for hours 6, 12, and 18 contain accumulations for the first, second, and third 6-hour chunks of the current day. Thus, to obtain all the 6-hour accumulations for the current day, we must extract the obs files for hours 6, 12, and 18 from the current day’s archive and the obs file for hour 0 from the next day’s archive. This corresponds to an archive hour sequence of [0, 24]. In the simplest case, in which the observation retrieval times include all hours of the current task’s day at which obs files are available and none of the obs files for this day already exist on disk, the loop runs over this full sequence. In other cases, it runs over a subset of [0, 24].
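The NOHRSC layout above can be sketched as a mapping from a day to the (archive date, file valid time) pairs that supply its four 6-hour accumulations. The function name is hypothetical, and the date strings are only labels: the actual archive file names are set in parm/data_locations.yml and are not reproduced here.

```python
from datetime import datetime, timedelta

def nohrsc_files_for_day(day: datetime) -> list[tuple[str, datetime]]:
    """Return (archive date, file valid time) pairs for the four 6-hour
    accumulations ending at hours 6, 12, 18, and 24 of the given day."""
    day0 = datetime(day.year, day.month, day.day)  # hour 0 of the given day
    next_day0 = day0 + timedelta(days=1)
    # Files for hours 6, 12, and 18 come from the current day's archive (archive hour 0).
    pairs = [(day0.strftime("%Y%m%d"), day0 + timedelta(hours=hr)) for hr in (6, 12, 18)]
    # The accumulation over hours 18-24 is the hour-0 file in the next day's
    # archive (archive hour 24).
    pairs.append((next_day0.strftime("%Y%m%d"), next_day0))
    return pairs

for arcv_date, valid in nohrsc_files_for_day(datetime(2024, 1, 31)):
    print(arcv_date, valid.strftime("%Y%m%d%H"))
# 20240131 2024013106
# 20240131 2024013112
# 20240131 2024013118
# 20240201 2024020100
```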

MRMS (Multi-Radar Multi-Sensor) radar observations

For MRMS, the archive interval is 24 hours, i.e. the obs files are bundled into 24-hourly archives. The archives are organized such that each contains the gzipped GRIB2 files for that day, which are usually only a few minutes apart. However, since the forecasts cannot (yet) produce sub-hourly output, we filter these data in time, using only the obs files closest to each hour of the day for which obs are needed. This effectively sets the obs availability interval for MRMS to one hour, i.e.

\[ \text{obs\_avail\_intvl\_hrs} = 1\ \text{hr/file} \]

i.e. there is one obs file for each hour of the day containing values at that hour (but only after filtering in time; also see the notes for MRMS_OBS_AVAIL_INTVL_HRS in config_defaults.yaml). Thus, to obtain the obs at all hours of the day, we only need to extract files from one archive. In the simplest case, in which the observation retrieval times include all hours of the current task’s day at which obs files are available and none of the obs files for this day already exist on disk, the sequence of archive hours over which we loop is just [0]. Note that:

For cases in which MRMS data are not needed for all hours of the day, we still need to retrieve and extract from this single daily archive. Thus, the archive hour sequence over which we loop is always just [0] for MRMS obs.

Because MRMS obs are split into two sets of archives, one for composite reflectivity (REFC) and one for echo top (RETOP), on any given day (and with an archive hour of 0) we actually retrieve and extract two different archive files, one per field.
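The time filtering described above (keeping, for each needed hour, the file closest to it) can be sketched as follows. This assumes the valid times have already been parsed from the MRMS file names into datetime objects; the function name is hypothetical, the sketch applies no maximum time-distance cutoff, and in practice it would be run separately for each field (REFC and RETOP).

```python
from datetime import datetime, timedelta

def closest_file_per_hour(file_times: list[datetime],
                          target_times: list[datetime]) -> dict[datetime, datetime]:
    """Map each target (on-the-hour) time to the closest available file time.

    Ties are resolved in favor of the file time that appears first in file_times.
    """
    if not file_times:
        raise ValueError("no MRMS file times to choose from")
    return {t: min(file_times, key=lambda ft: abs(ft - t)) for t in target_times}

base = datetime(2024, 1, 1)
file_times = [base + timedelta(minutes=m) for m in (57, 62, 116, 123)]
targets = [base + timedelta(hours=1), base + timedelta(hours=2)]
for target, chosen in closest_file_per_hour(file_times, targets).items():
    print(target.strftime("%H:%M"), "->", chosen.strftime("%H:%M"))
# 01:00 -> 01:02
# 02:00 -> 02:03
```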

NDAS (NAM Data Assimilation System) conventional observations

For NDAS, the archive interval is 6 hours, i.e. the obs files are bundled into 6-hourly archives. The archives are organized such that each one contains 7 files (not 6, as one might expect). The archive associated with time yyyymmddhh_arcv contains the hourly files at

yyyymmddhh_arcv - 6 hours
yyyymmddhh_arcv - 5 hours
…
yyyymmddhh_arcv - 2 hours
yyyymmddhh_arcv - 1 hour
yyyymmddhh_arcv - 0 hours

These are known as the tm06, tm05, …, tm02, tm01, and tm00 files, respectively. Thus, the tm06 file from the current archive, say the one associated with time yyyymmddhh_arcv, has the same valid time as the tm00 file from the previous archive, i.e. the one associated with time (yyyymmddhh_arcv - 6 hours). It turns out that the tm06 file from the current archive contains more or better observations than the tm00 file from the previous archive. Thus, for a given archive time yyyymmddhh_arcv, we use the 6 files tm06, …, tm01 but not tm00, effectively resulting in 6 files per archive for NDAS obs. The obs availability interval is then

\[ \text{obs\_avail\_intvl\_hrs} = \frac{24\ \text{hrs}}{(4\ \text{archives}) \times (6\ \text{files/archive})} = 1\ \text{hr/file} \]

i.e. there is one obs file for each hour of the day containing values at that hour. The archive corresponding to hour 0 of the current day contains 6 files valid at hours 18 through 23 of the previous day. The archive corresponding to hour 6 of the current day contains 6 files valid at hours 0 through 5 of the current day, and the archives corresponding to hours 12 and 18 of the current day each contain 6 files valid at hours 6 through 11 and 12 through 17, respectively, of the current day. Thus, to obtain all the hourly values for the current day (from hour 0 to hour 23), we must extract the 6 obs files (excluding the tm00 ones) from each of the three archives corresponding to hours 6, 12, and 18 of the current day and from the archive corresponding to hour 0 of the next day. This corresponds to an archive hour sequence of [6, 12, 18, 24]. In the simplest case, in which the observation retrieval times include all hours of the current task’s day at which obs files are available and none of the obs files for this day already exist on disk, the loop runs over this full sequence. In other cases, it runs over a subset of [6, 12, 18, 24].
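The tm06 through tm01 selection can be sketched as a function that returns the labels and valid times of the files used from one NDAS archive. The function name is hypothetical. The usage lines check the claim above that the archives for hours 6, 12, and 18 of the current day plus hour 0 of the next day together cover hours 0 through 23.

```python
from datetime import datetime, timedelta

def ndas_used_files(arcv_time: datetime) -> list[tuple[str, datetime]]:
    """Return (label, valid time) pairs for the NDAS files used from one archive.

    Only tm06 through tm01 are used; tm00 is skipped because the tm06 file of
    the following archive has the same valid time and is preferred.
    """
    return [
        (f"tm{lag:02d}", arcv_time - timedelta(hours=lag))
        for lag in range(6, 0, -1)  # lags 6, 5, 4, 3, 2, 1
    ]

# Archives at hours 6, 12, 18 of the day and hour 0 of the next day (i.e. 24).
day = datetime(2024, 1, 1)
valid_hours = sorted(
    valid.hour
    for arcv_hr in (6, 12, 18, 24)
    for _, valid in ndas_used_files(day + timedelta(hours=arcv_hr))
)
print(valid_hours == list(range(24)))  # True
```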

AERONET Aerosol Optical Depth (AOD) observations

For AERONET, the archive interval is 24 hours. There is one archive per day containing a single text file that contains all of the day’s observations.

AIRNOW Air Quality Particulate Matter (PM25, PM10) observations

For AIRNOW, the HPSS archive interval is 24 hours. There is one archive per day containing one text file per hour, each holding all the observations for that hour. When retrieved from AWS, the interval is 1 hour.

get_obs_arcv_hr(obtype, arcv_intvl_hrs, hod)

For the given observation type, obs archive interval, and hour of day, this function returns the hour (counting from hour 0 of the day) corresponding to the archive file that contains the obs file for the given hour of day.

Note that for cumulative fields (like CCPA and NOHRSC, as opposed to instantaneous ones like MRMS and NDAS), the obs file for hour 0 of a given day represents an accumulation over the end of the previous day. Thus, for cumulative fields, this function does not map an hour of day of 0 to the current day’s hour-0 archive. Instead, it treats hour 0 as the 0th hour of the NEXT day (i.e. the 24th hour of the current day) and sets the archive hour to 24.

Parameters:

obtype (str) – The observation type.
arcv_intvl_hrs (int) – Time interval (in hours) between archive files. For example, if the obs files are bundled into 6-hourly archives, then this will be set to 6. This must be between 1 and 24 and must divide evenly into 24.
hod (int) – The hour of the day. This must be between 0 and 23. For cumulative fields (CCPA and NOHRSC), hour 0 is treated as that of the next day, i.e. as the 24th hour of the current day.

Returns:

arcv_hr (int) – The hour since the start of day corresponding to the archive file containing the obs file for the given hour of day.
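The following is a minimal sketch of the mapping this function performs, written only from the archive layouts described for get_obs above. It is not the module’s actual code: the name get_obs_arcv_hr_sketch is hypothetical, and AERONET and AIRNOW are not handled. Collecting the archive hours over a set of needed hours of day yields the sequence the outer loop in get_obs iterates over.

```python
import math

def get_obs_arcv_hr_sketch(obtype: str, arcv_intvl_hrs: int, hod: int) -> int:
    """Map an hour of day to the archive hour whose archive holds that hour's obs file."""
    if not 0 <= hod <= 23:
        raise ValueError(f"hod must be between 0 and 23, got {hod}")
    if not 1 <= arcv_intvl_hrs <= 24 or 24 % arcv_intvl_hrs != 0:
        raise ValueError("arcv_intvl_hrs must be between 1 and 24 and divide 24 evenly")

    obtype = obtype.upper()
    if obtype == "CCPA":
        # Cumulative: the archive at hour A holds accumulations ending at A-5, ..., A.
        # Hour 0 is treated as hour 24 (end of the current day).
        hr = 24 if hod == 0 else hod
        return math.ceil(hr / arcv_intvl_hrs) * arcv_intvl_hrs
    if obtype == "NOHRSC":
        # Cumulative, one daily archive; hour 0 comes from the next day's archive.
        return 24 if hod == 0 else (hod // arcv_intvl_hrs) * arcv_intvl_hrs
    if obtype == "MRMS":
        # Instantaneous, one daily archive labeled with hour 0.
        return (hod // arcv_intvl_hrs) * arcv_intvl_hrs
    if obtype == "NDAS":
        # Instantaneous: the archive at hour A holds valid hours A-6, ..., A-1 (tm06-tm01).
        return (hod // arcv_intvl_hrs + 1) * arcv_intvl_hrs
    raise ValueError(f"unsupported obtype: {obtype}")

print(sorted({get_obs_arcv_hr_sketch("CCPA", 6, h) for h in range(24)}))  # [6, 12, 18, 24]
print(sorted({get_obs_arcv_hr_sketch("NOHRSC", 24, h) for h in (0, 6, 12, 18)}))  # [0, 24]
```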

parse_args(argv): Parse command line arguments.