CDBS 2.0 NetCDF File Design

for Statistics About Atmospheric and Water Data

INTRODUCTION

The CDBS database stores the observed data in a separate file for each data collection station. It also keeps the statistics about the observed data (maximums of record, minimums of record, and probability of exceedence) in a file separate from the file holding the observed data. This makes it easier to distribute subsets of the database. Within the Natural Resources Conservation Service (NRCS), the Water and Climate Center is the central repository for Atmospheric and Water data, and is responsible for periodically transferring localized subsets of that data to the various NRCS Field Offices. Since a Field Office may require data from as few as 3 sites, and rarely needs data from more than a dozen, keeping each station's netCDF data in its own files makes it easier to support the Field Offices.

Within each netCDF statistics file, there are several variables that describe the station, several more variables that describe the data, and separate variables for each type of statistical information that is stored for that station. There are also attributes in each of the variables that define the data that is stored in that variable.

Since frequency analyses of data can be based on different time spans (period of record, current normals period, etc.), the netCDF statistics file can hold multiple sets of statistics, and separates the sets according to the beginning and ending dates of their time spans.

Statistical data for which quality flags are available will have separate variables for the flags, paralleling the layout of the variable containing the statistical data. These variables will have an additional dimension based on the number of flags associated with each data value.
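
As a rough illustration of the flag layout described above, the sketch below builds a statistics array and a parallel flag array that carries one extra trailing dimension. The dimension sizes, the choice of (data_set, mo) as the data layout, and the helper names are assumptions made for this example only; they are not taken from the CDBS design.

```python
# Hypothetical sizes, chosen only for illustration.
N_SETS = 2     # rows along the unlimited data_set dimension
N_MONTHS = 12  # the "mo" dimension
N_FLAGS = 3    # assumed size of one fg_<type> dimension


def make_array(shape, fill):
    """Return a nested list with the given shape, filled with `fill`."""
    if len(shape) == 1:
        return [fill] * shape[0]
    return [make_array(shape[1:], fill) for _ in range(shape[0])]


def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# The flag array repeats the data layout and appends the flag dimension.
stats = make_array((N_SETS, N_MONTHS), 0.0)
flags = make_array((N_SETS, N_MONTHS, N_FLAGS), " ")

print(shape_of(stats))
print(shape_of(flags))
```

With this layout, flags[i][j] holds all of the flags that belong to the value stats[i][j].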

A more detailed description of the design of the netCDF Statistics file follows, and the accompanying sample CDL shows an implementation of this design.

GLOBAL INFORMATION

FILE NAMES

The name of the netCDF file for each station must be unique, and must fit within the limitations of the most restrictive operating system on which these files will be used -- in this case, DOS. Since copies of the data files will be transferred to any user wanting data from a particular station, the file name should indicate the station whose data is in the file. The following convention is used to define names for CDBS netCDF files:

The name of each netCDF statistics file will consist of the following 5 concatenated fields (all characters will be lower case):
data network: 2-character code for the data collection network identifying the source of the data. (See the CDBS 2.0 Data Collection Network Codes document for a list of possible values.) This is included to eliminate confusion between stations in different networks that use the same station identifier.
station id: The station identifier used by the data collection network. Note that the station id may mix digits and letters.
period: 1-character separator (".", ASCII 46), used to separate the file name from the file suffix.
state code: 2-character code identifying the state in which the station is located. This is the same as the postal code for that state.
data file type: The letter "s", which marks this file as a statistical data file. (The complete CDBS database design also includes netCDF files for Observed, Forecast, Central Tendencies, and Simulated data.)
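
The concatenation of the five fields can be sketched as follows. The function name is hypothetical, and the network code and station id used in the call are placeholders rather than real CDBS values.

```python
def stats_file_name(data_network, station_id, state_code):
    """Concatenate the five file-name fields in the order listed above."""
    if len(data_network) != 2:
        raise ValueError("data network code must be 2 characters")
    if len(state_code) != 2:
        raise ValueError("state code must be 2 characters")
    # Fields: data network, station id, period, state code, file type "s".
    fields = [data_network, station_id, ".", state_code, "s"]
    return "".join(fields).lower()


# "XX" and "12A34" are placeholders, not real CDBS codes.
print(stats_file_name("XX", "12A34", "OR"))  # xx12a34.ors
```

This sketch only checks the two fixed-width fields; it does not check the result against the file name limits of any particular operating system.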

Example:

DIMENSIONS

All of the dimensions listed below are present in every CDBS 2.0 netCDF Statistics file. The dimensions fall into three categories: dimensions used by variables containing information about the station, dimensions used by variables that contain the different time spans of data from which statistical analyses were done, and dimensions used by variables containing statistical information related to the probability of exceedence of a station's data.

The following dimensions are used in variables that store station metadata:
sta_id_lgth = 9: The maximum length of a station id, plus 1 character to hold the NULL that terminates the string.
hand_5_lgth = 9: The maximum length of a Handbook 5 (SHEF) station id, plus 1 character to hold the NULL that terminates the string.
sta_nm_lgth = 61: The maximum length of a station name, plus 1 character to hold the NULL that terminates the string.
st_cd_lgth = 3: The maximum length of a FIPS alphabetic state code (postal code), plus 1 character to hold the NULL that terminates the string.
data_net_lgth = 5: The maximum length of a data network code, plus 1 character to hold the NULL that terminates the string.

The following dimensions are used in variables that describe the sets of data on which the statistical analyses were performed:
data_set: This is the unlimited dimension; its length shows how many sets of source data have had statistical analyses performed on them.
data_set_fg = 2: The number of information flags associated with each set of source data.

The following dimensions are used in variables that store information related to the probability-of-exceedence values for the station's data:
pblty = 17: The number of probability levels for which probability-of-exceedence values are stored for this station's data. See the CDBS 2.0 Probability Levels document for the complete list of probabilities.
storm = 18: The number of storm depths (durations) for which probability-of-exceedence values are stored. See the CDBS 2.0 Storm Durations document for the complete list of storm durations.
drtn = 20: The number of data durations for which CDBS 2.0 will store data. See the CDBS 2.0 Duration Codes document for the complete list of durations.
yr = 1: The number of annual values to be stored per year.
mo = 12: The number of monthly values to be stored per year.
day = 366: The number of daily values to be stored per year.
prd_of_rcd = 1
fg_type: The number of data flags that are associated with each piece of data. The "type" portion of the name is replaced by the 4- or 5-character code for one of the data flagging systems. See the CDBS 2.0 Data Flag System Codes document for a list of known flag systems. There may be several of these "fg_type" dimensions for a single station.
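
To show how the fixed dimensions listed above might look when written out, the sketch below renders them as a CDL "dimensions:" block. The names and sizes come from the lists above; the fg_type dimensions are left out because their names and sizes depend on the flag system in use. The function name and the use of None to stand for the unlimited dimension are assumptions of this example, and the output is not the accompanying sample CDL.

```python
# (name, size) pairs from the dimension lists; None marks the unlimited one.
DIMENSIONS = [
    ("sta_id_lgth", 9),
    ("hand_5_lgth", 9),
    ("sta_nm_lgth", 61),
    ("st_cd_lgth", 3),
    ("data_net_lgth", 5),
    ("data_set", None),
    ("data_set_fg", 2),
    ("pblty", 17),
    ("storm", 18),
    ("drtn", 20),
    ("yr", 1),
    ("mo", 12),
    ("day", 366),
    ("prd_of_rcd", 1),
]


def dimensions_to_cdl(dimensions):
    """Render (name, size) pairs as the text of a CDL dimensions section."""
    lines = ["dimensions:"]
    for name, size in dimensions:
        value = "UNLIMITED" if size is None else str(size)
        lines.append(f"    {name} = {value} ;")
    return "\n".join(lines)


print(dimensions_to_cdl(DIMENSIONS))
```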

GLOBAL ATTRIBUTES

The following list of global attributes is included in each CDBS netCDF Statistics data file.
row_for_normals_period: Every 5 years, a new 30-year time span is defined for the purpose of calculating "normals" statistics. This attribute stores the row number of the statistical set whose values have been based on the current "normals" period.
Conventions: "CDBS"
element_reference: "Elements Used in CDBS 2.0"
duration_reference: "CDBS 2.0 Duration Codes"
probability_reference: "CDBS 2.0 Probability Levels"
storm_reference: "CDBS 2.0 Storm Durations"
history: This attribute will hold the history of modifications to the file. To quote the netCDF User's Guide, this "is a character array with a line for each invocation of a program and arguments that were used to derive the file."
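
One way to maintain a history attribute of this kind is to append one line per program invocation. The sketch below works on a plain string; the timestamp prefix, the function name, and the program names in the usage lines are assumptions of this example, not part of the CDBS design.

```python
from datetime import datetime, timezone


def append_history(history, program, args, when=None):
    """Return `history` with one new line for this program invocation."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")  # assumed timestamp format
    entry = f"{stamp}: {program} {' '.join(args)}".rstrip()
    # The first entry stands alone; later entries go on their own lines.
    return entry if not history else f"{history}\n{entry}"


# Hypothetical program names and arguments.
history = append_history("", "make_stats", ["-n", "xx"])
history = append_history(history, "update_stats", [])
print(history)
```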

INFORMATION ABOUT VARIABLES

VARIABLES DESCRIBING THE STATION

Within the netCDF file are several variables that identify the station. These variables hold a minimal set of metadata for the station; a much more extensive set of metadata is stored in the Informix side of the CDBS database. (You may notice that there is duplication between some of these variables and the information encoded in the file name. This is both for redundancy in case the file name gets inadvertently scrambled, and to allow better access to this information from within programs.)

The following is a list of the file's variables that describe the station, and the dimensions and attributes of each of these variables:

VARIABLES DESCRIBING THE DATA

Each CDBS netCDF statistics file will have several variables that describe the data sets stored in the file. They list the beginning and ending dates of the source data on which each set of statistics is based, the date that each statistical set was created, and the information flags associated with the sets of source data. (Flags associated with each individual piece of statistical data are stored in separate variables, described later in this document.)
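
Because each set of statistics is identified by the beginning and ending dates of its source data, a program can locate a set by matching those dates. The sketch below assumes the dates have already been read into two parallel lists of YYYYMMDD strings and that rows are counted from zero; the date encoding, the function name, and the sample values are all hypothetical.

```python
def find_data_set(begin_dates, end_dates, begin, end):
    """Return the row whose time span is exactly (begin, end), or None."""
    for row, span in enumerate(zip(begin_dates, end_dates)):
        if span == (begin, end):
            return row
    return None


# Two hypothetical sets: a period of record and a 30-year normals period.
begin_dates = ["19480101", "19610101"]
end_dates = ["19951231", "19901231"]

print(find_data_set(begin_dates, end_dates, "19610101", "19901231"))  # 1
```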

COORDINATE VARIABLES

The following coordinate variables are present in each CDBS 2.0 netCDF Statistics file:

VARIABLES RELATED TO MAXIMUMS AND MINIMUMS OF RECORD

These variables hold the maximums and minimums of record for the observed, derived, and interpreted data for a station. The maximums and minimums cover a variety of time spans, and are accompanied (in a separate variable) by the years in which each maximum or minimum was recorded. There is an optional third variable that records any data flags that a data supplier provides with the data.