The CDBS database stores the data for each data collection station in its own file, which makes it easier to distribute subsets of the database. Within the Natural Resources Conservation Service (NRCS), the Water and Climate Center is responsible for serving as the central repository for atmospheric and water data, and for periodically transferring localized subsets of that data to the various NRCS Field Offices. Since a Field Office may need data from as few as 3 sites, and rarely needs data from more than a dozen, storing each station's netCDF data in a separate file makes it easier to fulfill the mission of supporting the Field Offices.
CDBS 2.0 stores not only the data collected at stations, but also statistical information about the distribution of data values. Currently, the CDBS 2.0 netCDF Central Tendency files are designed to store yearly, monthly, and daily values for the statistical average, median, mode, standard deviation, standard error, skew, and kurtosis of the data collected at a station. These files also store the "normal" values for the station's data.
Each netCDF data file contains several variables that describe the station, several more that describe the data, and a separate variable for each measure of central tendency of each data element collected at that station. Each variable also carries attributes that define the data stored in it.
Data for which information flags are available will have separate variables for the flags, paralleling the layout of the variable containing the central tendency data. The design of the file includes one variable to store flags describing each data set, and other variables to store flags associated with each piece of data within a data set.
A more detailed description of the design of the netCDF data file follows, and the accompanying sample CDL shows an implementation of this design.
The name of the netCDF file for each station must be unique, and must fit within the limitations of the most restrictive operating system on which these files will be used: in this case, DOS, which allows at most eight characters before the period and three after it. Since copies of the data files will be transferred to any user wanting data from a particular station, the file name should identify the station whose data is in the file. The following convention is used to define names for CDBS netCDF files:
The name of each netCDF data file will consist of the following 5 concatenated fields (all characters will be lower case):
data network 2-character code for the data collection network identifying the source of the data. (See the CDBS 2.0 Data Collection Network Codes document for a list of possible values.) This is included to eliminate confusion between stations in different networks that use the same station identifier.
station id This is the station identifier used by the data collection network. Note that the station id may contain both letters and digits.
period 1-character separator (".", ASCII 46), used to separate the file name from the file suffix.
state code 2-character code identifying the state in which the station is located. This is the same as the postal code for that state.
data file type The letter "c". This is to mark this file as a central tendency data file. (The complete CDBS database design also includes netCDF files for Observed, Forecast, Statistical, and Simulated data.)
Example: file "co5614.idc"
This file contains central tendency information describing the observed data for station number 5614, a National Weather Service Cooperative Data Network station in Idaho.
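The naming convention can be expressed as a pair of helper functions, one that assembles a name from its five fields and one that splits a name back apart. This is a minimal sketch: the function names are hypothetical, and the length check assumes the classic DOS 8.3 limit. With a 2-character network prefix, that check rejects station ids longer than six characters.

```python
# Sketch of the CDBS 2.0 central tendency file-naming convention.
# Assumption: DOS 8.3 names (at most 8 characters before the period and
# 3 after it). build_cdbs_filename/parse_cdbs_filename are hypothetical names.

def build_cdbs_filename(network: str, station_id: str, state: str,
                        file_type: str = "c") -> str:
    """Concatenate network + station id + '.' + state + file type, lower case."""
    if len(network) != 2 or len(state) != 2 or len(file_type) != 1:
        raise ValueError("network and state need 2 characters, file type 1")
    base = (network + station_id).lower()
    suffix = (state + file_type).lower()
    if len(base) > 8:
        raise ValueError(f"{base!r} exceeds the 8-character DOS name limit")
    return f"{base}.{suffix}"


def parse_cdbs_filename(filename: str) -> dict:
    """Split a CDBS file name back into its fields."""
    base, suffix = filename.lower().split(".")
    return {
        "network": base[:2],
        "station_id": base[2:],
        "state": suffix[:2],
        "file_type": suffix[2:],
    }


print(build_cdbs_filename("CO", "5614", "ID"))   # co5614.idc
print(parse_cdbs_filename("co5614.idc"))
```

Parsing relies on the network and state codes having fixed widths, so everything between them must be the station id.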
The dimensions listed below are present in every CDBS 2.0 netCDF Central Tendency file. The dimensions fall into two categories: dimensions used in variables that describe the station, and dimensions used in variables holding central tendency information about data collected at the station.
The following dimensions are used in variables that store station metadata:
sta_id_lgth=9 This is the maximum length of a station id, plus 1 character to hold the NULL that terminates the string.
hand_5_lgth=9 This is the maximum length of a Handbook 5 (SHEF) station id, plus 1 character to hold the NULL that terminates the string.
sta_nm_lgth=61 This is the maximum length of a station name, plus 1 character to hold the NULL that terminates the string.
st_cd_lgth=3 This is the maximum length of a FIPS alphabetic state code (postal code), plus 1 character to hold the NULL that terminates the string.
data_net_lgth=5 This is the maximum length of a data network code, plus 1 character to hold the NULL that terminates the string.
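Each of these string dimensions reserves one position for the terminating NUL. The following sketch, using hypothetical helper names, shows how a string might be packed into, and recovered from, a char array of the listed length. It pads the unused positions with NUL bytes.

```python
# Sketch: pack metadata strings into fixed-length char arrays, reserving the
# last position for the terminating NUL. Helper names are hypothetical.

STRING_DIMS = {
    "sta_id_lgth": 9,
    "hand_5_lgth": 9,
    "sta_nm_lgth": 61,
    "st_cd_lgth": 3,
    "data_net_lgth": 5,
}


def pack_char_array(text: str, dim_name: str) -> bytes:
    """Return text encoded as ASCII and NUL-padded to the dimension length."""
    length = STRING_DIMS[dim_name]
    raw = text.encode("ascii")
    if len(raw) > length - 1:
        raise ValueError(f"{text!r} is longer than {length - 1} characters")
    return raw + b"\0" * (length - len(raw))


def unpack_char_array(raw: bytes) -> str:
    """Recover the string by stopping at the first NUL."""
    return raw.split(b"\0", 1)[0].decode("ascii")


packed = pack_char_array("5614", "sta_id_lgth")
print(len(packed))                # 9
print(unpack_char_array(packed))  # 5614
```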
The following dimensions are used in variables that store either central tendency information or that store flags describing the central tendency information.
tend_set This is the unlimited dimension. Its length is the number of sets of central tendencies (each calculated over a particular span of years) stored for this station.
yr=1 The number of yearly values to store for each year.
mo=12 The number of monthly values to store for each year.
day=366 The number of daily values to store for each year.
tend_set_fg=2 The number of information flags associated with each set of central tendencies.
tend_data_fg_type
The number of data flags that are associated with each piece of data. The type will be replaced by the 4- or 5-character code for one of the data flagging systems. See the CDBS 2.0 Data Flag System Codes document for a list of known flag systems.
For example, in the NWS Cooperative Data Network, each central tendency value has one flag associated with it. If CDBS 2.0 calls this the "coopc" flag system, then the name of this dimension would be "tend_data_fg_coopc".
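The fixed dimensions and the naming rule for flag-system dimensions can be summarized as follows. In this sketch, None marks the unlimited tend_set dimension. The "coopc" code and its single flag come from the example above; the real flag counts would come from the CDBS 2.0 Data Flag System Codes document. The function name is hypothetical.

```python
# Sketch: fixed dimensions of a central tendency file, plus the rule for
# naming each tend_data_fg_<type> dimension. None marks the unlimited one.

FIXED_DIMENSIONS = {
    "tend_set": None,   # unlimited: one row per set of central tendencies
    "yr": 1,
    "mo": 12,
    "day": 366,
    "tend_set_fg": 2,
}


def flag_dimension(flag_sys: str, flags_per_value: int) -> tuple:
    """Return (name, length) for the dimension of one data flagging system."""
    if not 4 <= len(flag_sys) <= 5:
        raise ValueError("flag system codes are 4 or 5 characters long")
    return f"tend_data_fg_{flag_sys}", flags_per_value


print(flag_dimension("coopc", 1))   # ('tend_data_fg_coopc', 1)
```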
The following list of global attributes is included in each CDBS netCDF central tendency data file.
row_with_normals
Every five years, a new set of 30-year averages is calculated for data at selected stations. These specific 30-year averages are known as "normals". This attribute stores the row number (along the tend_set dimension) of the tendency set whose "average" values make up the current set of "normals" for a station.
Conventions "CDBS"
element_reference "Elements Used in CDBS 2.0"
duration_reference "CDBS 2.0 Duration Codes"
history This attribute will hold the history of modifications to the file. To quote the netCDF User's Guide, this "is a character array with a line for each invocation of a program and arguments that were used to derive the file."
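The global attributes can be pictured as a simple mapping, with each program that modifies the file appending one line to "history". This sketch makes several assumptions. The "timestamp: command args" layout is not a CDBS requirement, the program name is hypothetical, and this document does not say whether row_with_normals counts from 0 or 1.

```python
# Sketch: global attributes of a central tendency file, and one way to append
# a line to "history" for each program invocation. The line layout is an
# assumption; "calc_tendencies" is a hypothetical program name.
from datetime import datetime, timezone

global_attributes = {
    "Conventions": "CDBS",
    "element_reference": "Elements Used in CDBS 2.0",
    "duration_reference": "CDBS 2.0 Duration Codes",
    "row_with_normals": 0,   # hypothetical; 0- vs 1-based is not specified
    "history": "",
}


def append_history(history: str, argv: list, when: datetime) -> str:
    """Return the history text with a new line describing this invocation."""
    line = f"{when.strftime('%Y-%m-%d %H:%M:%S %Z')}: {' '.join(argv)}"
    return f"{history}\n{line}" if history else line


global_attributes["history"] = append_history(
    global_attributes["history"],
    ["calc_tendencies", "co5614.idc"],
    datetime(1998, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(global_attributes["history"])
# 1998-05-01 12:00:00 UTC: calc_tendencies co5614.idc
```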
Within the netCDF file are several variables that identify the station. These variables hold a minimal set of metadata for the station; a much more extensive set of metadata is stored in the Informix side of the CDBS database. (Some of these variables duplicate information encoded in the file name. This redundancy protects against a file name that is inadvertently changed or scrambled, and lets programs read the information directly from within the file.)
The following is a list of the file's variables that describe the station, and the dimensions and attributes of each of these variables:
Variable "station_id"
This variable contains the identifier assigned to the station by the data collection network through which the data in this data file is received. This is the same station id that is included in the name of the data file. (Note the absence of a "_FillValue" attribute. Since the station id is required in order to define a file, the station_id will never be empty.)
Type:
char
Dimensions:
sta_id_lgth
Attributes:
long_name "Data Network Station Identifier"
reference The document in which station ids are assigned. For example, "NCDC TD-9767(??)".
Variable "handbook_5_station_id"
This variable contains the identifier assigned to the station in NOAA Handbook 5.
Type:
char
Dimensions:
hand_5_lgth
Attributes:
long_name "Handbook 5 (SHEF) Station Identifier"
_FillValue FILL_CHAR (defined in netcdf.h)
reference "NOAA Handbook 5"
Variable "wmo_station_id"
This variable contains the identifier assigned to the station by the World Meteorological Organization.
Type:
long
Dimensions:
Attributes:
long_name "Numeric WMO Station Identifier"
_FillValue FILL_LONG (defined in netcdf.h)
reference "Volume A of WMO Publication 9"
Variable "station_name"
This variable contains the name given to the station by the station's owner.
Type:
char
Dimensions:
sta_nm_lgth
Attributes:
long_name "Station Name"
_FillValue FILL_CHAR (defined in netcdf.h)
reference A character string identifying the source of the name. Usually the name of the owning agency, and (if available) the name of the document containing that agency's master station list.
Variable "data_network"
This variable contains the 4-character code assigned to the data collection network in the CDBS 2.0 Data Collection Network Codes document. This differs from the 2-character network code used in the data file's name: the file name has room for only a 2-character code, while the 4-character codes used here are the ones commonly recognized by the climate community. (Note the absence of a "_FillValue" attribute. Since the data network code is required in order to define a file, the "data_network" variable will never be empty.)
Type:
char
Dimensions:
data_net_lgth
Attributes:
long_name "Data Collection Network Code"
reference "NRCS CDBS 2.0 Data Collection Network Codes"
Variable "state"
This variable contains the 2-character code assigned to the state by the Federal Information Processing Standards (FIPS). This is the same as the state code used in the name of the data file. (Note the absence of a "_FillValue" attribute. Since the state code is required in order to define a file, the "state" variable will never be empty.)
Type:
char
Dimensions:
st_cd_lgth
Attributes:
long_name "FIPS Alphabetic State Code (Postal Code)"
reference "FIPS Manual"
Variable "file_type"
This variable contains the 1-character code assigned to this data file type by the CDBS Database Manager. For files containing Central Tendencies data, this character will always be "c". (Note the absence of a "_FillValue" attribute. Since the file type is required in order to define a file, the file_type will never be empty.)
Type:
char
Dimensions:
Attributes:
long_name "Data File Type"
reference "NRCS CDBS Database Documentation"
Variable "lat"
This variable contains the station latitude. Within CDBS, North latitudes are positive numbers and South latitudes are negative numbers. All latitudes are listed in degrees, so a latitude of 35° 15' 0" N would be listed as +35.25 degrees.
Type:
double
Dimensions:
Attributes:
long_name "Station Latitude"
_FillValue FILL_DOUBLE (defined in netcdf.h)
valid_range -90.0, 90.0
units preferred units are "degrees_north", but any of the Unidata udunits synonyms for "degrees_north" are acceptable
Variable "lon"
This variable contains the station longitude. Within CDBS, East longitudes are positive numbers and West longitudes are negative numbers. All longitudes are listed in degrees, so a longitude of 120° 30' 0" W would be listed as -120.5 degrees.
Type:
double
Dimensions:
Attributes:
long_name "Station Longitude"
_FillValue FILL_DOUBLE (defined in netcdf.h)
valid_range -180.0, 180.0
units preferred units are "degrees_east", but any of the Unidata udunits synonyms for "degrees_east" are acceptable
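The sign conventions for "lat" and "lon" amount to a small conversion from degrees, minutes, and seconds. The following sketch, with hypothetical function names, reproduces the two worked examples above. It also checks a value against the valid_range listed for each variable.

```python
# Sketch: convert degrees/minutes/seconds to the signed decimal degrees stored
# in "lat" and "lon" (north and east positive, south and west negative).

VALID_RANGE = {"lat": (-90.0, 90.0), "lon": (-180.0, 180.0)}


def dms_to_degrees(degrees: float, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds and a hemisphere letter to decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere {hemisphere!r}")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


def in_valid_range(variable: str, value: float) -> bool:
    """Check a value against the valid_range attribute for lat or lon."""
    low, high = VALID_RANGE[variable]
    return low <= value <= high


print(dms_to_degrees(35, 15, 0, "N"))    # 35.25
print(dms_to_degrees(120, 30, 0, "W"))   # -120.5
```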
Variable "elev"
This variable contains the station elevation.
Type:
float
Dimensions:
Attributes:
long_name "Station Elevation"
_FillValue FILL_FLOAT (defined in netcdf.h)
units "feet"
Each CDBS netCDF data file will have several variables that describe the data. They list the beginning and ending years of the observed data from which each data set was calculated, the date that each data set was created, and they hold the information flags associated with both the data sets and with each value within each data set.
Variable "tend_data_strt"
This variable contains the dates of the start of the time periods for which sets of central tendencies are calculated. The dates will be created by the Unidata udunits utInvCalendar() library function (or created by any other function that produces the same value for the same date). Note that the "units" attribute includes the station's time offset from GMT.
Type:
double
Dimensions:
tend_set The number of central tendency sets in this file.
Attributes:
long_name "Start Date of Central Tendency Set"
units The character string "minutes since 1800-1-1 00:00 time_zone_offset", where time_zone_offset is the difference between the station's reporting time and Greenwich Mean Time, written as a signed 4-digit time (hh:mm). For example, a station located in the Eastern Time Zone would have a time_zone_offset of -05:00, and the "units" would be "minutes since 1800-1-1 00:00 -05:00".
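For the date variables, the following sketch builds a "units" string of the "minutes since" form given in the template. It then computes a value in those units. Python's datetime, which uses the proleptic Gregorian calendar, stands in for the udunits utInvCalendar() function. The function names are hypothetical, and the offset is assumed to be a whole number of minutes.

```python
# Sketch: "units" string and values for tend_data_strt, tend_data_end, and
# tend_data_prep. datetime substitutes for udunits' utInvCalendar() here.
from datetime import datetime

EPOCH = datetime(1800, 1, 1)


def time_units(offset_minutes: int) -> str:
    """Format the station's GMT offset (e.g. -300 minutes) into the units string."""
    sign = "-" if offset_minutes < 0 else "+"
    hours, mins = divmod(abs(offset_minutes), 60)
    return f"minutes since 1800-1-1 00:00 {sign}{hours:02d}:{mins:02d}"


def minutes_since_epoch(local_time: datetime) -> float:
    """Minutes from the epoch to a station-local (naive) time.

    The units string places the epoch in the station's own time zone, so both
    ends share the same offset and it cancels out of the difference.
    """
    return (local_time - EPOCH).total_seconds() / 60.0


print(time_units(-5 * 60))                        # minutes since 1800-1-1 00:00 -05:00
print(minutes_since_epoch(datetime(1961, 1, 1)))  # 84677760.0
```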
Variable "tend_data_end"
This variable contains the dates of the end of the time periods for which sets of central tendencies are calculated. The dates will be created by the Unidata udunits utInvCalendar() library function (or created by any other function that produces the same value for the same date). Note that the "units" attribute includes the station's time offset from GMT.
Type:
double
Dimensions:
tend_set The number of central tendency sets in this file.
Attributes:
long_name "End Date of Central Tendency Set"
units The character string "minutes since 1800-1-1 00:00 time_zone_offset", where time_zone_offset is the difference between the station's reporting time and Greenwich Mean Time, written as a signed 4-digit time (hh:mm). For example, a station located in the Eastern Time Zone would have a time_zone_offset of -05:00, and the "units" would be "minutes since 1800-1-1 00:00 -05:00".
Variable "tend_data_prep"
This variable contains the dates that the sets of central tendencies were calculated. The times will be created by the Unidata udunits utInvCalendar() library function (or created by any other function that produces the same value for the same date). Note that the "units" attribute includes the station's time offset from GMT.
Type:
double
Dimensions:
tend_set The number of central tendency sets in this file.
Attributes:
long_name "Preparation Date of Central Tendency Set"
units The character string "minutes since 1800-1-1 00:00 time_zone_offset", where time_zone_offset is the difference between the station's reporting time and Greenwich Mean Time, written as a signed 4-digit time (hh:mm). For example, a station located in the Eastern Time Zone would have a time_zone_offset of -05:00, and the "units" would be "minutes since 1800-1-1 00:00 -05:00".
_FillValue FILL_DOUBLE (defined in netcdf.h)
Variable "tend_set_fg"
This variable contains information flags associated with each set of central tendencies (note that the variable below contains flags associated with each individual central tendency value).
Type:
char
Dimensions:
tend_set The number of central tendency sets in this file.
tend_set_fg The number of flags associated with each set of central tendencies.
Attributes:
long_name "Flags for Sets of Central Tendencies"
reference "CDBS 2.0 Central Tendency Flags"
_FillValue FILL_CHAR (defined in netcdf.h)
Variable "element[_depth_height_code]_duration_data_fg"
These variables contain information flags associated with each individual central tendency value (note that the variable above contains the flags associated with each set of central tendencies). The element in the variable name will be replaced by one of the CDBS 2.0 element codes. The depth_height_code will be replaced by one of the 1-character codes identifying either the sensor's depth below ground or the sensor's height above ground. The duration will be replaced by one of the duration codes used in CDBS 2.0.
Type:
char
Dimensions:
tend_set The number of central tendency sets in this file.
duration_dimension
Either "yr", "mo", or "day".
tend_data_fg_type The number of flags associated with each piece of data for this element. The type will be replaced by the 4- or 5-character CDBS code for a data flagging system. See the CDBS 2.0 Data Flag System Codes document for a list of known flag systems.
Attributes:
long_name A character string of the format "Flags associated with duration Central Tendency values for element_name [, measured at depth/height]", where duration will be replaced by the word "yearly", "monthly", or "daily", element_name will be replaced by an element description taken from the Elements Used in CDBS 2.0 document, and the optional depth/height will (if the data is measured at a specific height above ground or depth below ground) be replaced by one of the depth descriptions taken from the CDBS 2.0 Soil Sensor Depth Codes document or one of the height descriptions taken from the CDBS 2.0 Sensor Height Codes document. For example, variable prcp_d_data_fg will have a long_name of "Flags associated with daily Central Tendency values for precipitation - incremental", and variable stv3_m_data_fg will have a long_name of "Flags associated with monthly Central Tendency values for soil moisture, average, measured at 8 inches".
flag_sys The 4- or 5-character code identifying the data flagging system whose flags are being stored in this variable. This is the same flag_sys code as the one incorporated into the name of the variable's third dimension, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name.
element The 4- or 5-character code for the type of data whose flags are stored in this variable. This is the same element code as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name.
depth_height_code
(Optional attribute -- used only when the element has such a code) The 1-character code identifying the height above ground or the depth below ground at which the data is measured. See the documents CDBS 2.0 Sensor Depth Codes and CDBS 2.0 Sensor Height Codes for a list of the codes.
duration The 1-character code for the data duration. This is the same duration code as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name. See the CDBS 2.0 Duration Codes document for a complete list of these codes.
reference A character string indicating the source of the definitions for meanings of the flags.
Example: "NCDC TD-9641".
_FillValue FILL_CHAR (defined in netcdf.h)
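The name, dimensions, and redundant attributes of a per-value flag variable follow mechanically from its parts. The examples above (such as stv3_m_data_fg) appear to append the 1-character depth/height code directly to the element code, so this sketch does the same. If the bracketed pattern's underscore is intended instead, the join would change. The duration codes "d" and "m" appear in the examples; "y" for yearly is an assumption, and the function name is hypothetical.

```python
# Sketch: derive the name, dimensions, and attributes of a per-value flag
# variable. "y" -> "yr" is an assumed duration code; flag_variable is hypothetical.

DURATION_DIM = {"y": "yr", "m": "mo", "d": "day"}


def flag_variable(element, duration, flag_sys, reference, depth_height_code=None):
    """Return (name, dimensions, attributes) for a per-value flag variable."""
    name = f"{element}{depth_height_code or ''}_{duration}_data_fg"
    dims = ("tend_set", DURATION_DIM[duration], f"tend_data_fg_{flag_sys}")
    # Repeat the pieces encoded in the name so readers need not parse it.
    attrs = {
        "flag_sys": flag_sys,
        "element": element,
        "duration": duration,
        "reference": reference,
    }
    if depth_height_code:
        attrs["depth_height_code"] = depth_height_code
    return name, dims, attrs


name, dims, attrs = flag_variable("prcp", "d", "coopc", "NCDC TD-9641")
print(name)   # prcp_d_data_fg
print(dims)   # ('tend_set', 'day', 'tend_data_fg_coopc')
```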
The following coordinate variables will be found in every CDBS 2.0 netCDF Central Tendencies file:
Variable "yr"
This variable contains the day-of-year corresponding to the end of a year. This variable is present primarily as a parallel to variables "mo" and "day" below.
Type:
double
Dimensions:
yr
Attributes:
long_name "Annual"
units "minute"
Variable "mo"
This variable contains the days of the year corresponding to the end of each month of the year. The day-of-year will always be calculated using the assumption that February 29 occurred during the year. Note the absence of a "_FillValue" attribute. Since there is a day for the end of each month, this variable will not have any empty values.
Type:
double
Dimensions:
mo
Attributes:
long_name "Month of Year"
units "minute"
Variable "day"
This variable contains the day of the year. This coordinate variable will always have an entry for February 29. Note the absence of a "_FillValue" attribute. Since every day has a day number, this variable will not have any empty values.
Type:
double
Dimensions:
day
Attributes:
long_name "Day of Year"
units "minute"
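The coordinate values can be generated with a leap-year calendar, so that February 29 always has a slot. This sketch computes the day-of-year numbers described in the text. It does not address the conversion implied by the "minute" units attribute. It also assumes that in non-leap years the February 29 position (index 59) is simply left unfilled. The year 2000 is used only because it is a leap year, and the function names are hypothetical.

```python
# Sketch: day-of-year values for the "yr", "mo", and "day" coordinate
# variables, always computed as if the year included February 29.
import calendar
from datetime import date
from itertools import accumulate

LEAP_YEAR = 2000  # any leap year works; used only for its calendar


def year_coordinate() -> list:
    return [366.0]  # last day of a leap year


def month_end_coordinate() -> list:
    lengths = [calendar.monthrange(LEAP_YEAR, m)[1] for m in range(1, 13)]
    return [float(total) for total in accumulate(lengths)]


def day_coordinate() -> list:
    return [float(d) for d in range(1, 367)]


def day_index(month: int, day: int) -> int:
    """0-based position of a calendar date along the 366-element "day" dimension."""
    return date(LEAP_YEAR, month, day).timetuple().tm_yday - 1


print(month_end_coordinate())
# [31.0, 60.0, 91.0, 121.0, 152.0, 182.0, 213.0, 244.0, 274.0, 305.0, 335.0, 366.0]
print(day_index(3, 1))   # 60
```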
The Central Tendency variables store statistical data describing the distribution of data of a particular element during a specific time period. The following types of statistical information are currently being stored by CDBS 2.0:
average (normal)
median
mode
standard deviation
standard error
skew
kurtosis
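As an illustration of these statistics, the following sketch computes all seven for one sample of values, keyed by the abbreviations CDBS uses in variable names. This document does not say which estimators CDBS uses. This sketch assumes the sample standard deviation (n - 1 denominator), the standard error as that deviation divided by the square root of n, and moment-based skew and excess kurtosis. It needs at least two values that are not all equal.

```python
# Sketch: the seven CDBS statistics for one sample. Estimator choices
# (sample stddev, moment skew, excess kurtosis) are assumptions.
import math
import statistics


def central_tendencies(values: list) -> dict:
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments about the mean (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    stddev = statistics.stdev(values)          # n - 1 denominator
    return {
        "avg": mean,
        "med": statistics.median(values),
        "mod": statistics.mode(values),
        "stddev": stddev,
        "stderr": stddev / math.sqrt(n),
        "skew": m3 / m2 ** 1.5,
        "kurt": m4 / m2 ** 2 - 3.0,            # excess kurtosis
    }


print(central_tendencies([2, 4, 4, 4, 5, 5, 7, 9]))
```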
The time periods for which these values are currently being stored are:
year
month
day
Since many stations report data at times other than midnight, there is also a variable containing the adjustment values that need to be added to the average, median, and mode values in order to normalize them to a standard midnight reporting time.
Variable "element[_depth_height_code]_duration_tend_statistic"
These variables contain information describing the statistical distribution of an element's data values. The element will be replaced by one of the CDBS 2.0 4- or 5-character element codes. The depth_height_code will be replaced by one of the 1-character codes identifying either the sensor's depth below ground or the sensor's height above ground. The duration will be replaced by one of the duration codes used in CDBS 2.0. The statistic will be replaced by the abbreviation for one of the types of statistical information listed above. The current list of abbreviations is:
avg = average (normal)
med = median
mod = mode
stddev = standard deviation
stderr = standard error
skew = skew
kurt = kurtosis
Type:
float
Dimensions:
tend_set
duration_dimension
Either "yr", "mo", or "day".
Attributes:
long_name A character string of the format "statistic of duration data values for element_name [, measured at depth/height]", where statistic will be replaced by one of the full names listed above, duration will be replaced by the word "yearly", "monthly", or "daily", element_name will be an element description taken from the Elements Used in CDBS 2.0 document, and the optional depth/height will (if the data is measured at a specific height above ground or depth below ground) be replaced by one of the depth descriptions taken from the CDBS 2.0 Soil Sensor Depth Codes document or one of the height descriptions taken from the CDBS 2.0 Sensor Height Codes document.
Examples: variable prcp_d_tend_stddev will have a long_name of "standard deviation of daily data values for precipitation-incremental", variable stx2_m_tend_avg will have a long_name of "average of monthly data values for soil temperature, maximum, measured at 4 inches".
units One of the unit names used by the Unidata udunits package. For example, element prcp is measured in units of "inch", and element rhum is measured in units of "percent".
element The 4- or 5-character code for the type of data stored in this variable. This is the same element code as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name.
depth_height_code
(Optional attribute -- used only when the element has such a code) The 1-character code identifying the height above ground or the depth below ground at which the data is measured. See the documents CDBS 2.0 Sensor Depth Codes and CDBS 2.0 Sensor Height Codes for a list of the codes.
duration The 1-character code for the data duration. This is the same duration code as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name. See the CDBS 2.0 Duration Codes document for a complete list of these codes.
statistic The abbreviation for the type of statistic being stored in this variable. This is the same abbreviation as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name.
decimal_places This is a number of type "short", holding the precision of the data as it is measured by the sensor, listed as the number of decimal places, and having default values taken from the Elements Used in CDBS 2.0 document. Examples: element prcp has a default decimal_places of 2 (meaning measurements are accurate to the nearest 0.01 units), element snwd has a default of 0 (meaning measurements are accurate to the nearest whole unit), etc.
_FillValue FILL_FLOAT (defined in netcdf.h).
missing_value (-FILL_FLOAT).
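A central tendency variable's name, dimensions, and attributes can be assembled from its parts. This sketch reproduces the two long_name examples above and, like those examples, appends the depth/height code directly to the element code. The "y" duration code and the function name are assumptions, and the units passed in the usage are illustrative only.

```python
# Sketch: name, dimensions, and attributes of a central tendency variable.
STATISTICS = {
    "avg": "average",
    "med": "median",
    "mod": "mode",
    "stddev": "standard deviation",
    "stderr": "standard error",
    "skew": "skew",
    "kurt": "kurtosis",
}

# Duration code -> (coordinate dimension, word used in long_name); "y" is assumed.
DURATIONS = {"y": ("yr", "yearly"), "m": ("mo", "monthly"), "d": ("day", "daily")}


def tend_variable(element, duration, statistic, element_name, units,
                  decimal_places, depth_height_code=None, depth_height_desc=None):
    """Return (name, dimensions, attributes) for one central tendency variable."""
    dim, word = DURATIONS[duration]
    name = f"{element}{depth_height_code or ''}_{duration}_tend_{statistic}"
    long_name = f"{STATISTICS[statistic]} of {word} data values for {element_name}"
    if depth_height_desc:
        long_name += f", measured at {depth_height_desc}"
    attrs = {
        "long_name": long_name,
        "units": units,
        "element": element,
        "duration": duration,
        "statistic": statistic,
        "decimal_places": decimal_places,   # stored as a netCDF short
    }
    if depth_height_code:
        attrs["depth_height_code"] = depth_height_code
    return name, ("tend_set", dim), attrs


name, dims, attrs = tend_variable("prcp", "d", "stddev",
                                  "precipitation-incremental", "inch", 2)
print(name)                # prcp_d_tend_stddev
print(attrs["long_name"])  # standard deviation of daily data values for precipitation-incremental
```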
Variable "element[_depth_height_code]_duration_adj_avg"
Many stations report data at times other than midnight, and this variable contains the adjustment values that need to be added to the average, median, and mode values in order to normalize them to a standard midnight reporting time. The element will be replaced by one of the CDBS 2.0 4- or 5-character element codes. The depth_height_code will be replaced by one of the 1-character codes identifying either the sensor's depth below ground or the sensor's height above ground. The duration will be replaced by one of the duration codes used in CDBS 2.0.
Type:
float
Dimensions:
tend_set
duration_dimension
Either "yr", "mo", or "day".
Attributes:
long_name A character string of the format "adjustments to duration data values for element_name [, measured at depth/height], to give values for midnight reporting time", where duration will be replaced by the word "yearly", "monthly", or "daily", element_name will be an element description taken from the Elements Used in CDBS 2.0 document, and the optional depth/height will (if the data is measured at a specific height above ground or depth below ground) be replaced by one of the depth descriptions taken from the CDBS 2.0 Soil Sensor Depth Codes document or one of the height descriptions taken from the CDBS 2.0 Sensor Height Codes document.
Examples: variable prcp_d_adj_avg will have a long_name of "adjustments to daily data values for precipitation-incremental, to give values for midnight reporting time", variable stx4_m_adj_avg will have a long_name of "adjustments to monthly data values for soil temperature, maximum, measured at 20 inches, to give values for midnight reporting time".
units One of the unit names used by the Unidata udunits package. For example, element prcp is measured in units of "inch", and element rhum is measured in units of "percent".
element The 4- or 5-character code for the type of data stored in this variable. This is the same element code as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name.
depth_height_code
(Optional attribute -- used only when the element has such a code) The 1-character code identifying the height above ground or the depth below ground at which the data is measured. See the documents CDBS 2.0 Sensor Depth Codes and CDBS 2.0 Sensor Height Codes for a list of the codes.
duration The 1-character code for the data duration. This is the same duration code as the one incorporated into the name of the variable, and is repeated here both for redundancy, and to eliminate the need for a program accessing this data to have to know how to decode the information encoded in a variable name. See the CDBS 2.0 Duration Codes document for a complete list of these codes.
decimal_places This is a number of type "short", holding the precision of the data as it is measured by the sensor, listed as the number of decimal places, and having default values taken from the Elements Used in CDBS 2.0 document. Examples: element prcp has a default decimal_places of 2 (meaning measurements are accurate to the nearest 0.01 units), element snwd has a default of 0 (meaning measurements are accurate to the nearest whole unit), etc.
_FillValue FILL_FLOAT (defined in netcdf.h)
missing_value (-FILL_FLOAT)
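Applying the adjustments means adding each adjustment to the matching average, median, or mode value, while skipping entries that hold the fill or missing value. This sketch assumes that FILL_FLOAT equals netCDF's default float fill value, 9.9692099683868690e+36; confirm this against the netcdf.h in use. The function names are hypothetical, and absent results are returned as None.

```python
# Sketch: normalize values to a midnight reporting time by adding the
# *_adj_avg adjustments, skipping _FillValue and missing_value entries.

FILL_FLOAT = 9.9692099683868690e36   # assumed netCDF default float fill
MISSING_FLOAT = -FILL_FLOAT          # missing_value is (-FILL_FLOAT)


def is_absent(value: float) -> bool:
    """True when a stored value is the fill value or the missing value."""
    return value == FILL_FLOAT or value == MISSING_FLOAT


def normalize_to_midnight(values, adjustments):
    """Add adjustments element-wise; None where either input is absent."""
    if len(values) != len(adjustments):
        raise ValueError("values and adjustments must have the same length")
    return [
        None if is_absent(v) or is_absent(a) else v + a
        for v, a in zip(values, adjustments)
    ]


print(normalize_to_midnight([1.5, FILL_FLOAT, 2.0], [0.25, 0.1, MISSING_FLOAT]))
# [1.75, None, None]
```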