THREDDS Data Server

Introduction

The THREDDS Data Server web app supports OPeNDAP services for resources containing NetCDF files via HydroShare's THREDDS (Thematic Real-time Environmental Distributed Data Services) Data Server (TDS). The OPeNDAP services in HydroShare are available only for Composite resource types shared with "Public" or "Published" status.

What may I do with this app?  

The THREDDS Data Server app enables users to subset, visualize, or analyze the NetCDF files stored in HydroShare using OPeNDAP client software without downloading the entire file from HydroShare to local computers.

Where may I get help?

For help with OPeNDAP clients, NetCDF, and DAP2 or other OPeNDAP protocols, consult or subscribe to the THREDDS email list. For detailed information see the NetCDF Tutorial. To report a problem with the THREDDS Data Server app or with thredds.hydroshare.org, email help@cuahsi.org.

Example uses of the app

The THREDDS app can be opened from the "Web Apps" tab or the "Open with" dropdown on the landing page of an associated HydroShare resource. Below are three example use cases showing how to use the THREDDS app and OPeNDAP client software for data subsetting, visualization, and analysis.

Use case 1: Creating a new subset NetCDF file with the DAP2 protocol

A common use case is when a HydroShare user wants to get a subset of the original dataset as a new NetCDF file. NetCDF files may contain a lot of data. Rather than transfer entire NetCDF datasets over the Internet, a HydroShare user may more quickly obtain a subset of variables from the dataset constrained by (usually geo-temporal) dimensions. Here is an example workflow for this use case:

  • In a browser, navigate to the landing page of a HydroShare resource containing one or more NetCDF files. The resource should be a Composite resource type shared as "Public" or "Published". Here is an example resource.
  • From the resource landing page, click on the "Open With" button. A menu appears with the apps authorized to access the resource. Select the THREDDS app. This will open a new browser tab displaying the HydroShare THREDDS server (https://thredds.hydroshare.org) navigated to the contents of the resource.
  • Navigate through the resource contents to the NetCDF file of interest and click on the link for the NetCDF file. This will display the list of available THREDDS services for the file.
  • Click on the "OPeNDAP" service, which allows access to the NetCDF dataset via various DAP2 protocol responses. An OPeNDAP dataset access form is displayed (see Figure 1). This form constructs subset URLs for use with OPeNDAP client software. This example uses the popular xarray Python package, which is based on the Python NetCDF wrapper for the NetCDF C library.

OPeNDAP dataset access form. The form contains the following text entry fields: Action, Data URL, Global Attributes, and Variables.

Figure 1: OPeNDAP dataset access form

  • On the form, select check-boxes for the variables to be subsetted. Be sure to include all dimensional variables required by dependent variables. In this example, time, x, and y are dimensional variables required by the SWE (snow water equivalent) dependent variable. Dimensions are distinguished on the form by the inclusion of the index extents.
  • On the form, constrain the selected variables as desired. Constraints for dimensions are automatically filled to cover all the index values for the dimension.
  • Dimensioned variables must be constrained by dimensional extents. Dimensional constraints are specified with slice notation. Indices are zero-based and inclusive.
    • [start-index:stride:last-index]
    • [start-index:last-index] (stride assumed to be 1)
    • [index] (for a single value of the dimension) 
    • In this example, the SWE variable is constrained to include all x and y values but only for the first time value.
  • Dimensionless variables may be constrained by value. Some dimensionless variables only exist for their variable attributes. In this example, the projection information for the grid mapping of the SWE (snow water equivalent) variable is contained in the attributes of the transverse_mercator variable. The only value for the transverse_mercator variable is an empty string.
  • As variables are selected and constrained, the form adds parameters to the "Data URL" field at the top of the form. When all selections and constraints are made, copy the URL for pasting into OPeNDAP client software.
  • Use OPeNDAP client software with the copied URL to remotely read the selected and constrained data, and then save the data to a local NetCDF file (see Figure 2). When the data are read via DAP2, the global attributes of the dataset are included. The global attributes, such as the history global attribute which provides the provenance of the dataset, may be edited to reflect the subset constraints before writing the new NetCDF dataset file if desired.

$ python
>>> import xarray as xr
>>> ds = xr.open_dataset("https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc?time[0:1:2183],y[0:1:58],x[0:1:38],transverse_mercator,SWE[0:1:0][0:1:58][0:1:38]")
>>> ds.to_netcdf("swe-t0.nc")
>>> exit()
$

Figure 2: Writing a subsetted NetCDF dataset file with xarray
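
The Data URL that the form assembles can also be built programmatically. The following is a minimal standard-library sketch, not part of any OPeNDAP client: the helper names hyperslab and build_dap2_url are hypothetical. It converts Python-style half-open ranges into the inclusive [start:stride:last] notation described above and reproduces the constraint expression used in Figure 2.

```python
def hyperslab(start, stop, stride=1):
    """Return a DAP2 index constraint equivalent to range(start, stop, stride).

    Python ranges exclude `stop`; DAP2 constraints include the last index,
    so the last selected index is computed explicitly.
    """
    if stride < 1 or start < 0 or stop <= start:
        raise ValueError("expected 0 <= start < stop and stride >= 1")
    last = start + ((stop - 1 - start) // stride) * stride
    return f"[{start}:{stride}:{last}]"


def build_dap2_url(dataset_url, selections):
    """Join variable selections into a DAP2 constraint expression.

    `selections` maps each variable name to a list of (start, stop, stride)
    tuples, one per dimension; an empty list requests the variable whole.
    """
    parts = []
    for name, ranges in selections.items():
        parts.append(name + "".join(hyperslab(*r) for r in ranges))
    return dataset_url + "?" + ",".join(parts)


BASE = ("https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/"
        "f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc")

# Same selection as the example: all of time, y, and x, the projection
# variable, and SWE for the first time step only.
url = build_dap2_url(BASE, {
    "time": [(0, 2184, 1)],
    "y": [(0, 59, 1)],
    "x": [(0, 39, 1)],
    "transverse_mercator": [],
    "SWE": [(0, 1, 1), (0, 59, 1), (0, 39, 1)],
})
print(url)
```

The resulting string can be passed to an OPeNDAP client in place of a URL copied from the form.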

Use case 2: Resource visualization with Panoply

Panoply is OPeNDAP client software for data visualization. Users may download and install Panoply on a local computer. Panoply requires Java JDK 9 or later on the local computer. This example remotely accesses the NetCDF contents of a HydroShare resource and visualizes the data locally with Panoply:

  • In a browser, navigate to the landing page of a HydroShare resource containing one or more NetCDF files. The resource should be a Composite resource type shared as "Public" or "Published". Here is an example resource.
  • From the resource landing page, click on the "Open With" button. A menu appears with the apps authorized to access the resource. Select the THREDDS app. This will open a new browser tab displaying the HydroShare THREDDS server (https://thredds.hydroshare.org) navigated to the contents of the resource.
  • Navigate through the resource contents to the NetCDF file of interest and click on the link for the NetCDF file. This will display the list of available THREDDS services for the file.
  • Click on the "OPeNDAP" service which allows access to the NetCDF dataset via various DAP2 protocol responses. An OPeNDAP dataset access form is displayed. Without subsetting or constraining any variables, select and copy the URL in the "Data URL" field at the top of the form to the local computer's clipboard (see Figure 1).
  • Open the Panoply software. Select File -> Open Remote Dataset from the Panoply menu (see Figure 3).

Panoply file open interface.

Figure 3: Open remote dataset in Panoply

  • Paste the Data URL from the THREDDS OPeNDAP dataset access form into the "Open Remote" Panoply dialog and click "Load" (see Figure 4).

Panoply "Open Remote" dialogue. Text says "Enter the URL of a remote dataset to open:", and below an entry field and a cancel and load button.

Figure 4: Paste the OPeNDAP Data URL in Panoply

  • All the variables of the dataset are displayed on the left panel of Panoply. Select a variable for visualization (e.g., SWE in this example dataset, see Figure 5).

Panoply sources dialogue, with the Datasets tab selected. Tabs include Datasets, Catalogs, and Bookmarks. To the right is a code block showing the contents of a variable called SWE.

Figure 5: Select a variable in Panoply

  • Click “Create Plot.” Select a plot type (e.g., Color contour with two custom axes). Select the dimensions of the variable for the axes (e.g., x for the X axis and y for the Y axis).

Create plot dialogue open in Panoply. Header reads "More than one type of plot can be created from the variable 'SWE'. What type would you like to create?". The options include georeferenced longitude-latitude color contour plot, georeferenced zonal average line plot, color contour plot using x for the X axis and y for the Y axis, and line plot using time for the horizontal axis. There is a cancel button and a create button.

Figure 6: Select a plot type in Panoply

  • Click "Create."  The variable plots with a scale for the first index of the first dimension not used for an axis (e.g., time, see Figure 7). The scale bar is automatically adjusted for the range of the plotted variable.

Color contour plot titled "Snow water equivalent". The axes of the plot are labeled "X coordinate projection (m)" and "Y coordinate projection (m)". The legend below is labeled "Snow water equivalent (m)" and is a color bar that transitions from blue to red, left to right. Below the chart are six tabs: Arrays, Scale, Grid, Contours, Vectors, and Labels. Arrays is selected. On this tab is the word "Plot" next to a drop-down menu showing "Array 1 only". Beside this, a box labeled "Interpolate" is checked.

Figure 7: Create a contour plot in Panoply

  • The SWE values for the first time index in this example are so small that the default precision of the scale bar's tick values appears meaningless. This may be adjusted on the "Scale" tab of the plot window. Select an appropriate tick format (see Figure 8). In this example, the SWE values range from zero to less than 10^-5. Therefore, a precision of six decimal places is appropriate (select %6f in the tick format dropdown menu).

the same plot as above, at the bottom the "scale" tab is selected. Tick format, referenced above, is set to %6f, divisions major: 5, minor: 2 , size 11.

Figure 8: Adjust the tick value precision in Panoply

  • Enter the desired indices of the dimensions not used for axes (e.g., time) in the "Array" tab of the plot window. The value of the dimension for the index will automatically update, as will the plot. In this example, the index of 1400 corresponds to a time value of 2009-03-24 21:00 +00:00. Entering an index is much easier than selecting one of the 2184 time values from the dropdown in this example. When the plot is updated, the scale bar remains the same for comparison with the previously selected time value, which may not be very visually informative for the new time value (see Figure 9).



Figure 9: Select remaining dimension values for a plot in Panoply

  • Select the "Scale" tab of the plot window. Click on the "Fit to Data" button of the "Scale Range" options in order to see a more reasonably scaled plot of the variable at the new time value (see Figure 10).



Figure 10: Rescale a plot in Panoply

Use case 3: Resource data analysis with NCO

NCO is OPeNDAP client software for data processing and analysis. Users may download and install NCO on a local computer. The Anaconda package manager is a recommended method for installing the NCO package. NCO itself is a collection of 14 command line executables for performing a variety of operations on NetCDF datasets. The input to each NCO executable may be either a remote OPeNDAP URL or a local NetCDF file. The output of each NCO executable is a local NetCDF file. This example remotely accesses the NetCDF contents of a HydroShare resource and analyzes the data locally with NCO:

  • In a browser, navigate to the landing page of a HydroShare resource containing one or more NetCDF files. The resource should be a Composite resource type shared as "Public" or "Published". Here is an example resource.
  • From the resource landing page, click on the "Open With" button. A menu appears with the apps authorized to access the resource. Select the THREDDS app. This will open a new browser tab displaying the HydroShare THREDDS server (https://thredds.hydroshare.org) navigated to the contents of the resource.
  • Navigate through the resource contents to the NetCDF file of interest and click on the link for the NetCDF file. This will display the list of available THREDDS services for the file.
  • Click on the "OPeNDAP" service which allows access to the NetCDF dataset via various DAP2 protocol responses. An OPeNDAP dataset access form is displayed. Without subsetting or constraining any variables, select and copy the URL in the "Data URL" field at the top of the form to the local computer's clipboard (see Figure 1).
  • The ncwa NCO command can perform many statistical reductions (such as average, minimum, and maximum) over the dimensions of a NetCDF dataset. The NCO command in Figure 11, when executed at a command prompt on the local computer, will remotely read the example dataset, compute the maximum SWE value over the time dimension at each grid cell, and create a new local NetCDF file, max.nc. Use the clipboard to paste the remote Data URL for the ncwa input argument. The max.nc file created by ncwa may be visualized with Panoply (see Figure 12). The largest SWE value in max.nc is 1.1048695 meters. The time value recorded in max.nc, February 14, 2009 at 10:30 UTC, should be read with care: NCO averages coordinate variables regardless of the -y operation type, so this value is most likely an average of the time axis rather than the moment at which the maximum occurred. The ncwa command also prepended the history global attribute in max.nc with a record of the operation.

$ ncwa -O -y max -a time \
    https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc max.nc
$

Figure 11: NCO operation to compute maximum SWE


Figure 12: Results of NCO operation to compute maximum SWE
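
To make the reduction concrete, the sketch below mimics on a tiny synthetic array what a maximum over the time dimension produces, assuming ncwa behaves as the NCO documentation describes for -y max -a time. It is plain Python and does not call NCO; the function name max_over_time and the sample values are hypothetical.

```python
def max_over_time(cube):
    """Reduce a [time][y][x] nested list to a [y][x] grid of per-cell maxima."""
    n_y, n_x = len(cube[0]), len(cube[0][0])
    return [[max(step[j][i] for step in cube) for i in range(n_x)]
            for j in range(n_y)]


# Tiny synthetic stand-in for SWE(time, y, x): 3 time steps on a 2 x 2 grid.
swe = [
    [[0.0, 0.1], [0.2, 0.0]],
    [[0.5, 0.0], [0.1, 0.3]],
    [[0.2, 0.4], [0.0, 0.1]],
]

# Each output cell holds the largest value that cell reached at any time step,
# so neighboring cells may take their maxima from different time steps.
print(max_over_time(swe))  # [[0.5, 0.4], [0.2, 0.3]]
```

Because each cell is reduced independently, the output grid is generally not a snapshot of any single time step.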

  • The ncks NCO command (NetCDF "kitchen sink") can perform many utility functions on a NetCDF dataset, including subset functions. The NCO commands in Figure 13, when executed at a command prompt on the local computer, will create new local NetCDF files, each containing only one time slice. Use the clipboard to paste the remote Data URL for the ncks input argument.

$ ncks -d time,1456 \
    https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc april_1.nc
$ ncks -d time,1575 \
    https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc april_15.nc
$

Figure 13: NCO operations to subset single time slices
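
When many time slices are needed, the ncks command lines can be generated rather than typed. The sketch below only builds and prints the commands; it uses the -d time,index form shown in Figure 13, and the helper name ncks_time_slice is hypothetical. Passing the argument list to subprocess.run would execute it, assuming NCO is installed and on the PATH.

```python
import shlex


def ncks_time_slice(dataset_url, time_index, output_path):
    """Build the argument list for `ncks -d time,<index> <input> <output>`."""
    return ["ncks", "-d", f"time,{time_index}", dataset_url, output_path]


BASE = ("https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/"
        "f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc")

# Indices and output names are the ones used in Figure 13.
for index, out in [(1456, "april_1.nc"), (1575, "april_15.nc")]:
    print(shlex.join(ncks_time_slice(BASE, index, out)))
```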
 

  • Some NCO commands operate on more than one NetCDF dataset as input to produce a single NetCDF dataset as output. The ncbo NCO command computes binary operations (addition, subtraction, multiplication, division) between the dependent variables of two NetCDF datasets to produce a new NetCDF dataset. The NCO command in Figure 14, when executed at a command prompt on the local computer, will create a new local NetCDF file, diff.nc, which expresses the difference in the SWE variable between the two time slices produced in the previous step of this example (april_1.nc minus april_15.nc). The diff.nc file created by ncbo may be visualized with Panoply (see Figure 15).

$ ncbo -O -y diff april_1.nc april_15.nc diff.nc

Figure 14: NCO operation to subtract two time slices


Figure 15: Results of NCO operation to compute delta SWE

  • Share the analysis code and NetCDF result files as a new resource in HydroShare so that others can validate the work and to support research reproducibility.

Background

THREDDS Data Server (TDS)

The Thematic Real-time Environmental Distributed Data Services (THREDDS) Data Server (TDS) is open-source software distributed by the Unidata community program of the University Corporation for Atmospheric Research (UCAR). TDS provides web services which publish remote access to data and metadata stored in a variety of well-known geo-temporal dataset formats used for environmental research such as GRIB (Gridded Binary), HDF5 (Hierarchical Data Format version 5), and most commonly NetCDF (Network Common Data Form) as viewed through a Common Data Model (CDM). TDS presents gridded, point, and time series datasets organized into thematic catalogs. TDS provides access to data through suites of web services such as OPeNDAP and OGC.

OPeNDAP

The Open-source Project for Network Data Access Protocols (OPeNDAP) specifies two suites of web service requests and responses, DAP2 and DAP4, for remotely accessing CDM datasets. Remote access to TDS-hosted datasets through DAP2 client software offers the advantage of placing dimensional constraints on the CDM variable arrays transported in a response. The DAP2 request contains the constraints and effectively creates a subset of the TDS-hosted dataset for transport. The requesting client therefore need not concern itself with the size of the TDS-hosted dataset, but only with the size of the data response.
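
The effect of a constraint on response size can be estimated from the index extents alone. The sketch below is a rough illustration using the SWE extents from the example Data URL in use case 1; the 4 bytes per value is an assumption (32-bit floats) that should be checked against the dataset, and protocol overhead is ignored. The function names are hypothetical.

```python
from math import prod


def hyperslab_count(start, stride, last):
    """Number of indices selected by an inclusive DAP2 [start:stride:last]."""
    return (last - start) // stride + 1


def response_bytes(constraints, bytes_per_value):
    """Approximate payload for one variable: element count times value size."""
    return prod(hyperslab_count(*c) for c in constraints) * bytes_per_value


# SWE(time, y, x) extents from the example Data URL.
# ASSUMPTION: 4 bytes per value (32-bit float); verify against the dataset.
full = response_bytes([(0, 1, 2183), (0, 1, 58), (0, 1, 38)], 4)
one_step = response_bytes([(0, 1, 0), (0, 1, 58), (0, 1, 38)], 4)

print(f"all time steps:  {full:,} bytes")
print(f"first time step: {one_step:,} bytes")
print(f"reduction factor: {full // one_step}")
```

Under these assumptions, constraining SWE to a single time step shrinks the transported array by a factor equal to the number of time steps.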

DAP2

As an additional advantage, DAP2 clients, such as Unidata's NetCDF libraries or higher-level software utilizing those libraries such as xarray, initially request only the metadata contained in the CDM header via a DAP2 Dataset Descriptor Structure (DDS) request. They do not request data until the variable array data is instantiated in the client, which triggers a DAP2 Distributed Oceanographic Data Systems (DODS) request. This "lazy-loading" behavior allows a client to open the entire dataset remotely while transporting only the portions needed programmatically, thereby reducing network load, transmission time, and memory consumption.
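
The separate metadata and data requests correspond to suffixes appended to the dataset URL. The sketch below lists the request URLs a DAP2 client may issue for one dataset: .dds for structure, .das for attributes, and .dods for binary data. It is a standard-library illustration only; the function name dap2_request_urls is hypothetical, and real clients construct these requests internally.

```python
def dap2_request_urls(dataset_url, constraint=""):
    """Return the DAP2 request URLs a client may issue for one dataset.

    .dds  -> dataset structure (variables, types, dimension sizes)
    .das  -> attributes (units, long names, global metadata)
    .dods -> binary data, normally fetched only for the requested subset
    """
    query = f"?{constraint}" if constraint else ""
    return {suffix: f"{dataset_url}.{suffix}{query}"
            for suffix in ("dds", "das", "dods")}


BASE = ("https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/"
        "f3f947be65ca4b258e88b600141b85f3/data/contents/SWE_time.nc")

# Metadata requests describe the whole dataset; the data request carries the
# constraint so that only the first SWE time step would be transported.
metadata = dap2_request_urls(BASE)
data = dap2_request_urls(BASE, "SWE[0:1:0][0:1:58][0:1:38]")

print(metadata["dds"])
print(metadata["das"])
print(data["dods"])
```

Opening the .dds or .das URL in a browser is a quick way to inspect a dataset's structure without transferring any array data.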

CDM, NetCDF, and NcML

Because dimensional constraints in DAP2 requests and DAP2 client lazy-loading limit what is transported, the size of TDS-hosted datasets is influenced more by the tools which assemble and manage Network Common Data Form (NetCDF) datasets than by transport concerns. When only subsets of the datasets require transport, practicalities concerning dataset construction and server-side dataset management become the principles determining how best to partition large collections of data. Often the tools and computer stations which remotely access CDM datasets through TDS are the same as, or similar to, the tools and stations that construct and manage the datasets prior to hosting. When collection-wide views of data are desirable, TDS supports NetCDF Markup Language (NcML) facilities to aggregate many datasets into one virtual dataset.
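
As an illustration of aggregation, the sketch below generates a small NcML document that joins several files along an existing time dimension using a joinExisting aggregation. The file names are hypothetical, and the function join_existing_ncml is an illustrative helper, not part of TDS; how the resulting NcML is registered with a server depends on that server's catalog configuration.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def join_existing_ncml(locations, dim_name="time"):
    """Return NcML text that concatenates files along an existing dimension."""
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        {"dimName": dim_name, "type": "joinExisting"})
    for location in locations:
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", {"location": location})
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# HYPOTHETICAL file names; replace with paths known to the server.
print(join_existing_ncml(["SWE_2008.nc", "SWE_2009.nc"]))
```

Clients see the aggregation as a single dataset whose time dimension spans all of the listed files.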

NetCDF and CF Conventions

CDM is a general model for dataset, dimensional, and variable attribute instantiation. Many TDS services depend on the recognition of dataset feature types (e.g., point, trajectory, station, profile, radial, grid, swath) for optimal operation. NetCDF metadata configured in compliance with the Climate and Forecast (CF) conventions also enables both TDS and many DAP2 clients to intelligently recognize feature types beyond what the general model would otherwise convey, particularly for geo-referencing and projection.
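
For geo-referencing, CF links a data variable to its projection through a grid_mapping attribute that names a second variable, whose grid_mapping_name attribute identifies the projection. The sketch below checks that link on plain dictionaries of attributes. The attribute names follow the CF conventions for a transverse Mercator projection, but the numeric values are placeholders and were not read from the example dataset; the function name find_grid_mapping is hypothetical.

```python
def find_grid_mapping(variables, data_var):
    """Return the attributes of the grid-mapping variable used by `data_var`.

    `variables` maps variable names to attribute dictionaries. Returns None
    when the data variable declares no grid mapping.
    """
    target = variables[data_var].get("grid_mapping")
    if target is None:
        return None
    attrs = variables.get(target)
    if attrs is None or "grid_mapping_name" not in attrs:
        raise ValueError(
            f"{data_var!r} references an invalid grid mapping {target!r}")
    return attrs


# PLACEHOLDER attribute values for illustration only.
variables = {
    "SWE": {"units": "m", "grid_mapping": "transverse_mercator"},
    "transverse_mercator": {
        "grid_mapping_name": "transverse_mercator",
        "longitude_of_central_meridian": -111.0,
        "latitude_of_projection_origin": 0.0,
        "scale_factor_at_central_meridian": 0.9996,
        "false_easting": 500000.0,
        "false_northing": 0.0,
    },
    "time": {"units": "hours since 2000-01-01 00:00:00"},
}

print(find_grid_mapping(variables, "SWE")["grid_mapping_name"])
```

This is why a dimensionless variable such as transverse_mercator should be included in a subset even though it carries no data of its own.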

References

TDS: https://docs.unidata.ucar.edu/tds/5.0/userguide/index.html

Common Data Model (CDM): https://www.unidata.ucar.edu/software/netcdf-java/v4.6/CDM/index.html

OPeNDAP requests and responses: https://www.unidata.ucar.edu/software/tds/current/tutorial/DAP.html 

DAP2 Specification: https://earthdata.nasa.gov/esdis/eso/standards-and-references/data-access-protocol-2 

NetCDF format specifications: https://www.unidata.ucar.edu/software/netcdf/documentation/historic/netcdf/File-Format.html 

Climate and Forecast (CF) Metadata Conventions: https://cfconventions.org/

Domenico, B., Caron, J., Davis, E., Kambic, R., & Nativi, S. (2002). Thematic Real-time Environmental Distributed Data Services (THREDDS): Incorporating interactive analysis tools into NSDL. Journal of Digital Information, 2(4).

Nativi, S., Caron, J., Davis, E., & Domenico, B. (2005). Design and implementation of netCDF markup language (NcML) and its GML-based extension (NcML-GML). Computers & Geosciences, 31, 1104-1118. doi:10.1016/j.cageo.2004.12.006

Credits, Authors, Contributors and Contacts

This app was created through a collaboration between USU and RENCI for the HydroShare team.