Curator: Nate James

Responsible Official:
Dr. Joseph H. King, Code 633

Last Revised: Friday, 20-Dec-2002 [NLJ]


Common Data Format (CDF): New XML and Conversion Tools

By David Han

The variety of available data formats (e.g., CDF, netCDF, HDF) has long been a problem for scientists, because data of interest must be translated into a format they understand before it can be analyzed, and it will remain a problem for years to come. To make data format differences transparent to end users, the CDF office has adopted eXtensible Markup Language (XML) technology and has been developing custom ad hoc translators to facilitate and promote data interoperability with other data formats.

Once data are described in an XML form, converting from one format to another is straightforward using the XML companion technology called the eXtensible Stylesheet Language (XSL). Because almost all major data formats used within the space science domain now support XML, and because XSL makes it easy to transform one format into another, CDF Markup Language (CDFML), a language based on XML, was developed as a mechanism for establishing data interoperability with other data formats. Two tools, CDF2CDFML and CDFML2CDF, have been developed in Java: the first exports the contents of a native CDF file into an XML form (a CDFML file), and the second creates a CDF file from a CDFML file. As a proof of concept, a FITS binary file was saved as an XML file, and a CDF file was created from this XML file without losing any information (through a FITSML-to-CDFML conversion via XSL, followed by the CDFML2CDF tool). The CDF2CDFML and CDFML2CDF tools are available from the CDF home page (http://nssdc.gsfc.nasa.gov/cdf).
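
As a rough illustration of the kind of XML-to-XML mapping that an XSL stylesheet performs in the FITSML-to-CDFML step, the Python sketch below rebuilds a tiny document under new element names. It is not the CDF office's stylesheet and does not use XSL itself (plain tree manipulation stands in for it here). The element names (`fitsml`, `keyword`, `cdfml`, `globalAttribute`) and the sample values are hypothetical, chosen only for illustration; they are not the real FITSML or CDFML schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical FITSML-like input; the real FITSML schema may differ.
FITSML_SAMPLE = """<fitsml>
  <keyword name="TELESCOP">ROSAT</keyword>
  <keyword name="OBJECT">Crab</keyword>
</fitsml>"""


def fitsml_to_cdfml(fitsml_text):
    """Map each <keyword> element to a <globalAttribute> element.

    This mimics, in plain Python, the one-to-one element mapping that
    an XSL template could express. All element names are assumptions.
    """
    source = ET.fromstring(fitsml_text)
    target = ET.Element("cdfml")
    for keyword in source.findall("keyword"):
        # Carry the keyword name and value across unchanged,
        # so no information is lost in the conversion.
        attribute = ET.SubElement(
            target, "globalAttribute", name=keyword.get("name")
        )
        attribute.text = keyword.text
    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    print(fitsml_to_cdfml(FITSML_SAMPLE))
```

In this sketch every name and value in the input reappears in the output, which is the property the proof of concept above relied on: the intermediate XML form must carry everything needed to rebuild the file.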

Besides XML, the CDF office is also developing custom ad hoc translators. To date, HDF5-to-CDF and FITS-to-CDF translators have been developed to facilitate data exchange. An ACE SWICS Level 2 product (stored in HDF) from Caltech has been converted to a CDF file, and the contents of the CDF file are being validated by the ACE acquisition scientist at the National Space Science Data Center (NSSDC). The FITS-to-CDF translator has been tested against about 35 different data sets from the HEASARC archive and is currently undergoing extensive testing. Once this testing is complete, a netCDF-to-CDF translator will be developed, followed by a CDF-to-FITS translator. These tools are, or will be, available from the CDF home page.

To make these translators more concrete, consider how the FITS-to-CDF translator handles metadata. In FITS, metadata are described by a keyword (field) and its value. FITS has a set of predefined mandatory and optional keywords that are recommended for use, and each of these keywords has a fixed meaning. If a keyword in the FITS file to be translated is a known keyword (either mandatory or optional) and is recognized by the FITS-to-CDF translator, the translator converts it into the appropriate CDF term and stores it as a global attribute. If the keyword is not a known keyword (i.e., it is a user-defined FITS keyword), the translator consults an external mapping file to see whether there is a corresponding CDF attribute name. If the mapping file contains an entry (a CDF attribute name) for the user-defined FITS keyword, a CDF global attribute is created using that name. Otherwise, a CDF global attribute is created using the FITS keyword name itself, together with its value.
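
The three-way lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the translator's actual code: the function names, the contents of `KNOWN_KEYWORDS`, the CDF attribute names, and the representation of the mapping file as a dictionary are all hypothetical.

```python
# Hypothetical table of standard FITS keywords the translator recognizes,
# mapped to illustrative (assumed) CDF global attribute names.
KNOWN_KEYWORDS = {
    "TELESCOP": "Source_name",
    "INSTRUME": "Instrument_name",
}


def cdf_attribute_name(keyword, known_keywords, mapping_entries):
    """Choose the CDF global attribute name for one FITS keyword."""
    # 1. Known (mandatory or optional) keyword recognized by the translator.
    if keyword in known_keywords:
        return known_keywords[keyword]
    # 2. User-defined keyword with an entry in the external mapping file
    #    (represented here as an already-loaded dictionary).
    if keyword in mapping_entries:
        return mapping_entries[keyword]
    # 3. Fallback: reuse the FITS keyword name unchanged.
    return keyword


def translate_header(header, known_keywords, mapping_entries):
    """Convert FITS keyword/value pairs into CDF global attributes."""
    return {
        cdf_attribute_name(key, known_keywords, mapping_entries): value
        for key, value in header.items()
    }


if __name__ == "__main__":
    fits_header = {"TELESCOP": "ROSAT", "MYKEY": 1, "OTHERKEY": "x"}
    user_mapping = {"MYKEY": "My_attribute"}
    print(translate_header(fits_header, KNOWN_KEYWORDS, user_mapping))
```

Keeping the user-defined mappings outside the code, as the external mapping file does, means a data provider can add or rename attributes without changing the translator itself.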

These translators can be put to use in several ways. For example, they can be incorporated into a data management system (either centralized or distributed) to translate data on the fly and deliver the data in the format the end user is familiar with, or they can be used as stand-alone translators on the user's local desktop or system. Data providers are sometimes asked to submit their products in one of the data formats their designated archives support, and this translation can be a heavy, resource-intensive burden. That burden can be avoided if the data centers themselves use these translators to convert submitted products into one of the formats they support.

 
