Common Data Format (CDF): New XML and Conversion Tools

By David Han

The variety of available data formats (e.g., CDF, netCDF, HDF) has long been a problem for scientists, because the data they are interested in must be translated into a format they understand before they can analyze it, and it will continue to be a problem for years to come. To make data format differences transparent to end users, the CDF office has adopted eXtensible Markup Language (XML) technology and has been developing custom ad hoc translators to facilitate and promote data interoperability with other data formats.

Once data are described in XML form, it is easy to convert one format to another using a companion XML technology, the eXtensible Stylesheet Language (XSL). Because almost all major data formats used within the space science domain now support XML, and because XSL makes it easy to transform one format into another, CDF Markup Language (CDFML), a language based on XML, was developed as a mechanism for establishing data interoperability with other data formats. Two tools (CDF2CDFML and CDFML2CDF) have been developed in Java to export the contents of a native CDF file into XML form (a CDFML file) and to create a CDF file from a CDFML file. As a proof of concept, a FITS binary file was saved as an XML file, and a CDF file was created from this XML file without losing any information (through FITSML-to-CDFML conversion via XSL, followed by the CDFML2CDF tool). The CDF2CDFML and CDFML2CDF tools are available from the CDF home page (http://nssdc.gsfc.nasa.gov/cdf).
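
To illustrate the kind of XSL-driven conversion described above, here is a minimal sketch that uses the standard Java XSLT API (javax.xml.transform). The input document, the stylesheet, and every element name in them (fits, keyword, cdfml, globalAttribute) are invented for this illustration and do not reflect the real FITSML or CDFML schemas; the class and method names are likewise hypothetical and are not part of the CDF tools.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslConversionSketch {

    // Made-up, heavily simplified stand-in for a FITSML document.
    static final String SOURCE_XML =
            "<fits><keyword name=\"TELESCOP\" value=\"EXAMPLE\"/></fits>";

    // Made-up stylesheet: turns each <keyword> into a <globalAttribute>.
    // The target element names are assumptions, not the real CDFML schema.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/fits\">"
          + "<cdfml><xsl:apply-templates select=\"keyword\"/></cdfml>"
          + "</xsl:template>"
          + "<xsl:template match=\"keyword\">"
          + "<globalAttribute name=\"{@name}\">"
          + "<xsl:value-of select=\"@value\"/>"
          + "</globalAttribute>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies an XSL stylesheet to an XML document and returns the result.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Rethrow unchecked so callers need not declare the exception.
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SOURCE_XML, STYLESHEET));
    }
}
```

In a real pipeline the stylesheet would presumably carry all of the format-specific knowledge, so the same few lines of Java could serve any XML-described source format simply by swapping in a different XSL file.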

Besides XML, the CDF office is also developing custom ad hoc translators. To date, HDF5-to-CDF and FITS-to-CDF translators have been developed to facilitate data exchange. An ACE SWICS Level 2 product (stored in HDF) from Caltech has been converted to a CDF file, and the contents of the CDF file are being validated by the ACE acquisition scientist located at the National Space Science Data Center (NSSDC). The FITS-to-CDF translator has been tested against about 35 different data sets from the HEASARC archive and is currently undergoing extensive testing. Upon completion of this testing, a netCDF-to-CDF translator will be developed, followed by a CDF-to-FITS translator. These tools are, or will be, available from the CDF home page.

To make these translators more concrete, consider how the FITS-to-CDF translator handles metadata. In FITS, metadata are described by a keyword (field) and its value. FITS has a set of predefined mandatory and optional keywords that are recommended for use, and each of these keywords has a fixed meaning. If a keyword in the FITS file being translated is a known keyword (either mandatory or optional) and is recognized by the FITS-to-CDF translator, the translator converts it into the appropriate CDF term and stores it as a global attribute. If a keyword is not a known keyword (i.e., it is a user-defined FITS keyword), the translator consults an external mapping file to see whether a corresponding CDF attribute name exists for it. If the mapping file contains an entry (a CDF attribute name) for the user-defined FITS keyword, a CDF global attribute is created using the name found in the mapping file. Otherwise, a CDF global attribute is created using the FITS keyword name itself. In every case the keyword's value becomes the value of the attribute.

These translators can be put to use in many different ways. For example, they can be incorporated into a data management system (either centralized or distributed) to translate data on the fly and deliver it in the format the end user is familiar with, or they can be used as stand-alone translators on the user's local desktop or system. Data providers are sometimes asked to submit their products in one of the data formats their designated archives support, and this translation can be a heavy, resource-intensive burden. That burden can be avoided if the data centers themselves use these translators to convert submitted products into one of the formats they support.
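
The keyword-handling rules described earlier for the FITS-to-CDF translator can be sketched in a few lines of Java. This is only an illustration of the lookup order (recognized keyword, then mapping file, then the keyword itself), not the translator's actual code. The class and method names, the contents of the built-in table, the CDF attribute names, and the sample header values are all assumptions made for this example, and the external mapping file is modeled as an in-memory map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FitsKeywordMappingSketch {

    // Illustrative built-in table: a few standard FITS keywords the
    // translator "recognizes", paired with made-up CDF attribute names.
    static final Map<String, String> KNOWN_KEYWORDS = Map.of(
            "TELESCOP", "Telescope",
            "INSTRUME", "Instrument",
            "OBSERVER", "Observer");

    // Picks the CDF global attribute name for one FITS keyword.
    static String resolveAttributeName(String fitsKeyword,
                                       Map<String, String> mappingFile) {
        // Rule 1: a keyword the translator recognizes gets its CDF term.
        String known = KNOWN_KEYWORDS.get(fitsKeyword);
        if (known != null) {
            return known;
        }
        // Rule 2: a user-defined keyword is looked up in the mapping file.
        String mapped = mappingFile.get(fitsKeyword);
        if (mapped != null) {
            return mapped;
        }
        // Rule 3: otherwise the FITS keyword name is reused unchanged.
        return fitsKeyword;
    }

    // Converts a whole header (keyword -> value) into global attributes,
    // preserving the order in which the keywords appear.
    static Map<String, String> toGlobalAttributes(
            Map<String, String> fitsHeader, Map<String, String> mappingFile) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> card : fitsHeader.entrySet()) {
            String name = resolveAttributeName(card.getKey(), mappingFile);
            attributes.put(name, card.getValue());
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Invented sample header: one standard and two user-defined keywords.
        Map<String, String> header = new LinkedHashMap<>();
        header.put("TELESCOP", "EXAMPLE-SCOPE");
        header.put("MYCALIB", "v2");
        header.put("LABNOTE", "test run");

        // Stand-in for the external mapping file.
        Map<String, String> mappingFile =
                Map.of("MYCALIB", "Calibration_version");

        System.out.println(toGlobalAttributes(header, mappingFile));
    }
}
```

Keeping the user-defined mappings in an external file, as the article describes, means a new data set with unfamiliar keywords can be handled by editing that file rather than changing the translator.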