
FEP - Format Developer - CDF

David Han
CDF Support Office
 

1. History and Philosophy

1.1 Identification

CDF version 2.6.
The underlying model and its architecture are described in the CDF User's Guide, which is available on the CDF home page (http://nssdc.gsfc.nasa.gov/cdf).

1.2 Purpose

The origins of CDF date back to the development of the NASA Climate Data System at the National Space Science Data Center (NSSDC) located at Goddard Space Flight Center (GSFC). It has had the following main requirements driving its development:

  • Facilitate organization and ingestion of data sets into CDF
  • Utilize standard common terminology (metadata) to describe data sets
  • Provide utilities for creating, browsing, and manipulating CDFs

The programming layer (the CDF library) provides the essential framework on which graphical and data analysis packages can be built. The CDF library allows developers of CDF-based systems to easily create applications that let users slice data across multidimensional subspaces, access entire structures of data, subsample data, and access any one data element independently of its relationship to any other data element. CDF data sets are portable across all platforms supported by CDF. CDF is currently available on VAX (OpenVMS and POSIX shell), Sun (SunOS & Solaris), DECstation (ULTRIX), DEC Alpha (OSF/1 & OpenVMS), SGI (IRIX), IBM RS6000 series (AIX), HP 9000 series (HP-UX), NeXT (Mach), PC (DOS, Windows 3.x/95/98/NT, Linux, and QNX), and Macintosh (MacOS).

1.3 User Community and Sponsoring Organization

CDF has been adopted as the standard format by the International Solar-Terrestrial Physics (ISTP) program, which uses simultaneous and closely coordinated measurements from the GEOTAIL, WIND, POLAR, SOHO, and Cluster missions. Data from other satellites (e.g. NASA's IMP-8, LANL, and GOES) as well as ground-based observations supplement the data from these missions.

NSSDC provides the necessary funding for the maintenance of CDF and new development.

1.4 Format Evolution

CDF is now in sustaining engineering mode: it is robust and has not received any new requirements from the user community. As a result, it no longer has the formal user working group or committee that the CDF office had while developing CDF 2.6 a few years ago. Requests for enhancements and new features are received through the CDF User Support office. Because resources are limited, the wish list is prioritized and implemented as resources allow.

In September 1998, the CDF office began converting the CDF command-line-based tools to Java GUI-based tools. The tools were developed incrementally, and each release was announced to the user community by email for testing. Test instructions were then sent to the users who volunteered to participate, and their feedback was returned by email. The feedback was analyzed, clarified where needed, and incorporated into the next version. The CDF office is now developing CDF Java APIs equivalent to the existing C and Fortran APIs so that users can develop platform-independent CDF applications. Once the code is ready for testing some time this summer, the same strategy will be employed.

2. Conceptual Model

An important feature of CDF is that it can handle data sets that are inherently multidimensional in addition to data sets that are scalar. To do this, CDF groups data by "variables" whose values are conceptually organized into arrays. The dimensionality of these variable arrays depends upon the data and is specified by the user when the CDF or a variable is created. For scalar data, for example, the array of values would be 0-dimensional (a single value), whereas for image data the array would be 2-dimensional. Similarly, the array for volume data would be 3-dimensional. CDF allows users to specify up to ten dimensions. The following is a list of what CDF can and can't do.

CDF supports the following:

  • multidimensional data of up to 10 dimensions
  • variable and file attributes; attributes and variables can be added to or deleted from an existing CDF file
  • sparse records
  • compression (RLE, GZIP, Huffman, and Adaptive Huffman) at both the file and variable levels
  • encoding (native and platform-independent Network encoding)
  • decoding (native and platform-independent Network decoding)
  • single file and multi-file
  • caching
  • random and sequential access
  • user-configurable pad value for undefined data
  • data storage in either row-major or column-major order
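
The difference between row-major and column-major storage can be illustrated with a little arithmetic. The following Python sketch is not part of the CDF library; the function name linear_offset and its parameters are hypothetical and only show how the position of one value within a flattened array depends on the chosen majority.

```python
def linear_offset(indices, dim_sizes, majority="row"):
    """Return the position of one element within a flattened array.

    majority="row":    the last index varies fastest (C convention).
    majority="column": the first index varies fastest (Fortran convention).
    Hypothetical helper for illustration only, not a CDF library call.
    """
    if len(indices) != len(dim_sizes):
        raise ValueError("indices and dim_sizes must have the same length")
    for index, size in zip(indices, dim_sizes):
        if not 0 <= index < size:
            raise IndexError("index out of range for its dimension")
    if majority not in ("row", "column"):
        raise ValueError("majority must be 'row' or 'column'")

    pairs = list(zip(indices, dim_sizes))
    if majority == "column":
        pairs.reverse()  # walk the dimensions from last to first

    offset = 0
    for index, size in pairs:
        offset = offset * size + index
    return offset


# Element (1, 0) of a 2 x 3 array lands at different positions.
print(linear_offset((1, 0), (2, 3), "row"))     # prints 3
print(linear_offset((1, 0), (2, 3), "column"))  # prints 1
```

A 0-dimensional (scalar) variable has no indices, so its single value is always at offset 0 under either majority.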

CDF doesn't support the following:

  • images as such (although images can be generated using the data stored in a CDF file)
  • a directory structure (like the one supported by HDF)
  • sparse arrays
  • tags for different objects

See the CDF User's Guide for more details.

3. Format Details

Relation to Hardware and Media Portability

The encoding of a CDF determines how attribute entry data and variable data values are stored on disk in the CDF file(s). An application program never has to concern itself with the encoding of the CDF being accessed. The CDF library performs all of the encoding and decoding of data values for the application. The encoding specified when creating or modifying a CDF may be any of the native encodings for the computers supported by CDF, or the platform-independent network (XDR) encoding. A CDF with any supported encoding is readable on any computer supported by CDF.
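
To make the idea of an encoding concrete, here is a minimal Python sketch of the same 4-byte unsigned integer laid out in two ways. It assumes that network (XDR) encoding stores integers most-significant byte first, and it uses a little-endian layout as a stand-in for one possible native encoding. The names encode_uint4 and decode_uint4 are hypothetical; in practice the CDF library does this work internally and the application never sees it.

```python
import struct

# struct format codes for a 4-byte unsigned integer in each byte order.
# "network" follows XDR (big-endian); "little" stands in for a native
# little-endian host. Both labels are illustrative, not CDF constants.
UINT4_FORMATS = {"network": ">I", "little": "<I"}


def encode_uint4(value, encoding):
    """Pack a 4-byte unsigned integer using the given byte order."""
    return struct.pack(UINT4_FORMATS[encoding], value)


def decode_uint4(raw, encoding):
    """Unpack a 4-byte unsigned integer written with the given byte order."""
    return struct.unpack(UINT4_FORMATS[encoding], raw)[0]


value = 0x12345678
print(encode_uint4(value, "network").hex())  # prints 12345678
print(encode_uint4(value, "little").hex())   # prints 78563412

# Decoding with the matching encoding recovers the value on any host.
assert decode_uint4(encode_uint4(value, "network"), "network") == value
```

Reading bytes with the wrong encoding yields a different number, which is why the encoding is recorded with the CDF and handled by the library rather than by each application.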

Primitive data types supported. Include internal representation of numbers

  • CDF_CHAR (1-byte, character)
  • CDF_UCHAR (1-byte, unsigned character)
  • CDF_BYTE (1-byte, signed integer)
  • CDF_INT1 (1-byte, signed integer)
  • CDF_UINT1 (1-byte, unsigned integer)
  • CDF_INT2 (2-byte, signed integer)
  • CDF_UINT2 (2-byte, unsigned integer)
  • CDF_INT4 (4-byte, signed integer)
  • CDF_UINT4 (4-byte, unsigned integer)
  • CDF_REAL4 (4-byte, single-precision floating-point)
  • CDF_FLOAT (4-byte, single-precision floating-point)
  • CDF_REAL8 (8-byte, double-precision floating-point)
  • CDF_DOUBLE (8-byte, double-precision floating-point)
  • CDF_EPOCH (8-byte, double-precision floating-point)

Magic numbers

CDF Version 2.6 uses two magic numbers. The first is 0xCDF26002, stored at file offset 0x00000000 as a 4-byte unsigned integer with big-endian byte ordering. The second, at file offset 0x00000004, is another 4-byte unsigned integer: 0x0000FFFF for a regular CDF file or 0xCCCC0001 for a compressed CDF file.
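
A check of these two magic numbers could be sketched in Python as follows. The function and constant names are hypothetical, and the sketch assumes that the second magic number is stored big-endian like the first, which the description above states only for the first one.

```python
import struct

CDF26_MAGIC = 0xCDF26002         # first magic number, file offset 0
UNCOMPRESSED_MAGIC = 0x0000FFFF  # second magic number, regular CDF file
COMPRESSED_MAGIC = 0xCCCC0001    # second magic number, compressed CDF file


def classify_cdf_header(header):
    """Classify a file from its first 8 bytes.

    Assumption: both magic numbers are big-endian 4-byte unsigned integers.
    """
    if len(header) < 8:
        return "not CDF 2.6"
    first, second = struct.unpack(">II", header[:8])
    if first != CDF26_MAGIC:
        return "not CDF 2.6"
    if second == UNCOMPRESSED_MAGIC:
        return "uncompressed CDF 2.6"
    if second == COMPRESSED_MAGIC:
        return "compressed CDF 2.6"
    return "CDF 2.6 with unrecognized second magic number"


def classify_cdf_file(path):
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return classify_cdf_header(handle.read(8))


# In-memory example, so no real CDF file is needed to try it.
print(classify_cdf_header(bytes.fromhex("CDF26002CCCC0001")))
# prints compressed CDF 2.6
```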

4. Uses

CDF is well suited to the archival, storage, and distribution of any level of science data. Random access support in particular allows time series data to be efficiently stored, retrieved, and analyzed. CDF also has a feature that allows data subsampling. As for updateability and extendability, one can easily add, delete, or modify attributes and data values (variables); it is just a matter of issuing CDF library calls on an existing CDF file.
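
As a rough illustration of what subsampling a time series means, the Python sketch below picks every n-th record from an in-memory list. It does not use the CDF library; the function name subsample and its start, count, and interval parameters are hypothetical stand-ins for the kind of selection a CDF application would request.

```python
def subsample(records, start=0, count=None, interval=1):
    """Return every `interval`-th record beginning at index `start`.

    At most `count` records are returned; count=None means no limit.
    Hypothetical helper for illustration only, not a CDF library call.
    """
    if start < 0:
        raise ValueError("start must not be negative")
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if count is not None and count < 0:
        raise ValueError("count must not be negative")
    selected = records[start::interval]
    return selected if count is None else selected[:count]


# Ten records numbered 0..9; take three of them, four records apart,
# beginning with record 1.
print(subsample(list(range(10)), start=1, count=3, interval=4))
# prints [1, 5, 9]
```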

5. Format Developer Software

A complete list of the APIs and their descriptions is documented in the CDF C Reference Manual and the CDF Fortran Reference Manual, which are available on the CDF home page (http://nssdc.gsfc.nasa.gov/cdf).

As part of the standard CDF distribution package, the following utilities are distributed along with the CDF library.

CDF Tool Name    Description
CDFedit          Allows display and/or modification of the contents of a CDF via a full-screen interface.
CDFexport        Allows the contents of a CDF to be exported to the terminal screen, a text file, or another CDF.
CDFconvert       Converts various properties of a CDF.
CDFcompare       Displays the differences between two CDFs.
CDFstats         Produces a statistical report on a CDF's variable data.
SkeletonTable    Creates an ASCII text file, called a skeleton table, that describes a CDF.
SkeletonCDF      Creates a fully structured CDF by reading a skeleton table.
CDFinquire       Displays the version of CDF being used, most configurable parameters, and the system default values.
CDFdir           Displays a directory listing of a CDF's file(s).

6. Software Standard Features

  • Metadata
  • The length of a magic number should be the same, and it should be located at the beginning of a file or at a certain offset in a file.
  • Integer and floating point numbers should be stored in a consistent manner. Also the length of each primitive data type should be the same.

7. Non-developer Software

N/A

8. User Support

CDF has the following mechanisms to receive users' questions, comments, and suggestions:

On average, the CDF help desk receives about 4 or 5 messages per week. The sources of user inquiries vary widely and include government agencies, universities, and private and commercial organizations as well as independent researchers.

9. Work In Progress

As stated above, the CDF office is now in the process of developing CDF Java APIs that are equivalent to the existing C and Fortran APIs to allow users to develop platform-independent CDF applications.

10. Evolution Plans

Participate in the Formats Evolution Process Committee (FEPC) and implement the new requirements coming out of this process.

As soon as development of the CDF Java APIs is completed during the summer of 1999, sparse arrays, which several users have requested for some time, will be implemented. Once this is done, the CDF Java APIs will be enhanced so that end users can develop CDF applications as Java applets. This eliminates the need to install a CDF library on a user's machine and allows users to run CDF applications in a Web browser.

11. Documentation and Related References

CDF home page (http://nssdc.gsfc.nasa.gov/cdf)

12. Other Comments

[Editor's Note: This question was added after the form was originally completed.]


 


URL: http://ssdoo.gsfc.nasa.gov/nost/fep/developer-cdf.html


Author: David Han / CDF Support Office (davidh@xfiles.gsfc.nasa.gov) +1.301.286.3617
Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
NASA Official: Code 633.2 / Don Sawyer (Don.Sawyer@gsfc.nasa.gov) +1.301.286.2748
Last Revised: 1999-03-25, David Han (1999-08-04, John Garrett)