
FEP - Format Developer - HDF (a.k.a. HDF4)

Mike Folk
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
 
Comment on this template in the HyperNews Discussion.  

1. History and Philosophy

HDF was first developed in 1988 to provide a portable, self-describing data format for accessing, moving and sharing scientific data in networked, heterogeneous computing environments, including high performance computing environments. HDF was designed to store several different kinds of data objects, and to allow users to mix and group different kinds of data in one file, according to their needs. Being an extensible format, HDF4 has evolved over the years, adding new object types and improvements in the I/O library.

1.1 Identification

HDF Version 4.1

HDF-EOS: HDF files that are organized in a standard way to support earth science data

(See also HDF5, which is a different format; different software is used to access it.)

1.2 Purpose

To provide a portable, self-describing data format for accessing, moving and sharing scientific data in networked, heterogeneous computing environments, including high performance computing environments.

1.3 User Community and Sponsoring Organization

User community: scientists and engineers in virtually every discipline, and some non-scientific applications.

Sponsors: Primarily the National Center for Supercomputing Applications (NCSA), and NASA Earth Science Data and Information System (ESDIS).

1.4 Format Evolution

NCSA developed the original format and library, and continues to maintain and support the basic libraries for HDF4. Because the format is complex, it seems best to have a single source for the library, rather than to have individuals write their own HDF readers and writers. The University of Illinois owns the copyright on the HDF library source code, but the source code and specification are freely available, and others are free to use them for commercial and non-commercial applications.

NCSA controls the evolution, but NASA's ESDIS Project strongly influences this evolution. We also listen carefully to individual users in evolving the format.

HDF-EOS is maintained and supported by the contractor for NASA's EOSDIS.

2. Conceptual Model

At its lowest level, HDF stores primitive objects using a tag structure, which I won't go into here. These primitive objects are organized to represent higher level objects, which are exposed to HDF applications through the HDF API.

The higher level objects are SDS (multidimensional arrays), raster images, color palettes, annotations, and Vdata (tables). Vgroup, a grouping structure, allows applications to mix and group different kinds of data in one file, according to their needs.

The SDS, Vdata, and general raster objects can have attributes associated with them. Attributes are of the form "parameter = value," where the "value" can be a scalar or a 1-D array of scalars.
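The object model described above can be pictured with a small sketch. This is only an illustration of the concepts (objects carrying "parameter = value" attributes, and a Vgroup collecting mixed objects); the class and function names here are hypothetical and are not the HDF API.

```python
from dataclasses import dataclass, field

# Scalars allowed as attribute values in this sketch (an assumption for illustration).
SCALAR_TYPES = (int, float, str)


def check_attribute_value(value):
    """Accept a scalar or a flat (1-D) sequence of scalars; reject anything nested."""
    if isinstance(value, SCALAR_TYPES):
        return value
    if isinstance(value, (list, tuple)) and all(
        isinstance(v, SCALAR_TYPES) for v in value
    ):
        return list(value)
    raise TypeError("attribute value must be a scalar or a 1-D array of scalars")


@dataclass
class DataObject:
    """Hypothetical stand-in for an SDS, Vdata, or raster object."""
    name: str
    kind: str  # e.g. "SDS", "Vdata", "raster"
    attributes: dict = field(default_factory=dict)

    def set_attribute(self, parameter, value):
        # Stores one "parameter = value" pair on the object.
        self.attributes[parameter] = check_attribute_value(value)


@dataclass
class Vgroup:
    """Hypothetical grouping structure holding any mix of objects or other groups."""
    name: str
    members: list = field(default_factory=list)

    def add(self, member):
        self.members.append(member)
```

For example, an SDS named "temperature" could carry `units = "K"` and `valid_range = [0.0, 400.0]`, and sit in the same Vgroup as a Vdata table.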

There are options available for how data is stored in HDF. The raw data part of a dataset can be compressed in a variety of ways, can be stored as a series of linked blocks (allowing for easy extendibility), can be "chunked" or "tiled" (allowing efficient subsetting access), and can be stored in a separate, external file.
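To see why chunking helps subsetting, here is a minimal sketch that works out which chunks a rectangular subset overlaps. Only those chunks would need to be read. The function name `chunks_touched` and its parameters are assumptions made for this illustration; this is not how the HDF library is implemented.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return grid coordinates of the chunks overlapped by a subset.

    start       -- first element index of the subset in each dimension
    count       -- number of elements requested in each dimension
    chunk_shape -- chunk size in each dimension
    """
    per_dimension = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c            # chunk holding the first requested element
        last = (s + n - 1) // c   # chunk holding the last requested element
        per_dimension.append(range(first, last + 1))
    return list(product(*per_dimension))
```

With 8 x 8 chunks, a 4 x 8 subset starting at element (6, 0) overlaps only the chunks at grid positions (0, 0) and (1, 0), however large the full array is.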

Typically, groups of HDF users who intend to share data will agree on the way they will organize HDF files. An example of this is HDF-EOS, which uses well-defined collections of HDF objects to represent three different types of EOS objects (grid, swath, and point objects), plus metadata.

3. Format Details

HDF is not dependent on any particular medium, but it only works effectively on random-access media.

Primitive data types supported

Standard types are 8-bit, 16-bit, 32-bit and 64-bit integers, and 32-bit and 64-bit floats, but users can define their own types. Compound types (record structures) are also supported as elements of a one-dimensional array, called a Vdata.

Internal representation of numbers

A code is stored with each datatype to describe it, and the library uses this to convert numbers from one type to another.
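The idea of storing a type code next to the data can be sketched as follows. The string codes in `TYPE_FORMATS` and the helper names are hypothetical, chosen for this illustration only; the real codes and conversion routines are defined by the HDF library and specification.

```python
import struct

# Hypothetical type codes mapped to Python struct format characters.
TYPE_FORMATS = {
    "int16": "h",
    "int32": "i",
    "float32": "f",
    "float64": "d",
}


def encode(type_code, values):
    """Return (type_code, raw_bytes): the code travels with the data it describes."""
    fmt = ">" + str(len(values)) + TYPE_FORMATS[type_code]
    return type_code, struct.pack(fmt, *values)


def decode(tagged):
    """Use the stored type code to interpret the raw bytes."""
    type_code, raw = tagged
    item = TYPE_FORMATS[type_code]
    n = len(raw) // struct.calcsize(">" + item)
    return list(struct.unpack(">" + str(n) + item, raw))


def convert(tagged, new_type_code):
    """Convert tagged data to another type by decoding and re-encoding."""
    caster = float if new_type_code.startswith("float") else int
    return encode(new_type_code, [caster(v) for v in decode(tagged)])
```

Because the reader looks up the code rather than guessing, the same bytes are never misread as a different type.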

How software recognizes format data

There's a magic number at the beginning of the file: ^N^C^S^A.

It's too complicated to give other useful details here. See the HDF Specification for more details.
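As a small illustration of the magic number, the check below reads the first four bytes of a file and compares them with the control characters ^N ^C ^S ^A (bytes 0x0E 0x03 0x13 0x01). The function name `looks_like_hdf4` is an assumption for this sketch, and a matching magic number says nothing about whether the rest of the file is valid.

```python
# Control-X is the letter's ASCII code with bit 0x40 cleared,
# so "NCSA" becomes the bytes 0x0E 0x03 0x13 0x01.
HDF4_MAGIC = bytes(ord(letter) ^ 0x40 for letter in "NCSA")


def looks_like_hdf4(path):
    """Return True if the file at `path` begins with the HDF4 magic number."""
    with open(path, "rb") as f:
        return f.read(4) == HDF4_MAGIC
```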

4. Uses

It's best to use HDF as an exchange format for complex collections of scientific data and metadata and/or large datasets. In order to support complicated data and metadata, the format itself is complicated. In other words, if your data structures are simple and you don't need things like efficient I/O, platform independence, etc., you're probably best off using a simpler format.

It is also a good format to use when efficient subsetting is required.

Here's our boilerplate list of reasons to use HDF. HDF is useful when you want

  • to share scientific data in heterogeneous computing environments
  • to share scientific data with colleagues in a portable way
  • to use software that understands HDF
  • to store a variety of data types and structures
  • to store and access large data structures that can be extended or updated
  • to store metadata in a variety of forms
  • an access library that works on a lot of different machines
  • to move the data from machine to machine
  • fast I/O or subsetting
  • efficient storage
  • to use an open standard

HDF is also used for archival storage because there is so much data in HDF. It's not particularly good for archival storage, although its self-describing characteristics are useful.

5. Format Developer Software

NCSA supports an I/O library, utilities for simple things like converting data to HDF and dumping the contents of an HDF file, and a Java-based viewer.

6. Software Standard Features

The default storage of numbers conforms to the XDR standard (IEEE floats, for instance). The HDF4 library is written in ANSI C and Fortran 77. HDF5 is in ANSI C.
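As a rough picture of what XDR-style storage means for a float, the sketch below packs a value as a big-endian IEEE 754 single, independent of the byte order of the machine running it. It uses Python's standard `struct` module rather than the HDF library, and the function names are assumptions for illustration.

```python
import struct


def to_xdr_float32(value):
    """Pack a number as a 4-byte big-endian IEEE 754 single-precision float."""
    return struct.pack(">f", value)


def from_xdr_float32(raw):
    """Unpack 4 big-endian bytes back into a Python float."""
    return struct.unpack(">f", raw)[0]
```

For example, `to_xdr_float32(1.0).hex()` gives `3f800000` on any platform, which is what makes the stored bytes portable.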

7. Non-developer Software

There is a lot of software available from users and vendors for things like data analysis and visualization. For a list, see http://hdf.ncsa.uiuc.edu/tools.html.

8. User Support

User support is free for individual users. When a larger community of users such as EOSDIS and ASCI needs serious support, we ask for funding. Our support staff currently consists of about 1 FTE. We handle about 6 requests for support per day.

We also provide an HDF tutorial, and we provide training when requested.

I'm afraid I don't know the size of our user community. Probably in the tens of thousands.

9. Work In Progress

HDF 4.1 release 3 was released in May 1999.

The HDF4 format specification and library specification are being completely revised to strengthen HDF4's viability as an archive format.

We have just resurrected a project involving formal descriptions of HDF files using the Object Description Language (ODL) developed for the Planetary Data System. The original project developed tools for working with HDF-EOS. We now plan to extend those tools to work with HDF4 generally.

10. Evolution Plans

The plan is to move HDF4 users to HDF5. This can be a tough problem for many users, so we intend to support HDF4 as long as necessary. There is also some NASA work underway to implement HDF-EOS in HDF5.

11. Documentation and Related References

HDF website: http://hdf.ncsa.uiuc.edu/

HDF-EOS website: http://hdfeos.gsfc.nasa.gov/hdfeos/workshop.html

12. Other Comments

HDF4 help is available at: hdfhelp@ncsa.uiuc.edu


 


URL: http://ssdoo.gsfc.nasa.gov/nost/fep/developer-hdf4.html


Author: Mike Folk / National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign / HDF (mfolk@ncsa.uiuc.edu) +1 217-244-0647
Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
NASA Official: Code 633.2 / Don Sawyer (Don.Sawyer@gsfc.nasa.gov) +1.301.286.2748
Last Revised: 1999-06-22T22:48:12, Mike Folk (1999-08-04, John Garrett)