
FEP - Format Developer - HDF5

Mike Folk
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
 

1. History and Philosophy

HDF5 was developed to address intractable inadequacies of HDF4 that were revealed in working with EOS. These included the following needs:

  • To store very large objects (>2 GB)
  • To store large numbers of objects (>20K)
  • More general, flexible data model
  • More data types
  • Improved I/O performance, especially in parallel environments and in subsetting.
  • To replace HDF4's aging software infrastructure

1.1 Identification

HDF5 Version 1.0 (Version 1.2 planned for 8/99)

1.2 Purpose

See "History and Philosophy"

1.3 User Community and Sponsoring Organization

User community: scientists and engineers in virtually every discipline, and some non-scientific applications. Sponsoring organizations: Primarily NCSA, the DOE Accelerated Strategic Computing Initiative (ASCI), and NASA Earth Science Data and Information System (ESDIS).

1.4 Format Evolution

NCSA developed the original format and library, and continues to maintain and support the basic libraries for HDF5. Because the format is relatively complex, it seems best to have a single source for the library, rather than to have individuals write their own HDF readers and writers. The University of Illinois owns the copyright on the HDF library source code, but the source code and specification are freely available, and others are free to use them for commercial and non-commercial applications.

NCSA controls the evolution, but the ASCI and ESDIS projects strongly influence this evolution. We also listen carefully to individual users in evolving the format.

2. Conceptual Model

The data model consists of two basic structures: a grouping structure and a "data object" that is essentially a multidimensional array. The elements in the multidimensional array type can be compound data types, similar to C structures. The HDF5 API is a "lower level" API than that of HDF4. More emphasis is placed on allowing the user to control the storage of objects, the selection of subsets, the definition of datatypes, and other characteristics of a file.
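
The two structures described above can be pictured with a small sketch. This is only an illustration in plain Python, not the HDF5 API: the class names `Group` and `Dataset`, the `element` method, and the sample record layout are all hypothetical, and the `struct` format string stands in for a compound datatype "similar to C structures".

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical 'data object': a multidimensional array of records."""
    shape: tuple          # extent of each dimension, e.g. (2, 2)
    record_format: str    # struct format of one element, like a C struct
    data: bytes = b""     # packed elements in row-major (C) order

    def element(self, *index):
        """Unpack and return the record stored at the given index."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        flat = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(index)
            flat = flat * n + i   # row-major flattening
        size = struct.calcsize(self.record_format)
        return struct.unpack_from(self.record_format, self.data, flat * size)


@dataclass
class Group:
    """Hypothetical grouping structure: named datasets and nested groups."""
    members: dict = field(default_factory=dict)


# A 2 x 2 array whose elements are records {int32 id; float64 temperature}.
fmt = "<id"
records = [(1, 20.5), (2, 21.0), (3, 19.25), (4, 22.0)]
payload = b"".join(struct.pack(fmt, *r) for r in records)

root = Group()
root.members["station"] = Group()
root.members["station"].members["readings"] = Dataset((2, 2), fmt, payload)

readings = root.members["station"].members["readings"]
print(readings.element(1, 0))
```

The sketch keeps the record layout with the dataset rather than with the reading code, which mirrors the idea that the file, not the application, describes its own contents.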

3. Format Details

HDF5 is not dependent on any particular medium, but it works most effectively on random-access media. It is also possible to add I/O drivers to HDF5 to write to alternate destinations, such as memory or a network.

Primitive data types supported

Standard types are 8-bit, 16-bit, 32-bit and 64-bit integers, and 32-bit and 64-bit floats. Also supported are variable length types, user-defined types, and an opaque type. Compound types (record structures) are also supported as elements in an array.

Internal representation of numbers

A full description of how numbers are represented is stored as part of a dataset, and the library uses this to convert numbers from one type to another.
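
As a rough illustration of description-driven conversion, the sketch below re-encodes packed integers from one layout to another using only a stored description of each layout. It is not how the HDF5 library is implemented: the `IntegerType` class and the `convert` function are hypothetical names, and only integer size, byte order, and signedness are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerType:
    """Hypothetical stand-in for a stored datatype description."""
    size: int          # bytes per element
    byteorder: str     # "big" or "little"
    signed: bool


def convert(raw: bytes, src: IntegerType, dst: IntegerType) -> bytes:
    """Re-encode packed integers from the `src` layout to the `dst` layout."""
    if len(raw) % src.size:
        raise ValueError("buffer is not a whole number of elements")
    out = bytearray()
    for i in range(0, len(raw), src.size):
        value = int.from_bytes(raw[i:i + src.size], src.byteorder,
                               signed=src.signed)
        # to_bytes raises OverflowError if the value does not fit in dst.
        out += value.to_bytes(dst.size, dst.byteorder, signed=dst.signed)
    return bytes(out)


# Description as it might be stored with the data: 16-bit big-endian signed.
stored = IntegerType(size=2, byteorder="big", signed=True)
# Description of what the reading program wants: 32-bit little-endian signed.
in_memory = IntegerType(size=4, byteorder="little", signed=True)

raw = (-2).to_bytes(2, "big", signed=True) + (300).to_bytes(2, "big", signed=True)
print(convert(raw, stored, in_memory).hex())
```

Because the source layout travels with the data, the reader never has to guess the writer's byte order or word size.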

How software recognizes format data

The header for a dataset contains a description. See http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html#DataTypeMessage for details.

The file "boot block" begins with a signature consisting of the following hexadecimal bytes: "89 48 44 46 0d 0a 1a 0a".

Note that the boot block may not be at the beginning of the file. The boot block is located by searching for the HDF5 file signature at byte offset 0, at byte offset 512, and at successive locations in the file, each double the previous one (i.e. 0, 512, 1024, 2048, etc.). This feature makes it possible for users to add their own information at the start of an HDF5 file.
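
The search described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the function name `find_boot_block` and the `max_offset` safety cap are assumptions made for the example, while the signature bytes and the probe offsets come from the text.

```python
import io

# The 8-byte signature quoted above: 0x89 'H' 'D' 'F' \r \n 0x1a \n
HDF5_SIGNATURE = bytes([0x89, 0x48, 0x44, 0x46, 0x0D, 0x0A, 0x1A, 0x0A])


def find_boot_block(stream, max_offset=1 << 30):
    """Return the byte offset of the HDF5 signature, or None if not found.

    Probes offsets 0, 512, 1024, 2048, ... (doubling each time).
    `max_offset` is a hypothetical safety cap, not part of the format.
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        chunk = stream.read(len(HDF5_SIGNATURE))
        if len(chunk) < len(HDF5_SIGNATURE):
            return None  # ran off the end of the file
        if chunk == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Usage: 512 bytes of user-supplied data followed by the signature.
user_block = b"my own header".ljust(512, b"\0")
fake_file = io.BytesIO(user_block + HDF5_SIGNATURE + b"rest of file")
print(find_boot_block(fake_file))
```

Only a handful of offsets are ever read, so the check stays cheap even when a large user block precedes the HDF5 data.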

Another feature worth noting is that the first part of an HDF5 file can consist of information provided by a user. The library looks at the first

4. Uses

It's best to use HDF5 as an exchange format for complex collections of scientific data and metadata and/or large datasets. In order to support complicated data and metadata, the format itself is complicated. In other words, if your data structures are simple and you don't need things like efficient I/O, platform independence, etc., you're probably best off using a simpler format. It is also a good format to use when efficient subsetting is required. Here's our boilerplate list of reasons to use HDF5. HDF5 is useful when you want

  • to share scientific data in heterogeneous computing environments
  • to share scientific data with colleagues in a portable way
  • to use software that understands HDF5
  • to store a variety of data types and structures
  • to store and access large data structures that can be extended or updated
  • to store metadata in a variety of forms
  • an access library that works on a lot of different machines
  • to move the data from machine to machine
  • fast I/O or subsetting
  • efficient storage
  • to use an open standard

HDF5 is also likely to be used for archival storage because there will probably be a lot of data in HDF5. It's not particularly good for archival storage, although its self-describing characteristics are useful.

5. Format Developer Software

NCSA supports an I/O library and utilities for simple tasks such as converting data to HDF and dumping the contents of an HDF file. A Java-based viewer is in the works.

6. Software Standard Features

Numbers can be stored according to the XDR standard (IEEE floats, for instance), but that is not necessary. The HDF5 library is written in ANSI C.

7. Non-developer Software

Very little at the moment. There is a converter that converts HDF5 files to HDF4, making it possible to use HDF4 software to view some HDF5 files.

8. User Support

User support is free for individual users. When a larger community of users such as EOSDIS and ASCI needs serious support, we ask for funding. Our support staff currently consists of about 1 FTE. We handle about 6 requests for support per day.

We also provide an HDF5 tutorial, and we provide training when requested.

So far the user community is small, probably a few hundred users, but it is expected to grow substantially as HDF5 replaces HDF4.

9. Work In Progress

HDF5 was released in November 1998, but is still having quite a few features added. A new release is planned for Summer 1999.

We are working hard on formalizing the HDF5 data model and building tools based on more formal approaches. For instance, we have a Backus-Naur Form (BNF) description of a data description language (DDL) of HDF5, and are working on fully specifying an HDF5 object model in UML. One tool exists for dumping the contents of an HDF5 file in the DDL form. Tools are being written using Lex/YACC to take a DDL and generate an HDF "compiler," and also to convert a DDL description to XML.

We are working with the University of Wisconsin's VisAD project to make VisAD a standard tool for accessing HDF5 files.

10. Evolution Plans

The plan is to migrate HDF4 users to HDF5. This can be a tough problem for many users, so we intend to support HDF4 as long as necessary. There is also some NASA work underway to implement HDF-EOS in HDF5.

A thread-safe version of the HDF5 library is planned for late 1999.

11. Documentation and Related References

HDF5 Format spec: http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html

HDF5 website: http://hdf.ncsa.uiuc.edu/HDF5/

12. Other Comments

HDF4 help is available at: hdfhelp@ncsa.uiuc.edu


URL: http://ssdoo.gsfc.nasa.gov/nost/fep/developer-hdf5.html


Author: Mike Folk / National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign / HDF (mfolk@ncsa.uiuc.edu) +1 214-244-0647
Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
NASA Official: Code 633.2 / Don Sawyer (Don.Sawyer@gsfc.nasa.gov) +1.301.286.2748
Last Revised: 1999-06-22T22:47:41, Mike Folk (1999-08-04, John Garrett)