
FEP - Format Use by an Archive - CDA - FITS

Arnold Rots
SAO / Chandra X-Ray Observatory
 
Comment on this template in the HyperNews Discussion.  

1. Archive Identification

The Chandra Data Archive is to hold all data and data products contained in and derived from observations made with the Chandra X-Ray Observatory, to be launched in July 1999. We expect an annual data volume (compressed, excluding reprocessing runs) of about 500 GB.

1.1 Nature of Ingest Activity

The ingested data products consist of received telemetry and data products derived from that telemetry by automated processing pipelines which push those products into the archive. In addition, several mirror archive sites are populated by the primary archive through automatic replication. The mirror archives receive only a subset of the primary archive's data products.

1.2 Nature of User Community

The CDA strives to make its data product holdings directly accessible to, and usable by, the entire astronomical community. This includes astronomers active in the High Energy Astrophysics field as well as those whose expertise and interests lie more in the areas traditionally served by ground-based observatories.

1.3 Nature of Archive

The CDA is intended to be a permanent archive.

2. Format (Format System) Identification

Apart from certain binary files (raw telemetry), ASCII files (logs), and files in graphics formats (such as PDF, PostScript, and GIF), all data products are archived in FITS format. The FITS format is formally defined in documents referenced at http://fits.gsfc.nasa.gov/fits_home.html.

3. Format Selection Rationale

FITS has been in use in the astronomical community for about 20 years and is now used in almost all branches of the field. It is the one format that most major astronomical software packages can both read and write, including those created by space research organizations, by ground-based observatories, and by commercial software vendors. FITS has been adopted as a standard by the International Astronomical Union and comes with a virtual guarantee of backward compatibility. Because of its widespread use in the community, tools and interfaces dealing with the format are readily available, well developed, and robust. As a result, it facilitates multi-mission and multi-wavelength research projects. Finally, the header-data structure of FITS allows the ingest processes to extract metadata from the data products themselves, without a need for either separate metadata files or sophisticated expertise on the contents and structure of the data products: a standard set of keywords in the header provides all information necessary to extract the metadata.
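
As an illustration of why metadata extraction is cheap, the following minimal Python sketch reads keyword values straight out of a FITS header. It relies only on the general layout of FITS headers (80-character ASCII records padded to 2880-byte blocks and terminated by an END record). It is not the CDA ingest code: the function names are hypothetical, the keyword values are made up for the example, and continued strings, embedded quotes, and several other details are deliberately ignored.

```python
CARD_LEN = 80      # every header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # a header is padded to a multiple of 2880 bytes


def make_card(keyword, value, comment=""):
    """Format one simplified fixed-format header card (hypothetical helper)."""
    if isinstance(value, bool):
        field = "{:>20}".format("T" if value else "F")
    elif isinstance(value, str):
        field = "{:<20}".format("'{:<8}'".format(value))  # quoted string
    else:
        field = "{:>20}".format(value)                    # right-justified number
    card = "{:<8}= {}".format(keyword, field)
    if comment:
        card += " / " + comment
    return card[:CARD_LEN].ljust(CARD_LEN)


def build_header(cards):
    """Join cards, append END, and pad with blanks to a full block."""
    text = "".join(cards) + "END".ljust(CARD_LEN)
    padded = -(-len(text) // BLOCK_LEN) * BLOCK_LEN
    return text.ljust(padded).encode("ascii")


def parse_header(raw):
    """Return {keyword: value} for all valued cards before the END card."""
    text = raw.decode("ascii")
    metadata = {}
    for start in range(0, len(text), CARD_LEN):
        card = text[start:start + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, and blank cards carry no value
        body = card[10:].strip()
        if body.startswith("'"):
            closing = body.index("'", 1)  # ignores embedded '' escapes
            metadata[keyword] = body[1:closing].rstrip()
            continue
        token = body.split("/", 1)[0].strip()  # drop the trailing comment
        if token in ("T", "F"):
            metadata[keyword] = (token == "T")
        else:
            try:
                metadata[keyword] = int(token)
            except ValueError:
                try:
                    metadata[keyword] = float(token)
                except ValueError:
                    metadata[keyword] = token  # leave unrecognized values as text
    return metadata


# Made-up header values, for illustration only.
example = build_header([
    make_card("SIMPLE", True, "conforms to FITS"),
    make_card("BITPIX", 16, "bits per data value"),
    make_card("NAXIS", 0, "no data array"),
    make_card("TELESCOP", "CHANDRA", "mission name"),
    make_card("OBJECT", "Cas A"),
])
metadata = parse_header(example)
print(metadata)
```

The point of the sketch is that the reader needs no knowledge of the data that follow the header: a fixed record layout and an agreed set of keywords are enough to populate the archive's metadata tables.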

4. Roles of Format

Several of the roles of the format formed the rationale for its selection and have been indicated in the previous section. In summary, within the astronomical community FITS allows:

  1. Universal exchange of data; multi-mission and multi-wavelength projects.
  2. Long-term guarantee of support.
  3. Compatibility with existing software.
  4. Cheap extraction of metadata upon ingest.
  5. Maximum accessibility for the community.

5. Data Structures Supported

In the CDA, two basic FITS formats are being used:

  • Multi-dimensional images.
  • Binary tables.

The binary tables are two-dimensional but allow multi-dimensional arrays in their cells. The types of data structures in the CDA include time series data; images (with and without gratings); photon event lists; spectra; and various calibration products.
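
To make the notion of a multi-dimensional array in a table cell concrete, the sketch below decodes a single binary table cell in Python. It assumes the usual binary table conventions: a column format of the form `rT` (repeat count `r`, type code `T`), an optional TDIMn keyword such as `(3,2)` giving the array shape with the first axis varying fastest, and big-endian storage. The function names are hypothetical, only four type codes and two-dimensional shapes are handled, and the cell contents are invented for the example.

```python
import struct

# FITS binary table type codes mapped to struct codes (subset only).
STRUCT_CODES = {"I": "h", "J": "i", "E": "f", "D": "d"}


def parse_tdim(tdim):
    """Turn a TDIM string such as '(3,2)' into the tuple (3, 2)."""
    return tuple(int(n) for n in tdim.strip().strip("()").split(","))


def decode_cell(raw, tform, tdim):
    """Decode one cell, given its TFORM (e.g. '6J') and TDIM (e.g. '(3,2)')."""
    repeat = int(tform[:-1] or 1)
    code = STRUCT_CODES[tform[-1]]
    flat = struct.unpack(">{}{}".format(repeat, code), raw)  # big-endian
    nx, ny = parse_tdim(tdim)  # this sketch handles two axes only
    if nx * ny != repeat:
        raise ValueError("TDIM does not match the TFORM repeat count")
    # The first TDIM axis varies fastest in the byte stream, so each
    # consecutive run of nx values forms one row of the returned array.
    return [list(flat[j * nx:(j + 1) * nx]) for j in range(ny)]


# A made-up cell holding six 32-bit integers arranged as a 3 x 2 array.
cell_bytes = struct.pack(">6i", 1, 2, 3, 4, 5, 6)
cell = decode_cell(cell_bytes, "6J", "(3,2)")
print(cell)
```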

6. Support

In terms of software support, we have benefitted from a large number of FITS interfaces and FITS-compatible software packages, both in the public and in the commercial domain. These include interfaces for Fortran, C, C++, Java, Perl, IDL, IRAF, AIPS, AIPS++, and FTOOLS, provided by SAO, HEASARC, NRAO, STScI, and RSI.

FITS has one significant drawback: although it provides a well-defined syntax for the metadata, its specification of the semantics of the metadata is very rudimentary. Hence, it guarantees that all FITS readers can read all FITS data products, but not that they can understand them. For Chandra, we have started by adopting the conventions developed by HEASARC for previous High Energy Astrophysics missions, and extended these whenever necessary in the same spirit, in consultation with HEASARC and European mission projects. The conventions are described in a publicly available document.

Ficc, the proposed FITS checksum convention, provides support for a reasonable level of data product integrity checks. It is mandated for all Chandra data products.
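
The arithmetic underlying such a checksum can be sketched in a few lines of Python. The sketch assumes, as in the FITS checksum proposal, that the bytes are treated as big-endian 32-bit unsigned integers accumulated with ones' complement (end-around carry) addition. The function name is hypothetical, and the ASCII encoding used to store the result in a header keyword is omitted.

```python
import struct


def ones_complement_sum32(data):
    """32-bit ones' complement sum over big-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        # Fold any carry out of bit 31 back into the low end.
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


# One made-up 2880-byte record and a copy with a single corrupted byte.
record = b"CHANDRA " * 360
corrupted = b"X" + record[1:]

print(ones_complement_sum32(record) == ones_complement_sum32(corrupted))
```

Because FITS files consist of 2880-byte records, the length is always a multiple of four bytes, so no padding logic is needed. A sum recomputed on retrieval that differs from the stored value signals that the product was altered after it was written.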

7. Software

To extract metadata, the CDA mainly uses the FITS++ classes distributed with AIPS++ and provided by STScI. In addition, some use is made of HEASARC's CFITSIO and FTOOLS for various supporting tasks.

8. Desired Functions

Not surprisingly, our wish list is similar to HEASARC's, though we realize that the absence of certain sophisticated features may well have contributed to FITS's success and versatility. We would therefore rather do without if the addition of such features would threaten the future viability (read: attractiveness to a wide community) of the standard.

  1. It would be good if the astronomical community could agree on a more comprehensive specification of the semantics of the metadata. As it is, though, each mission appears to be building on the conventions adopted by the previous one. As long as this careful balance is maintained, backward compatibility is guaranteed, but more commonality with ground-based observatories' conventions would be extremely beneficial.
  2. It would be useful to have an accepted mechanism to provide references between cells, columns, and header-data units. Some experimentation in this direction has been going on, and one has to be careful that such efforts do not diverge.
  3. The syntax of the keyword specification is especially restrictive (eight-character names, fixed record length, fixed column format). There are some limited efforts to allow a certain degree of exchangeability between keywords and columns (in binary tables). But if anywhere, this is the area where one has to tread very carefully in order not to endanger the stability of the standard.
  4. The data types represented in the standard are strongly influenced by the Fortran roots of FITS (as are certain other elements of the standard). As Fortran is losing some ground in the scientific community as the programming language of choice, the absence of unsigned integer data types is especially annoying.
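
To illustrate the last item: a workaround commonly used by FITS writers, not described in the text above and mentioned here only as an assumption about typical practice, is to store unsigned 16-bit values as signed 16-bit integers offset by BZERO = 32768 with BSCALE = 1, so that the physical value is BZERO + BSCALE × stored value. A minimal Python sketch, with hypothetical function names:

```python
import struct

BZERO = 32768  # offset paired with BSCALE = 1 for unsigned 16-bit data


def encode_uint16(values):
    """Store unsigned 16-bit values as big-endian signed 16-bit integers."""
    shifted = [v - BZERO for v in values]  # now within -32768..32767
    return struct.pack(">{}h".format(len(shifted)), *shifted)


def decode_uint16(raw):
    """Recover the unsigned values: physical = BZERO + 1 * stored."""
    count = len(raw) // 2
    stored = struct.unpack(">{}h".format(count), raw)
    return [s + BZERO for s in stored]


samples = [0, 1, 32768, 65535]
restored = decode_uint16(encode_uint16(samples))
print(restored)
```

The round trip is lossless, but every reader must apply the offset, which is the kind of overhead a native unsigned type would avoid.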

9. Other Comments

It is interesting to note that the AIPS++ table classes very much reflect a synthesis of the basic concept of the FITS binary tables, enriched with the desirable features discussed above.

On the other hand, the FITS working groups (and the IAU FITS Working Group in particular) have opted for a model in which the existing standard has a large moment of inertia. Though that may be extremely irksome at times (and all serious FITS developers have experienced it first-hand), it has also provided a high degree of stability.


 


URL: http://ssdoo.gsfc.nasa.gov/nost/fep/archive-cda-fits.html

A service of NOST at NSSDC.

Author: Arnold Rots / SAO/CXC / Chandra X-Ray Observatory (arots@head-cfa.harvard.edu) 617-496-7701
Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
NASA Official: Code 633.2 / Don Sawyer (Don.Sawyer@gsfc.nasa.gov) +1.301.286.2748
Last Revised: 1999-07-13T02:06:18, Arnold Rots (1999-08-04, John Garrett)