
FEP - Format Developer - QSAS ASCII File Format

Steve Schwartz (plus Tony Allen and David Burgess)
Astronomy Unit, Queen Mary, London, UK
 

1. History and Philosophy

In writing the QMW Science Analysis System (QSAS) software for the UK Cluster Community, we realised that easy data import and export to/from other analysis/graphics tools was essential. Moreover, it is important in such exercises to retain supporting information (metadata) from the original data source. Cluster uses ISTP-compliant CDF files for basic data distribution, with some Cluster-specific extensions. So conversion of such files requires a syntax sufficiently rich to be able to map CDF global and variable attributes, for example.

The only universal exchange medium is the ASCII file. The basic elements of the QSAS ASCII file syntax are therefore an extensible header (using the Parameter Value Language, PVL) together with a flat or delimited table of values.

Additionally, the import/export routines have been separated out to provide a stand-alone translator, Qtran, which operates by filling a C structure with data and metadata. Translation among various file formats is then possible by writing a SINGLE translator per format, which loads or unloads such a structure, rather than separate translators between every pair of formats.

1.1 Identification

QSAS ASCII File Syntax

1.2 Purpose

To provide easy import and export of data amongst data analysis/display routines, such as QSAS, using a format that retains sufficient metadata to make the resulting file completely self-descriptive. This covers both routinely produced data files (e.g., Cluster Prime Parameters, ISTP Key Parameters) and specific analysis results (FFTs, minimum variance analysis, ...). With Cluster in mind, particular attention is paid to the handling of CDF files, which have a rich metadata capability.

1.3 User Community and Sponsoring Organization

QMW Cluster participation is part of the UK Cluster effort and is supported by the UK Particle Physics and Astronomy Research Council (PPARC). More generally, QMW works together with the Rutherford Appleton Laboratory (RAL) to provide the UK Coordinated Data Handling Facility for STP. Products are supplied with appropriate copyright notices, but are essentially freely available.

1.4 Format Evolution

So far only QMW have been involved in the evolution of the software, responding to feedback from the user community. Future evolution could involve others, in the spirit of open software development. We would wish to be informed of any such developments and to be able to incorporate any improvements or extensions.

2. Conceptual Model

    Data Structure
==============
Internally, all data regardless of its source is
loaded into a C-structure together with its
metadata (see details below). From here it can
be accessed by analysis software or written out
to another format, viz.:

        (data file format x)
                 |
                 |
                 V
            C-structure
              |     \
              |      \
              |       --> data format y
              V
         analysis program

File Syntax
===========
Header (attached or detached)
-----------------------------
   File Metadata
      Global Attributes in CDF terms
      File info, lines to skip, ... (to facilitate reading
         a foreign ASCII file simply by writing a header)
   Variable Metadata - one per variable
      Variable Attributes in CDF terms
      Information to locate the values (e.g., column location)
      Vectors, multi-dimensional arrays, etc. are treated as
         single variables in order to preserve key aspects

Data Values
-----------
   Flat ASCII table or delimited (e.g., CSV)
   Time format: ISO or "Free Time Format" supported

3. Format Details

    File: ASCII
File Metadata: 
   PVL structure, e.g.: "Project = Cluster"
   None are mandatory, but file-writing software
      will generate ISTP/Cluster-conformant CDF
      files
Variable Metadata:
   Each variable has a metadata block of the form:
   "Start_variable = name
    parameter = value
    etc.
    End_variable = name"
   Required entries include:
    Data_type = type   [float, double, ...]
    Sizes = n,m,p,...    [array dimensions]
    Time_format = format  [ISO or FREE_TIME_FORMAT]
       Required if the variable contains time values.
       FREE_TIME_FORMAT enables software to construct
         a single time value from pieces in a
         wide variety of formats and orders. This
         is accomplished by supplying a parameter
         TIME_FORMAT_STRING = string, e.g.,
         TIME_FORMAT_STRING = YYYY-MON-DD HH:MI:SS.MSC
         The locations of the individual "key strings"
         such as YYYY, Doy, etc. are then parsed and
         used as the template to read entries in the
         data itself. Fixed offsets can be entered
         directly into the variable metadata block
         using similar syntax, and a Y2K-compliant
         facility allows 2-digit years to be
         resolved.
    Desirable metadata
      We have identified 4 pieces of metadata which
      greatly ease the use of data. These are:

      UNITS         - a plotting label
      FIELDNAM      - the name of the parameter
      Frame         - to indicate the coordinate system and
                      representation of vectors, etc.; of the
                      form "Frame = vector>gse_xyz"
      SI_conversion - to enable arithmetic conversion to base
                      SI units (m, kg, s, C, K, T, ...); of the
                      form "SI_conversion = 1.e-9>T" for a
                      magnetic field in nT

      The idea is that multiplying, say, an electron
      temperature and an ion temperature by their respective
      SI_conversion factors should enable a user to add them,
      regardless of the original units.

4. Uses

The format is well-suited to the ingestion of foreign ASCII data files (e.g., as supplied by various www archives) through the writing of a single (detached) header, to the export of data or analysis results to other users and systems (because of the portability of ASCII), and similarly to the inspection and/or modification of data files with a text editor.

The format, being both flexible and extensible (through the use of PVL) and stylized, is well suited to machine readability.

Finally, with translators the format also forms a convenient mechanism for exchanging data in other formats, particularly CDF (for which the translation software already exists).

5. Format Developer Software

Qtran reads/writes QSAS ASCII files (as does, of course, QSAS) and can translate between them and CDF. A Programmer's Manual provides visibility of the internal C-structure and read/write routines.

6. Software Standard Features

The format uses standard PVL and is based on metadata standards laid down by ISTP and Cluster.

The Qtran software is written in C and calls CDF libraries.

7. Non-developer Software

None known.

8. User Support

The file format is described in its documentation, which is held online and kept up-to-date.

Software is provided "as-is", but is maintained presently by QMW Cluster support.

9. Work In Progress

None.

10. Evolution Plans

There are plans to add a binary format in parallel to the ASCII one. Additionally, modules may be added to fill the internal C structure from data formats other than CDF and QSAS ASCII, or to write its contents to other formats.

11. Documentation and Related References

General Information through our homepage: http://www.space-plasma.qmw.ac.uk/
Specific sub-pages include:
SOFTWARE - from which there are obvious links to pages describing QSAS and Qtran
QSAS/CU-QMW-TN-0011/FileSyntax.html - holds the HTML version of the document defining the ASCII file syntax in full. The Qtran page has links to this and to the PostScript version.
QSAS/QIE/DataExchangeMan/DataExchangeMan.html is the Programmer's Manual for the Qtran modules.
DOC/DS-QMW-TN-0003.ps is the definitive reference on Cluster CDF files, and therefore contains complete specification of SI_conversion and other ISTP/Cluster metadata.

12. Other Comments

Keeping the metadata with the data is vital, and ASCII provides a robust mechanism. Compressed ASCII files are comparable in size to their binary counterparts.

Ease of use also requires being able to recognise relationships amongst data values automatically, e.g., that columns 5, 6, and 7 contain the x, y, z components of a vector in GSE Cartesian coordinates. The stylised approach described above enables this to be done in a machine-readable way. Additionally, the SI_conversion attribute introduced in the Cluster mission greatly eases combination with other data, leaving no doubt about what the units are or mean.


 


URL: http://ssdoo.gsfc.nasa.gov/nost/fep/developer-qsas.html

A service of NOST at NSSDC.

Author: Steve Schwartz (plus Tony Allen and David Burgess) / Astronomy Unit, Queen Mary, London, UK / Cluster, AMPTE, ... (S.J.Schwartz@qmw.ac.uk) +44 (0)20 7882 5449
Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
NASA Official: Code 633.2 / Don Sawyer (Don.Sawyer@gsfc.nasa.gov) +1.301.286.2748
Last Revised: 1999-07-06T12:52:24, Steve Schwartz (plus Tony Allen and David Burgess) (1999-08-04, John Garrett)