
FEP - Format Developer - IDFS

Richard Murphy
SWRI
 

1. History and Philosophy

The Instrument Description File System (IDFS) Data Format is a working concept that is based on the desire to define a generic data storage formalism under which the majority of spacecraft science and engineering data can be stored and later accessed through a small set of data access routines. The data file contents should be self-documenting and contain all required instrument characteristics and processing algorithms needed to support high-level scientific interpretation of the data. Based upon past experience with other data sets, the decision was made that data file generation should be at the logical measurement level and not oriented to a physical instrument telemetry stream.

1.1 Identification

IDFS V2.1

1.2 Purpose

The goal in designing the IDFS data format was to develop a multi-satellite, multi-instrument, mission-independent, real-time and non-real-time data handling system in which the data file system would contain all the data needed to make scientific assessments.

1.3 User Community and Sponsoring Organization

The primary user community is the space physics community. The funding entities include SwRI and NASA. Note that the IDFS software and format were developed as part of the MO & DA tasks of many different programs; their development has not been an independent project.

1.4 Format Evolution

The evolution of the IDFS format is driven by input from the user community but is managed by SwRI, in a way analogous to the Linux open-source model.

2. Conceptual Model

- The focal point of the IDFS data format is the storage of Level 0 data (raw telemetry), with the transformation algorithms for unit conversion stored in separate files. However, processed, theoretical and model data can also be handled easily in a self-consistent manner.

- The IDFS storage mechanism provides more than passive data storage. The IDFS API has a built-in algorithmic interpretation capability that derives science and engineering measurements quickly, in real time, as the data is accessed. As a result, calibration factors and processing algorithms can be refined without reprocessing the original data set, and the data does not have to be stored in many different units.

- In the IDFS format, data is classified as either scalar or vector. Scalar data are single quantities that depend only on time and position. Vector data are one-dimensional (1-D) quantities that depend functionally on a single variable called the scan variable. The returned vector data may span the whole scan range or only a subset of it. Note that the term "vector" does not coincide with the definition of a mathematical vector. A particle spectrometer is an example of an IDFS data source that returns a vector quantity: it measures particle flux (data) as a function of particle energy (scan). Some vector IDFS data sources return data that also depends on dimensions other than the scan dimension. The IDFS paradigm handles additional dependencies on charge, mass, phi angle and theta angle. When one of these sources is selected, the data can be reduced either to a 1-D vector quantity that depends only on the scan parameter or to a scalar quantity.

- The IDFS Data Format comprises three required files and two optional files. The three required files are:

(a) Virtual Instrument Description File (VIDF), (b) Header File and the (c) Data File.

The two optional files are:

(a) Plot Interface Definition File (PIDF) and (b) Special Computation Formulation (SCF) file.

- The VIDF file provides a general description of the measurements being stored in IDFS format. It contains parameters such as fields of view, geometry factors, etc. The VIDF file holds all of the information that is needed by the data access software to interface with the IDFS data and header files. In addition, the VIDF file contains the data reconstruction parameters that are needed in order to transform the raw telemetry into physical units. The VIDF file is the repository for most of the meta-data that is provided.

- The Header file contains data that, for the most part, varies slowly in time and need not be repeated in every data record. This includes instrument characteristics (modes) and timing information. The header file defines the number of sensors returned in the associated data record, the actual sensor number(s) for which data is returned, and the number of data samples taken by the sensors. Because the header file describes the state of the instrument, header records can have variable lengths. For example, the instrument may change from a mode in which 64 samples are returned to a mode in which only 32 samples are returned, so the size of the header record may vary from mode to mode.

- The Data file contains the most rapidly varying data and the time stamp information. It holds raw, unprocessed binary data, stored in fields with a base length of 8, 16 or 32 bits. The field length is specified in the VIDF file, and the IDFS data access software uses it to unpack the data correctly. Unlike header records, data records do not vary in size. This allows rapid positioning within the data file based on user-requested start times.

- Data storage in the IDFS data record is organized along the concept of sensor (primary) data, calibration (secondary) data and sensor sets. Sensor data is the basic, primary measurement identifier (object being studied). Calibration data is defined as ancillary data, which is necessary to interpret the primary data (e.g., automatic correction values). These two data matrices, taken collectively, are referred to as an IDFS sensor set. A sensor set defines all of the data returned from a group of sensors over a specified time period.

- The Plot Interface Definition File (PIDF) is an optional file within the IDFS paradigm. Optional in this context means that a PIDF file is not needed to extract data with the API library. The Southwest Data Display and Analysis System (SDDAS) does require a PIDF file for each logical instrument, but this is an SDDAS requirement. The PIDF is best described as an interface file between the VIDF definitions and software that is intended to process IDFS data. The PIDF file describes the types of data contained in the data, header and VIDF files. The types of data returned by the IDFS format include sensor data (primary data), scan information, data quality information, variables that describe the state of the instrument (mode), calibration data (secondary data), pitch angle, azimuthal angles, and spin rate. The PIDF file also describes how the data can be plotted (e.g., spectrogram, line plot, contour) and how to construct the defined units from the VIDF tables. In addition, it contains suggested default values for plotting converted data (min/max scaling values), as well as text labels that describe each derived unit.

- The SCF system provides for the creation of new data products from an existing primary data set, provided that the data has been stored in IDFS format. The SCF file contains a set of algorithms that can be applied to specified IDFS data parameters to derive new data products. These derived products may depend on values returned from a single virtual instrument or on values taken from many virtual instruments. They can either be stored as IDFS data sets or be displayed directly through one of the analysis packages written to interface with IDFS data. The SCF system also defines specialized tensor operations (supporting up to 10-D quantities).
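
Data records, unlike header records, all have the same size. A reader can therefore compute the byte offset of any record directly and binary-search the record time stamps for a requested start time. The C sketch below illustrates that idea. The function names, the in-memory array of time stamps (which stands in for per-record reads from the file), the integer time unit, and the assumption that records start at byte 0 are all hypothetical. None of this is part of the IDFS access library.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first record whose time stamp is >= t_start, or n_records
   if there is none. ASSUMPTION: time stamps are non-decreasing and use one
   integer unit. A real reader would fetch times[mid] from the file at
   record_offset(mid, record_size) instead of from an array. */
size_t first_record_at_or_after(const int64_t *times, size_t n_records,
                                int64_t t_start)
{
    size_t lo = 0, hi = n_records;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] < t_start)
            lo = mid + 1;   /* answer lies strictly after mid */
        else
            hi = mid;       /* mid may be the answer */
    }
    return lo;
}

/* Fixed-size records make the byte offset a single multiplication.
   ASSUMPTION: the first record starts at byte 0 of the data file. */
uint64_t record_offset(size_t index, uint32_t record_size)
{
    return (uint64_t)index * record_size;
}
```

Header records cannot be located this way, because their length varies with instrument mode.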

3. Format Details

Relation to Hardware and Media Portability

- IDFS is Y2K compliant because it uses a 4-digit year value.

Primitive data types supported

- IDFS utilizes user-defined data types to address the issue of porting the source code to multiple platforms. These types are defined as:

SDDAS_INT       4-byte integer
SDDAS_LONG      4-byte signed integer
SDDAS_2LONGS    8-byte signed integer
SDDAS_FLOAT     4-byte floating point
SDDAS_DOUBLE    8-byte floating point
SDDAS_SHORT     2-byte signed integer
SDDAS_CHAR      1-byte signed integer or single character
SDDAS_UINT      4-byte unsigned integer
SDDAS_ULONG     4-byte unsigned integer
SDDAS_USHORT    2-byte unsigned integer
SDDAS_UCHAR     1-byte unsigned integer
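
The sketch below shows one hypothetical way to map these portable types onto C99 fixed-width types. It matters because a plain C `long` is 8 bytes on many 64-bit platforms, while SDDAS_LONG is 4 bytes. The actual SDDAS headers are not reproduced here and may use different underlying types per platform. Treating the unqualified SDDAS_INT as signed is an assumption.

```c
#include <stdint.h>

/* Hypothetical mapping; sizes follow the table above. */
typedef int32_t       SDDAS_INT;     /* ASSUMPTION: signed */
typedef int32_t       SDDAS_LONG;
typedef int64_t       SDDAS_2LONGS;
typedef float         SDDAS_FLOAT;
typedef double        SDDAS_DOUBLE;
typedef int16_t       SDDAS_SHORT;
typedef signed char   SDDAS_CHAR;
typedef uint32_t      SDDAS_UINT;
typedef uint32_t      SDDAS_ULONG;
typedef uint16_t      SDDAS_USHORT;
typedef unsigned char SDDAS_UCHAR;

/* C does not fix the sizes of float and double, so check them at compile time. */
_Static_assert(sizeof(SDDAS_FLOAT) == 4, "SDDAS_FLOAT must be 4 bytes");
_Static_assert(sizeof(SDDAS_DOUBLE) == 8, "SDDAS_DOUBLE must be 8 bytes");
```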

Describe how software recognizes what format data are in

- The VIDF file is stamped with a version number when it is converted from an ASCII file to a binary file. This version number is checked when the VIDF file is opened and read; if it is not consistent with the current revision, the data access software returns a hard error.

- The PIDF file contains a field that defines its version number. The installed PIDF read routine verifies this version number to ensure that all fields can be read and interpreted correctly. The PIDF version number is a floating-point value; at this writing, the PIDF version is 2.0.

In the data records, floating point numbers are stored as integers in compressed formats and are uncompressed when converted to physical units. Note that even if the data is to be viewed as raw telemetry, the data should be converted to "raw" units so that the value may be uncompressed if it represents a floating-point number.

Within the IDFS, all values are stored in one of seven data formats:

(1) unsigned integer, binary data

(2) signed integer, binary data

(3) single precision, floating point data

(4) double precision, floating point data

(5) half-precision 1, floating point data

(6) half-precision 2, floating point data

(7) half-precision 3, floating point data

The format that is utilized by the data set in question is specified within the VIDF file. The word length for the first two formats listed above is also specified within the VIDF file.
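
The VIDF supplies the word length for the two integer formats, and data fields have a base length of 8, 16 or 32 bits. Section 4 states that IDFS data is written in network (big-endian) byte order. Below is a minimal C sketch that reads one such field regardless of host byte order. The function name is hypothetical. Treating the signed format as two's complement is an assumption, because the text does not say how signed integers are encoded.

```c
#include <stdint.h>

/* Read one integer field of 8, 16 or 32 bits stored big-endian.
   is_signed selects format (2); ASSUMPTION: two's complement.
   Returns 0 for an unsupported word length. */
int64_t read_idfs_int(const unsigned char *buf, unsigned bits, int is_signed)
{
    uint32_t v;

    switch (bits) {
    case 8:
        v = buf[0];
        break;
    case 16:
        v = ((uint32_t)buf[0] << 8) | buf[1];
        break;
    case 32:
        v = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
          | ((uint32_t)buf[2] << 8) | buf[3];
        break;
    default:
        return 0;
    }

    /* Sign-extend when the top bit of the field is set. */
    if (is_signed && ((v >> (bits - 1)) & 1u))
        return (int64_t)v - ((int64_t)1 << bits);
    return (int64_t)v;
}
```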

Within the IDFS data record, all floating-point values are stored in internally defined formats that are expanded on access to the native floating-point format of the computer extracting the data. The five defined IDFS floating-point formats share several features. In each format, the mantissa has an implied decimal point to the left of its first digit, and the exponent shifts the decimal point from that default location. Unless otherwise specified, both the exponent and the mantissa carry their sign bit in the most significant bit of their respective bit fields.

Single-precision floating-point data is stored as a 32-bit integer. The mantissa is formed by the most significant 25 bits, giving 7 digits of precision (0 to ±9999999), and all seven digits are used to represent any mantissa. The exponent is located in the least significant 7 bits of the 32-bit word and has a range of ±63. Under these guidelines, 1.57 would be written as a mantissa of +1570000 and an exponent of +1 (that is, 0.1570000 × 10^1).
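
Here is a minimal C sketch for decoding the single-precision layout just described. It assumes the sign-bit convention of the preceding paragraph is sign-magnitude, with a sign bit followed by a magnitude in each field. The text implies this but does not spell it out. The function names are hypothetical and are not the IDFS access routines.

```c
#include <stdint.h>

/* 10^n for n >= 0 by repeated multiplication (exact up to n = 22). */
static double pow10_nonneg(int n)
{
    double p = 1.0;
    while (n-- > 0)
        p *= 10.0;
    return p;
}

/* Decode a 32-bit IDFS single-precision word.
   ASSUMPTION: sign-magnitude fields, sign in each field's MSB. */
double idfs_single_to_double(uint32_t word)
{
    uint32_t mant_field = word >> 7;      /* most significant 25 bits */
    uint32_t exp_field  = word & 0x7Fu;   /* least significant 7 bits */

    double mant = (double)(mant_field & 0xFFFFFFu);  /* 24-bit magnitude */
    if (mant_field & 0x1000000u)                      /* mantissa sign bit */
        mant = -mant;

    int expo = (int)(exp_field & 0x3Fu);              /* 6-bit magnitude */
    if (exp_field & 0x40u)                            /* exponent sign bit */
        expo = -expo;

    /* Implied decimal point left of the 7 digits: mant * 10^(expo - 7). */
    int shift = expo - 7;
    return shift >= 0 ? mant * pow10_nonneg(shift)
                      : mant / pow10_nonneg(-shift);
}
```

Under these assumptions, the 1.57 example is stored as the word `(1570000 << 7) | 1`, and `idfs_single_to_double` maps it back to 1.57.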

The remaining floating-point formats have been defined but have yet to be implemented in the IDFS data access routines. Double-precision floating-point data will be stored as a 64-bit integer. The mantissa is formed by the most significant 55 bits, giving 16 digits of precision (0 to ±9999999999999999), and all sixteen digits are used to represent any mantissa. The exponent is located in the least significant 9 bits of the 64-bit field and has a range of ±255. Under these guidelines, -9.9734 × 10^-6 would be written as a mantissa of -9973400000000000 and an exponent of -5 (that is, -0.99734 × 10^-5).
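
The double-precision format is defined but not yet implemented. The following is therefore only a sketch of how its two fields could be packed, again assuming sign-magnitude fields. By analogy with the single-precision case, the decoded value would be mantissa × 10^(exponent − 16). The function name is hypothetical.

```c
#include <stdint.h>

/* Pack a signed 16-digit mantissa and a signed exponent into the
   proposed 64-bit layout: 55-bit mantissa field in the high bits,
   9-bit exponent field in the low bits.
   ASSUMPTION: sign-magnitude fields; inputs are within the stated ranges. */
uint64_t idfs_pack_double(int64_t mant, int expo)
{
    uint64_t mag  = (uint64_t)(mant < 0 ? -mant : mant);           /* < 2^54 */
    uint64_t mfld = mag | (mant < 0 ? (UINT64_C(1) << 54) : 0);    /* sign bit 54 */
    uint64_t emag = (uint64_t)(expo < 0 ? -expo : expo);           /* 0..255 */
    uint64_t efld = emag | (expo < 0 ? (UINT64_C(1) << 8) : 0);    /* sign bit 8 */
    return (mfld << 9) | efld;
}
```

For the example above, `idfs_pack_double(-9973400000000000, -5)` does three things. It sets bit 63 (the mantissa sign), stores 9973400000000000 in bits 9 through 62, and stores 0x105 (sign bit plus magnitude 5) in the low 9 bits.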

Three half-precision floating-point formats are defined. They differ in the accuracy of the mantissa and the range of the exponent. The exponent is either base 10 or base 2. Base 2 gives a smaller exponent range but a more accurate representation of the stored floating-point values.

Both half-precision 1 and half-precision 2 floating-point values are stored as 16-bit integers. They are defined like the 32-bit single-precision float, except that the mantissa is only 9 bits wide. Half-precision 1 uses a base-10 exponent and half-precision 2 uses a base-2 exponent. The base-10 representation has a larger range with less accuracy, while the base-2 representation has a smaller range with greater precision. The mantissa is formed by the most significant 9 bits and has a range of ±256. The exponent is located in the least significant 7 bits of the 16-bit word and has a range of ±64.

The half-precision 3 floating-point value is stored as a 16-bit integer with 9 bits for the mantissa and 7 bits for the exponent. Its exponent uses a base-2 representation, giving it the same dynamic range and accuracy as half-precision 2; the two differ only in storage layout. The mantissa magnitude is formed by the least significant 8 bits and has a range of ±256, and bit 14 holds the mantissa sign. The exponent magnitude is located in bits 8-13 and has a range of ±64, and bit 15 holds the exponent sign.
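
The half-precision formats share field widths (a 9-bit signed mantissa and a 7-bit signed exponent) but place those bits differently. The C sketch below only extracts the signed fields. It does not convert them to a value, because the text does not specify the mantissa scaling for these not-yet-implemented formats. The half-precision 3 extractor follows the bit positions given above. The half-precision 1/2 extractor assumes sign-magnitude fields with the sign in each field's most significant bit. All names are hypothetical.

```c
#include <stdint.h>

/* Signed fields of a 16-bit IDFS half-precision word. */
typedef struct {
    int mantissa;  /* 8-bit magnitude with sign applied */
    int exponent;  /* 6-bit magnitude with sign applied; base 10 (HP1) or base 2 (HP2/HP3) */
} idfs_half_fields;

/* Half-precision 1 and 2: mantissa in bits 7-15, exponent in bits 0-6.
   ASSUMPTION: sign in bit 15 (mantissa) and bit 6 (exponent). */
idfs_half_fields idfs_half12_fields(uint16_t w)
{
    idfs_half_fields f;
    f.mantissa = (w >> 7) & 0xFF;   /* magnitude in bits 7-14 */
    if (w & 0x8000u)
        f.mantissa = -f.mantissa;
    f.exponent = w & 0x3F;          /* magnitude in bits 0-5 */
    if (w & 0x40u)
        f.exponent = -f.exponent;
    return f;
}

/* Half-precision 3, as stated in the text: mantissa magnitude bits 0-7,
   mantissa sign bit 14, exponent magnitude bits 8-13, exponent sign bit 15. */
idfs_half_fields idfs_half3_fields(uint16_t w)
{
    idfs_half_fields f;
    f.mantissa = w & 0xFF;
    if (w & 0x4000u)
        f.mantissa = -f.mantissa;
    f.exponent = (w >> 8) & 0x3F;
    if (w & 0x8000u)
        f.exponent = -f.exponent;
    return f;
}
```

For example, a mantissa of -200 with an exponent of +5 is the word 0x45C8 in the half-precision 3 layout.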

4. Uses

- Archiving

Because of the detailed meta-data stored within each IDFS instance, IDFS is well suited to archiving data sets. Even if the original data documentation is no longer accessible, the VIDF (and PIDF) files provide critical information about the data set.

- Transfer

IDFS data is written in "network" byte order and does not depend on any particular floating-point format. Apart from relying on the ASCII character set for text data, the files are therefore easily transported between different types of machines.

- Low Level Instrument/telemetry

The focal point of the IDFS data format is the storage of Level 0 data (raw telemetry).

- Processing

In the space physics world there is often a need to derive different quantities from data parameters returned by one or more spacecraft. In some cases, these derived products depend on values returned from a single instrument; in others, they depend on values taken from many instruments. In either case, the data parameters and the algorithms needed to produce the derived data products must be specified. To fill this need, a domain-specific system has been developed that allows new data quantities to be defined and derived from existing IDFS data sets. This system is called the Science Computation Formulation (SCF). Because the user defines the algorithm dynamically, the need for specialized programming is drastically reduced. In addition, any errors in the algorithm can be corrected quickly without reprocessing the data.

- Updateability

There must be at least one VIDF file defined per virtual instrument. If values within the VIDF change with time (e.g., calibration coefficients), additional VIDF files can be established.

In addition, the data reconstruction algorithm is easy to change (update). Once the algorithm changes, the entire data set is instantaneously accessible with the new data transformation applied.

- Extendibility

The Science Computation Formulation (SCF) system provides for the creation of new data products from an existing primary data set stored in IDFS format. SCF-generated data has the same format as the primary data, which ensures compatibility with all existing applications that use primary data. The SCF system provides a core set of operational drivers, and this driver set can easily be expanded to absorb user-defined operations.

5. Format Developer Software

- A set of software routines services the IDFS format. It covers access to the individual data files through a distributed database, positioning within a data file based on time, access to the data, and real-time conversion of the data to physical units. A similar set of software was developed to provide access to the derived data products defined within an SCF file.

- A set of software was developed to retrieve IDFS data that is time-averaged or sample-averaged. Time-averaged data refers to data averaged over a specified time interval. Sample-averaged data refers to data averaged over a specified number of data samples.

- A set of software was developed to retrieve SCF derived data products that are time-averaged or sample-averaged. Time-averaged SCF data has the same meaning as time-averaged IDFS data. Sample-averaged SCF data refers to data that is averaged over a specific number of iterations of the SCF algorithm.
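
The two averaging modes described above can be sketched as follows. This is a hypothetical illustration, not the SwRI routines. The text does not say how those routines treat interval boundaries, empty intervals or partial sample blocks, so the half-open interval [t_begin, t_end) and the 0.0 result for an empty interval are assumptions.

```c
#include <stddef.h>

/* Time averaging: mean of the samples whose time stamps fall in
   [t_begin, t_end). ASSUMPTION: returns 0.0 if no sample falls inside. */
double time_average(const double *t, const double *x, size_t len,
                    double t_begin, double t_end)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (t[i] >= t_begin && t[i] < t_end) {
            sum += x[i];
            count++;
        }
    }
    return count ? sum / (double)count : 0.0;
}

/* Sample averaging: mean of n consecutive samples starting at `first`.
   The caller must ensure n > 0 and that first + n does not exceed the array. */
double sample_average(const double *x, size_t first, size_t n)
{
    double sum = 0.0;
    for (size_t i = first; i < first + n; i++)
        sum += x[i];
    return sum / (double)n;
}
```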

No standard library of routines currently exists for creating an IDFS instance from data in another format.

6. Software Standard Features

- Taking "standard" to mean stable, the set of software that services the IDFS format is 'standard'. This covers access to the individual data files through a distributed database, positioning within a data file based on time, access to the data, and real-time conversion of the data to physical units. SwRI has not published or promoted the IDFS as a standard in itself.

7. Non-developer Software

- (a) ISDAT interface - Pavel Travnicek at Prague University

(b) export IDFS - SwRI has developed an application that takes data that has been stored in IDFS format and generates CDF or netCDF data files.

8. User Support

- User support is a no-cost, web-based effort, with limited consultation available from the SwRI development group.

- The size of the user community is about 100 users, ranging from scientists to software developers.

9. Work In Progress

- Extending the tensor operations capability to support 10-D SCF data products.

10. Evolution Plans

- (a) Modifying the format of the VIDF file from a fixed format to a parameter-value language format to allow for addition of new fields while maintaining backwards compatibility with older versions of the VIDF file.

(b) Working with instrument teams on new missions to develop software that handles on-board creation of IDFS files before data transmission to the ground.

11. Documentation and Related References

12. Other Comments

[Editor's Note: This question was added after the form was originally completed.]


 


URL: http://ssdoo.gsfc.nasa.gov/nost/fep/developer-idfs.html

A service of NOST at NSSDC.

Author: Richard Murphy / SWRI (richard@swri.edu) +1.210.522.3259
Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
NASA Official: Code 633.2 / Don Sawyer (Don.Sawyer@gsfc.nasa.gov) +1.301.286.2748
Last Revised: 1999-05-24, Richard Murphy (1999-08-04, John Garrett)