Using EAST Tools to Validate and Transform Any Data

By John Garrett, Betty Brinker, and Donald Sawyer

Introduction

Long-term archives, such as the NSSDC, typically hold data in many different formats that were created on a variety of platforms. Not only does the logical format differ from one data set to the next, but the variety of platforms may also result in many different representations of base data types such as integers and real numbers. One important function of any long-term archive is to ensure that such data remain available and usable to the community. The continuing development of standards-based data description languages and associated tools is providing new possibilities for archives to manage, validate, and transform their data holdings efficiently.

The Standard Data Description Language - EAST

EAST (not an acronym) is a data description language that supplies complete and unambiguous information about the format of the described data. An intrinsic need when describing data is to specify the representation of the interchanged data, including the logical structure of the data and the physical, bit-level representation of the individual data items. Given the wide diversity of operating systems and machine representations of numeric values, a full understanding of data can only be reached by using a rigorous notation or language that provides a complete, unambiguous logical and physical description. EAST is used for this purpose. EAST is designed for building descriptions of data that are kept separately from the data itself. Note that EAST is a data description language and is not itself a data format. Users of EAST maintain and use their data in whatever formats are most useful to them. EAST describes those various formats without any requirement to change the actual data.
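The point about platform-dependent physical representations can be illustrated with a short sketch. It uses Python's standard `struct` module and is not EAST syntax or part of the EAST tools; the sample value and variable names are chosen only for illustration. The same 32-bit integer is laid out in two byte orders, and reading the bytes under the wrong assumption silently yields a different number, which is why a description must state the bit-level layout explicitly.

```python
import struct

value = 1025  # 0x00000401, an arbitrary sample value

# The same logical integer in two physical layouts.
big_endian = struct.pack(">i", value)     # most significant byte first
little_endian = struct.pack("<i", value)  # least significant byte first

print(big_endian.hex())     # 00000401
print(little_endian.hex())  # 01040000

# Interpreting big-endian bytes as little-endian gives a wrong value
# without raising any error.
misread = struct.unpack("<i", big_endian)[0]
print(misread)  # 17039360

# With the correct description of the layout, the value is recovered.
correct = struct.unpack(">i", big_endian)[0]
print(correct)  # 1025
```

Nothing in the bytes themselves says which interpretation is right; that information has to come from an external description of the data.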
EAST was designed with three overriding concerns: data description capabilities, human readability, and computer interpretability. The EAST specification document (available from the CCSDS web site at http://www.ccsds.org/) provides the syntax and semantic rules for EAST. The EAST specification has been endorsed as an International Standard by the International Organization for Standardization (ISO) and as a Recommendation for Space Data Standards by the Consultative Committee for Space Data Systems (CCSDS). EAST focuses primarily on the physical components and the layout or structure of data, while being able to interact with complementary standards and tools that place a stronger emphasis on the semantics, or meaning, of the data. One such complementary standard is the Data Entity Dictionary Specification Language (DEDSL), also defined by the CCSDS. Tools have been designed to incorporate both EAST and DEDSL to describe data more fully in standardized ways.

EAST Tool Capabilities

Although EAST is designed to be human readable, truly complete and unambiguous descriptions of complex data record structures can become lengthy and repetitive. This motivates tools that give users several different graphical and text views of the complete set of information carried within EAST and supplemented by DEDSL information. Tools spare users from having to learn the EAST syntax, but those who wish to learn it can do so, because the language avoids cryptic forms in favor of more English-like constructs. EAST is nevertheless a formal language, not a natural language: it is machine interpretable. The formal nature of EAST allows data descriptions to be checked and data to be interpreted in an automated fashion. As shown in the following figure, EAST descriptions are created using the tool OASIS (Outil d'Aide a la Structuration d'Informations Spatiales).
In the windows-based environment of OASIS, a user can model the data either before or after it actually exists. OASIS can then generate files containing the EAST description and, additionally, files containing the Data Entity Dictionary (DED) conforming to the CCSDS/ISO DEDSL standard. The DED is also a human-readable, encoded file (currently in PVL, the Parameter Value Language, or in XML) and contains the semantic description of the data.
A sample view of an OASIS window, where data is being modeled, is shown below. The Data Model Window and the Information Window are displayed in this screen view of OASIS. The diagram in the Data Model Window is constructed primarily by the user entering syntactic information for each element and node through the Information Window. Highlighting a node or element in the diagram brings up its descriptive information in the Information Window, where the user can enter or update further descriptive information. The Main Menu Window is shown at the top left of the screen. This window identifies the selected model by name and lists the types, or sub-models, of its elements and nodes. For example, a type may be based on an integer or a floating point representation and may have a particular range. A node usually represents an aggregate structure of data elements, such as a record, an array, or a list of elements that is delimited by an end-of-file or a marker constant. The element "LATITUDE" is highlighted in the example Data Model Window, so the descriptive information for "LATITUDE" is displayed in the Information Window, where changes may also be entered. In the example shown, the Information Window displays the syntactic information. If the user checks the "Semantic Information" box at the top of the window, semantic information is also displayed. The semantic information portion of the window lists the attributes assigned in the Configuration Window and lets the user give values to these attributes. The "Implantation View" is shown at the bottom of the screen. This view may optionally be displayed to show the layout of the structure in which elements such as LATITUDE are contained. The bit length option is selected here, so the length of each element field is given in bits and its exact location in the structure is also shown.
Once the EAST descriptions are created, either by hand or, more likely, with OASIS, additional tools are available to generate test data and to validate and output existing data. The following figure shows several of the existing EAST tools and their relationships. These tools include OASIS, for creating EAST data descriptions, and three tools that make use of the generated descriptions: the Data Generator, the Data Checker, and the ASCII Dump. The Data Generator tool creates a test data file in the format described by an EAST data description. The Data Checker tool validates a given data file against a given EAST data description. The ASCII Dump tool interprets a given data file based on an EAST data description and outputs selected blocks of data to an ASCII formatted file.
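The division of labor among the Data Generator, Data Checker, and ASCII Dump can be sketched in a few lines. This is only an analogy built on Python's `struct` module, not the EAST tools or their interfaces. The `RECORD` list stands in for an EAST description and is not EAST syntax, and the field names, ranges, byte order, and function names are all assumptions made for the example.

```python
import struct

# Hypothetical stand-in for a data description:
# (field name, struct type code, minimum, maximum).
RECORD = [
    ("LATITUDE", "f", -90.0, 90.0),
    ("LONGITUDE", "f", -180.0, 180.0),
    ("COUNT", "H", 0, 1000),
]
BYTE_ORDER = ">"  # big-endian, assumed for this example


def record_format():
    """Build the struct format string for one record."""
    return BYTE_ORDER + "".join(code for _, code, _, _ in RECORD)


def generate(records):
    """Generator analogue: write records in the described format."""
    fmt = record_format()
    return b"".join(struct.pack(fmt, *rec) for rec in records)


def check(data):
    """Checker analogue: return a list of problems found in the data."""
    fmt = record_format()
    size = struct.calcsize(fmt)
    problems = []
    if len(data) % size != 0:
        problems.append(f"length {len(data)} is not a multiple of {size}")
    for index in range(len(data) // size):
        values = struct.unpack_from(fmt, data, index * size)
        for (name, _, low, high), value in zip(RECORD, values):
            if not (low <= value <= high):  # also rejects NaN
                problems.append(f"record {index}: {name}={value} out of range")
    return problems


def dump(data):
    """ASCII dump analogue: one text line per complete record."""
    fmt = record_format()
    names = [name for name, _, _, _ in RECORD]
    return [
        " ".join(f"{n}={v}" for n, v in zip(names, values))
        for values in struct.iter_unpack(fmt, data)
    ]


sample = generate([(45.0, -75.5, 10), (95.0, 0.0, 5)])
report = check(sample)   # second record has LATITUDE out of range
lines = dump(sample)
```

All three functions are driven by the same description, so changing `RECORD` changes what is generated, checked, and dumped without touching the code. That is the property the EAST tools obtain from a shared EAST description.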
The EAST tool set has recently added the capability to transform a data set based on one machine representation into a rendition of the same data set based on a different machine representation. For example, data created on one class of machines, such as the CDC 3000 family, can be converted to data interpretable on a Sun architecture. Conversions between other machines, including VAX, IBM, PC, and non-standard architectures, are also possible. The conversion process is shown in the following figure, where OASIS is used to create an EAST data description of both the original file format and the desired file format. These two EAST data descriptions and the original file are the inputs to the Data Transformer tool. The tool creates a data file in the desired format as its output.
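The conversion process described above takes two descriptions and one file as inputs and produces one file as output. A minimal sketch of that idea follows, again using Python's `struct` module rather than the Data Transformer tool. The dictionaries `SOURCE` and `TARGET`, the field names, and the `transform` function are hypothetical, and only byte order, field width, and field order are changed here.

```python
import struct

# Hypothetical source layout: big-endian, 16-bit count, 32-bit float.
SOURCE = {"byte_order": ">", "fields": [("COUNT", "h"), ("LATITUDE", "f")]}
# Hypothetical target layout: little-endian, 64-bit float, 32-bit count.
TARGET = {"byte_order": "<", "fields": [("LATITUDE", "d"), ("COUNT", "i")]}


def layout(description):
    """Build a struct format string from a description."""
    codes = "".join(code for _, code in description["fields"])
    return description["byte_order"] + codes


def transform(data, source, target):
    """Re-encode every record of `data` from source to target layout."""
    src_fmt, dst_fmt = layout(source), layout(target)
    src_names = [name for name, _ in source["fields"]]
    out = bytearray()
    # iter_unpack raises struct.error if the data length is not a
    # whole number of source records.
    for values in struct.iter_unpack(src_fmt, data):
        record = dict(zip(src_names, values))
        # Fields are matched by name, so the target may reorder them.
        ordered = [record[name] for name, _ in target["fields"]]
        out += struct.pack(dst_fmt, *ordered)
    return bytes(out)


original = struct.pack(">hf", 7, 12.5) + struct.pack(">hf", -3, -45.25)
converted = transform(original, SOURCE, TARGET)
```

This sketch assumes both layouts use IEEE floating point and two's complement integers, which is all that `struct` supports. Converting from machines with other native number formats, such as the CDC 3000 or VAX families mentioned above, would need explicit bit-level decoding that is not shown here.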
Interfaces to the EAST tool set modules are available in C, FORTRAN, and Ada. We have used the C interface to build, against the EAST library, other tools that a particular application may require.

Initial Experience with Tools

The NSSDC has begun to study the possibility of using the EAST tools to validate and convert data sets that were originally created on now obsolete systems. As a first step, EAST descriptions of the data sets in their current format are being created. Then an EAST description of a new format, with representations in IEEE form, is created. The EAST tools are used first to validate the content of the data set against its input format and then to transform the data to the new format. This conversion process is helpful for both the data set users and the archive. Users benefit because they no longer need to create specialized software to read and make use of these data sets. The archive benefits from the decreased demand on staff time to support users trying to make use of data sets in obsolete formats. NSSDC is still in the study phase of this effort, but the initial prototypes, which convert a single data set, appear very promising. Since NSSDC is in close communication with the EAST Tool Development Team at the French Space Agency, CNES, the results of this testing will be fed back to the developers. The feedback should result in further improvements in the design of the EAST tools and should suggest updates for increasing throughput when dealing with large data sets and large numbers of data sets. Additional information and support are available through the authors or directly from the CNES EAST Team, who may be contacted at east@cnes.fr.