New Process for Influencing Formats Evolution and Adoption

Volume 14, Number 4, December 1998
By Donald Sawyer and Joseph King

Significant benefit has accrued to various science communities from the use of standard formats within those communities. Examples include the worldwide astronomical community's use of the Flexible Image Transport System (FITS), the International Solar Terrestrial Physics (ISTP) community's use of the Common Data Format (CDF), and the NASA Planetary community's use of Planetary Data System (PDS) Labels.

However, scientific progress continues to be impeded within some science communities, and across the boundaries of traditional discipline domains, by the lack or excessive multiplicity of available standards for data formats and structures. One adverse effect of this situation is the extra effort required to understand and analyze data from multiple disciplines or subdisciplines. This extra effort falls not only on researchers but also on archives that must preserve the information and make it available in forms understandable to their customers over the long term. Another adverse effect is that the commercial software sector is unable to perceive which formats/structures it should support with applications software. As a result, developers and/or users of the many available "standard" formats must develop, maintain, and evolve tools that the commercial sector might otherwise provide more cost-effectively. These situations, coupled with the advent of the Web and the continuing emergence of new, widely supported technologies such as Extensible Markup Language (XML), suggest it is time to take a new look at the format issues and evolution options.

To address this situation, particularly as it affects the NASA/Office of Space Science (OSS) domain, the NASA/Science Office of Standards and Technology (NOST) at the National Space Science Data Center is planning to coordinate a Formats Evolution Process (FEP). This process, to be guided by the Space Science Data Systems (SSDS) Technical Working Group and by a technically oriented Process Working Group, initially has two time horizons, one on the scale of a year or less and the other on the scale of a few years.

On the scale of a year or less, the focus will begin with the needs of the OSS/Sun-Earth Connections (SEC) community, where formats multiplicity is a significant impediment to data exchange and analysis. The SEC data management requirements will be reviewed, along with the levels of success of extant standard formats (IDFS, Common Data Format [CDF], netCDF, Hierarchical Data Format [HDF], etc.) and their accompanying software in satisfying the various requirements. A perspective will be provided on emerging technologies (e.g., XML and Common Object Request Broker Architecture [CORBA]) that have the potential to impact format usage in the near future. From the effort at this scale, the objectives are (1) more coordinated and complementary development work on the part of the groups responsible for the various formats in the near term, and (2) a consensus set of recommendations to new projects needing to make formats selections for various object types and for data at various levels of processing in the time frame of one to three years hence.

On the scale of a few years, a review is proposed of what "data object types" (images, spectra, time series, etc.) and accompanying metadata have to be managed and scientifically used, and what relatively common functionalities need to be associated with such management and use. The objective at this scale is to enable (through consensus on requirements) a coordinated evolution of then-current format standards and associated data management and analysis tools. The scope for this effort includes the NASA/OSS and related communities, the commercial sector, and possibly non-NASA scientific communities with similar needs. Work on standardizing interfaces (APIs) to various scientific data types/objects will pay substantial dividends over the long term. Other approaches may emerge at this stage, or there may prove to be insufficient common ground. However, the time is right to take a fresh look. The emergence of a concurrence on formats/software requirements is likely to stimulate development activities on the part of commercial vendors. This development should decrease the cost to government sponsors of ensuring that their intended customer communities have needed functionalities.

This Formats Evolution Process is beginning with the development of WWW pages to serve as a discussion medium for interested parties. Pages will be structured to address key topics, with solicitations of white papers and comments. Topics are expected to include descriptions of the scopes, technologies, and functionalities of various standard format packages (formats, types of data managed, accompanying metadata, enabling software, etc.) from key representatives of formats development groups. Statements of successes and shortcomings will be collected from various users of these standard formats (ranging from spaceflight projects through individual scientists). It will be important to elicit visionary statements on the functionalities to be delivered by emerging generic standards, future standard formats, accompanying software, and other generic data management, browse, and analysis software. Finally, draft statements will be encouraged on recommended pathways to the future, based on the developer-specific and user-specific input and the future vision statements. This input should stimulate maximal participation, critical thinking, and continuing dialogue. As the issues firm up and the primary impediments to usability and interoperability become clear, they will be abstracted into draft comparative-overview statements by the NSSDC/NOST staff. The URL for this set of pages is not fixed yet, but it will be identified in the "What's New" area of NSSDC's home page at

At some point in the early months of 1999, this process will be ripe for a workshop at which consensus may be reached on the best next steps to take relative to formats/functionalities evolution and adoption that will best promote data interoperability and usability within the OSS/SSDS science data environment. This workshop will involve format users, both OSS and non-OSS format developers, and relevant vendors.

Following the workshop, the Process Working Group will assess the progress obtained and the prospects for achieving increased cost-effectiveness in defining additional FEP activity. The authors will be pleased to receive comments on this process and planned approach.

