NSSDC Makes Major Data Management Advances

By Joseph King

After an extended period of planning, software development, and data preparation, NSSDC has begun implementing changes that improve its ability to make data and supporting material accessible to, and usable by, researchers and others, both in the near term and well into the future. The changes identified in this overview article are each explained further in companion articles in this newsletter.

Among the improvements are:

1. Moving "deep archive" digital data from offline storage to storage on a "nearline" Digital Linear Tape (DLT) jukebox.

2. Moving much community-accessible data from a nearline optical disk jukebox (NDADS) to RAID magnetic disk for ftp access.

3. Moving beyond a vendor-specific VMS/Files-11 file management system (NDADS) to a vendor-independent UNIX system.

4. Modeling the NSSDC archive and dissemination environment on the ISO/CCSDS-standard Open Archival Information System (OAIS) Reference Model.

5. Integrating the data previously accessible from NDADS via SPyCAT with other data from the same missions previously ftp-accessible from nssdc.gsfc.nasa.gov.

6. Providing OAIS-compatible data preparation software to an operating mission (IMAGE) to simplify both the mission's data preparation and NSSDC's data ingestion.

The drivers for these changes included the approaching obsolescence of NSSDC's optical disk jukeboxes; the need to move data off vendor-specific systems with uncertain futures (VMS); the opportunity to exploit falling magnetic disk prices to make a large amount of data ftp-accessible for immediate user access; the need to make NSSDC's archive management more cost-effective (via an automated deep archive system rather than a labor-intensive offline operation); and the desire to comply with OAIS for robust long-term archiving.

The overall activity has involved the coordinated efforts of NSSDC's acquisition scientists, data operations staffers, and multiple software teams. The integrating software for the establishment of NSSDC's new archive and dissemination environments is called DIOnAS (Data Ingest and Online Access System). DIOnAS depends on information provided to it as "listfiles" prepared by a combination of acquisition scientists and data operations staffers. These listfiles identify and characterize the data to be ingested by DIOnAS in a given job, and also specify the characteristics of the DIOnAS-output products in both the archive and data dissemination environments.
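
The article does not describe the listfile syntax itself. Purely as an illustration of the idea that one file both describes the inputs of a job and specifies the outputs, the Python sketch below parses a hypothetical layout: "key = value" job settings plus one "file = NAME [format=FMT]" line per data file. The keys job_id, online_dir, file, and format, and the default "as_is", are invented for this example and are not taken from DIOnAS.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    target_format: str  # hypothetical label, e.g. "as_is" or a canonical-format name


@dataclass
class IngestJob:
    settings: dict = field(default_factory=dict)  # job-level "key = value" pairs
    files: list = field(default_factory=list)     # one FileEntry per data file


def parse_listfile(text: str) -> IngestJob:
    """Parse a HYPOTHETICAL listfile layout (not the real DIOnAS syntax)."""
    job = IngestJob()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed listfile line: {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "file":
            parts = value.split()
            if not parts:
                raise ValueError(f"file line without a name: {raw!r}")
            target = "as_is"  # assumed default: archive the file unchanged
            for option in parts[1:]:
                opt_key, _, opt_value = option.partition("=")
                if opt_key == "format":
                    target = opt_value
            job.files.append(FileEntry(parts[0], target))
        else:
            job.settings[key] = value
    return job


example = """
job_id = imp8_001        # hypothetical job identifier
online_dir = /imp8/mag   # hypothetical RAID destination directory
file = IMP8_MAG_1973.DAT format=canonical_fixed
file = IMP8_MAG_1974.DAT
"""
job = parse_listfile(example)
print(job.settings["online_dir"], [f.target_format for f in job.files])
# prints: /imp8/mag ['canonical_fixed', 'as_is']
```

Keeping job-level settings and per-file entries in one file mirrors the article's point that a listfile both identifies the data to ingest and characterizes the archive and dissemination products.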

While executing, DIOnAS calls two key data processing modules. The first (1) reads the input data, applies any specified format transformations, creates an attribute record for each data file (containing information from VMS/Files-11 extended attribute records, CRC checksums, etc.), and bundles the data files and attribute files into "Archival Information Packages" (AIPs, a central OAIS concept). The second (2) splits AIPs into their constituent data and attribute files. These modules, developed by the NSSDC/NOST (NASA Office of Standards and Technologies) software team, are called the Data Migrator Utility (DMU) and the Package Splitter Utility (PSU), respectively.
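
To make the DMU/PSU round trip concrete, here is a minimal Python sketch. It is not the NOST software: it assumes a tar archive as the AIP container, a JSON attribute record per data file, and CRC-32 (zlib.crc32) as the checksum, none of which the article specifies. The helper names build_aip and split_aip are hypothetical.

```python
import io
import json
import tarfile
import zlib


def make_attribute_record(name: str, data: bytes, extra: dict) -> bytes:
    """Per-file attribute record. CRC-32 is an assumed checksum choice;
    'extra' stands in for carried-over VMS/Files-11 attributes."""
    record = {"name": name, "size": len(data), "crc32": zlib.crc32(data), **extra}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def _add(tar: tarfile.TarFile, name: str, payload: bytes) -> None:
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def build_aip(files: dict, attrs: dict) -> bytes:
    """DMU-like step: bundle each data file with its attribute record
    into one tar archive, used here as a stand-in for an AIP."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            _add(tar, name, data)
            _add(tar, name + ".attr", make_attribute_record(name, data, attrs.get(name, {})))
    return buf.getvalue()


def split_aip(aip: bytes) -> dict:
    """PSU-like step: unpack an AIP and verify each data file against its CRC."""
    members = {}
    with tarfile.open(fileobj=io.BytesIO(aip), mode="r:") as tar:
        for member in tar.getmembers():
            members[member.name] = tar.extractfile(member).read()
    out = {}
    for name, payload in members.items():
        if name.endswith(".attr"):
            continue  # attribute records are consulted, not returned
        record = json.loads(members[name + ".attr"])
        if zlib.crc32(payload) != record["crc32"]:
            raise ValueError(f"CRC mismatch for {name}")
        out[name] = payload
    return out


files = {"imp8_1973.dat": b"abc", "imp8_1974.dat": b"xyz123"}  # hypothetical data
aip = build_aip(files, {"imp8_1973.dat": {"record_format": "variable"}})
print(sorted(split_aip(aip)))  # prints: ['imp8_1973.dat', 'imp8_1974.dat']
```

Storing the checksum inside the package, next to the file it describes, is what lets the splitting step detect corruption without consulting any outside database.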

The DIOnAS software itself, created by NSSDC's primary data management software group, controls overall jobs and writes AIPs to a DLT jukebox for permanent archiving and separated files to RAID magnetic disk for user access. In addition, DIOnAS builds an Oracle information base of the locations and other attributes of the AIPs on DLT and data files on magnetic disk.
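
The information base ties each online file back to the tape copy of the AIP it came from. The sketch below illustrates that idea with Python's built-in sqlite3 module as a stand-in for Oracle; the table names, columns, cartridge label, and file paths are all hypothetical and are not NSSDC's actual schema.

```python
import sqlite3


def create_catalog() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the Oracle information base.
    Table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE aip (
            aip_id      TEXT PRIMARY KEY,
            tape_volume TEXT NOT NULL,     -- label of the DLT cartridge in the jukebox
            tape_file   INTEGER NOT NULL   -- position of the AIP on that cartridge
        );
        CREATE TABLE online_file (
            path   TEXT PRIMARY KEY,       -- location on the RAID disk
            aip_id TEXT NOT NULL REFERENCES aip(aip_id),
            size   INTEGER NOT NULL        -- bytes
        );
        """
    )
    return conn


def record_ingest(conn, aip_id, tape_volume, tape_file, online_files):
    """Register one AIP written to tape and the files split out of it onto disk."""
    with conn:  # commit both inserts together, or roll back on error
        conn.execute("INSERT INTO aip VALUES (?, ?, ?)", (aip_id, tape_volume, tape_file))
        conn.executemany(
            "INSERT INTO online_file VALUES (?, ?, ?)",
            [(path, aip_id, size) for path, size in online_files],
        )


def archive_location(conn, path):
    """Return (tape_volume, tape_file) holding the archived copy of an online file."""
    return conn.execute(
        "SELECT a.tape_volume, a.tape_file FROM online_file AS f "
        "JOIN aip AS a ON a.aip_id = f.aip_id WHERE f.path = ?",
        (path,),
    ).fetchone()


conn = create_catalog()
record_ingest(conn, "AIP-000001", "DLT0042", 3, [("/imp8/mag/1973.dat", 1024)])
print(archive_location(conn, "/imp8/mag/1973.dat"))  # prints: ('DLT0042', 3)
```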

DIOnAS runs on a Sun Enterprise 3000 computer called "nssdcftp". A 264-slot ATL 2640 jukebox, hosted by nssdcftp, is used to produce and store the permanent-archive DLT tapes. A 1-TB (expandable to 10 TB) Metastore RAID disk array, also hosted by nssdcftp, holds the customer-accessible online data.

Most of the data files being transferred from NDADS were readily transferable to UNIX, with no likely impediments to future use. However, certain data sets (e.g., those whose VMS files had variable record sizes) had to be converted to one of several NSSDC canonical formats.
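
The article does not say how the variable-record-size files were converted. As a hedged illustration only, the Python sketch below reads the on-disk layout commonly used for RMS variable-length records (a 2-byte little-endian length word before each record, with odd-length records padded to an even byte boundary) and rewrites the records as space-padded fixed-length records. The fixed-length target is an assumption chosen for the example, not one of NSSDC's actual canonical formats, and the reader ignores complications such as VFC record headers.

```python
import struct


def read_vms_variable_records(raw: bytes) -> list:
    """Split a simplified RMS variable-length-record file image into records.

    Assumed layout: each record is a 2-byte little-endian byte count,
    then the data, then one pad byte if the count is odd (word alignment).
    Block-spanning details and VFC record headers are ignored.
    """
    records, pos = [], 0
    while pos < len(raw):
        if pos + 2 > len(raw):
            raise ValueError("truncated record-length word")
        (count,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        end = pos + count
        if end > len(raw):
            raise ValueError("record runs past end of data")
        records.append(raw[pos:end])
        pos = end + (count & 1)  # skip the pad byte after an odd-length record
    return records


def to_fixed_length(records: list, fill: bytes = b" ") -> bytes:
    """Rewrite records as fixed-length records padded to the longest one
    (a hypothetical stand-in for one NSSDC canonical format)."""
    width = max((len(r) for r in records), default=0)
    return b"".join(r.ljust(width, fill) for r in records)


vms_image = struct.pack("<H", 3) + b"abc" + b"\x00" + struct.pack("<H", 2) + b"de"
records = read_vms_variable_records(vms_image)
print(records, to_fixed_length(records))  # prints: [b'abc', b'de'] b'abcde '
```

The record boundaries in such files live in the length words rather than in the data, which is why a plain byte copy to UNIX would lose them and a conversion step is needed.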

Migration of data through DIOnAS started in August of 2000 with the movement of virtually all IMP 8 data from NDADS. Currently a few hundred GB of ISIS ionospheric data are being moved. Several additional space physics data sets are in the queue. Migration of all appropriate data from NDADS will be completed in 2001.

Large volumes of astrophysics data now on NDADS will not be moved through DIOnAS to magnetic disk, because these data are now mostly community-accessible from various NASA/Astrophysics Science Archive Research Centers (SARCs). The exception is the IRAS data, which will pass from NDADS through DIOnAS. NSSDC-held deep-archive copies of SARC-provided data (and of appropriate pre-SARC data) will eventually pass through DIOnAS to NSSDC's deep-archive DLT jukebox.

NSSDC will gradually pass all its offline-archived digital data through DIOnAS to the DLT jukebox and, for at least most data not otherwise network-accessible, also to the UNIX/RAID environment for easy user access. NSSDC will also move all the data and services now ftp-accessible from nssdc.gsfc.nasa.gov to this UNIX/RAID environment on "nssdcftp." In fact, all IMP data have already moved to nssdcftp from both NDADS and nssdc.gsfc.nasa.gov, giving users a single, simple interface to much of the IMP data.

Note that the changes discussed above do not affect the CDF-based CDAWeb or SSCWeb systems; the CDF-formatted data in these systems are ftp-accessible from the machine hosting them (rumba). The OMNIWeb and COHOWeb systems, on the other hand, are mainly CDF-based but access some ASCII data that are ftp-accessible from nssdcftp, so their software packages and CDF data have been ported from rumba to lewes, a Sun Enterprise 250 tightly coupled (NFS-mounted) to nssdcftp. Top-level Web pages for all these systems remain unchanged at their respective URLs.

The SPyCAT interface, through which users found and accessed space physics data on NDADS, will not be rebuilt in the UNIX environment, at least in the near future. A note to this effect has appeared on the top SPyCAT Web page for months, drawing almost no user feedback. We believe that a directory structure organized by spacecraft, combined with ftp access, will let users find and retrieve data adequately. (User input on this is always welcome.)

Selected data sets ftp-accessible from nssdcftp also have an added graphical browse and subset capability, provided by "Ftphelper" running on lewes.

Selected relevant URLs:

New UNIX/RAID ftp environment: ftp://nssdcftp.gsfc.nasa.gov/

Older ftp area being superseded: ftp://nssdc.gsfc.nasa.gov/

NDADS/SPyCAT: http://nssdc.gsfc.nasa.gov/space/ndads/spycat.html

OAIS Reference Model: http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html

Ftphelper: http://nssdc.gsfc.nasa.gov/ftphelper/ftphelper.html

Curator: Natalie Barnes
Responsible Official: Dr. Joseph H. King, Code 633
Last Revised: [NAB]