Curator: Nate James

Responsible Official:
Dr. Joseph H. King, Code 633

Last Revised: Friday, 20-Dec-2002 [NLJ]


NSSDC Data Ingest Software Update

By Don Sawyer and Pat McCaslin

 

What's Needed?

The December 2000 issue of the NSSDC Newsletter carried a series of articles describing major data management advances in the NSSDC archive, formulated around the concepts in the CCSDS/ISO "Reference Model for an Open Archival Information System (OAIS)".  These included descriptions of DIOnAS and DMU, the new software systems that support the new functionality.  This article describes some additional functionality that is needed and what is being done to provide it.

A key concept in the evolution of the NSSDC archival system is that of the Archival Information Package (AIP).  The initial implementation packaged one data file, in a canonical form, with a set of attributes to create a new file that could be easily migrated across different media types and that could be used to recreate the original data file, on whatever media was required, without loss of information.  Over the last two years most of the data files on the VMS-based system known as NDADS have been migrated into AIPs and placed on DLT (Digital Linear Tape) cartridges for long-term storage and management (see companion article).  Many of the canonical-form data files in these AIPs, along with the AIP attribute objects, have also been made directly accessible to the user community by placing them on NSSDC's anonymous FTP site.

It was recognized early on that an AIP containing only one science data file would not meet all future needs.  For example, some formats, such as the UCLA Flat File format, involve two complementary files.  Further, when there are many small files, each covering a short time period, it is more convenient to manage them in groups covering a longer time period.  Both situations apply to some of the data stored in the NDADS system.  More recently, it has been decided to migrate approximately 1000 older data sets residing offline on 9-track tape and 3480 cartridges into AIPs for management under DIOnAS.  This will reduce the diversity of management approaches and put online those data deemed to be of potential interest to the user community.  This needs to be accomplished efficiently and without information loss.  NSSDC maintains documentation for the data sets, contained on one or more magnetic tapes, usually as the data were originally received.  Therefore, all the files (one or many) from each original tape, or from the currently maintained image of that tape, will be packaged into a single AIP.  This will minimize the need to update documentation while preserving the original data values.  These cases establish the need for an AIP containing multiple data files, and for a corresponding attribute object that enables the contained canonical-form data files to be restored on whatever media is appropriate, without loss of information.
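The article does not specify the layout of the multi-file ("type 3") AIP or its attribute object.  The following Python sketch only illustrates the idea under stated assumptions: each file from one tape is kept in tape order together with per-file attributes (here, hypothetically, a name, a byte count, and an MD5 checksum) that let the files be restored and checked for information loss.  The class and field names are invented for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileAttributes:
    """Hypothetical per-file attributes needed to restore one canonical-form file."""
    name: str
    size: int
    md5: str


@dataclass
class MultiFileAIP:
    """Hypothetical multi-file AIP holding every file from one original tape, in tape order."""
    aip_id: str
    attributes: list[FileAttributes] = field(default_factory=list)
    contents: dict[str, bytes] = field(default_factory=dict)

    def add_file(self, name: str, data: bytes) -> None:
        # Record attributes alongside the data so each file can be verified on restore.
        if name in self.contents:
            raise ValueError(f"duplicate file name: {name}")
        self.attributes.append(FileAttributes(name, len(data), hashlib.md5(data).hexdigest()))
        self.contents[name] = data

    def restore(self) -> list[tuple[str, bytes]]:
        """Return (name, data) pairs in original order, checking each against its attributes."""
        restored = []
        for attrs in self.attributes:
            data = self.contents[attrs.name]
            if len(data) != attrs.size or hashlib.md5(data).hexdigest() != attrs.md5:
                raise ValueError(f"{attrs.name}: content does not match recorded attributes")
            restored.append((attrs.name, data))
        return restored
```

Keeping the attributes as an ordered list, rather than relying on dictionary order alone, makes the original file sequence on the tape an explicit part of the package.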

The basic architecture is to create AIPs containing multiple files and to pass these AIPs to DIOnAS for archival storage management.  This management includes writing the AIPs to DLT (in tar files spanning all the AIPs generated in a multi-tape-input job), splitting AIPs, creating for each AIP a tar file containing all of its canonical-form data files and writing these tar files to anonymous FTP, and tracking the location and status of these AIPs and their components in their various forms (split and non-split).
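How DIOnAS builds the tar files for anonymous FTP is not described here.  As a minimal sketch, assuming the canonical-form data files of one AIP are available as in-memory byte strings, Python's standard `tarfile` module can bundle them into one uncompressed archive and read them back to confirm nothing was lost.  The function names and member paths are hypothetical.

```python
import io
import tarfile


def tar_canonical_files(files: dict[str, bytes]) -> bytes:
    """Bundle one AIP's canonical-form data files into a single uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # TarInfo must carry the member size before addfile()
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar_members(archive: bytes) -> dict[str, bytes]:
    """Unpack the archive again, e.g. to confirm every file survived unchanged."""
    members = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)  # None for non-regular members
            members[member.name] = extracted.read() if extracted else b""
    return members
```

A plain (uncompressed) tar keeps the data values byte-for-byte and is readable with standard tools on the FTP user's side.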

Software Updates and Tape Migration

To provide the software updates needed for creating multi-file AIPs, it was decided to start with the version of the DMU software called PGU (Package Generator Utility), because this software already supported the creation of AIPs outside of DIOnAS (see the IMAGE project's use of PGU).  The new software, called the Multi-file Package Generator and Analyzer (MPGA), is designed to support all of the functionality of PGU and DMU, and it reuses most of their existing modules.  The initial MPGA implementation, however, focuses on supporting the migration of tape files into the DIOnAS environment.

Major decisions made during the tape migration planning process include: dataset selection, identification of additional media attributes that require preservation, and the definition of the type 3 AIP for holding multiple data files.  The process for moving the offline digital archive to DIOnAS has been defined:

  1. Identify datasets for migration to DLT and the subset thereof destined for nssdcftp
  2. Assemble dataset characteristics from dataset catalogs and NSSDC databases
  3. Identify individual tape volumes in the IDA database
  4. Read individual tapes, verifying data format
  5. Package data into AIPs
  6. Use DIOnAS to store AIPs on DLT and make them available on nssdcftp

The attached figure illustrates these steps.

[Regarding item 1: all data sets holding unique science content will be migrated to DLT.  Space physics data sets for which NSSDC/SPDF has dissemination responsibility (virtually all NSSDC-resident space physics data) will also go to nssdcftp.]
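The selection rule for item 1 can be expressed as a small function.  This Python sketch assumes each data set has already been tagged with two hypothetical boolean flags; it also assumes, following the wording of item 1, that the nssdcftp set is a subset of the DLT set, so a data set without unique science content is not selected by this rule at all.

```python
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Hypothetical flags; real values would come from dataset catalogs and NSSDC databases."""
    dataset_id: str
    unique_science_content: bool
    spdf_dissemination: bool  # space physics data that NSSDC/SPDF is responsible for disseminating


def migration_targets(ds: DatasetInfo) -> set[str]:
    """Apply the item-1 rule: DLT for unique science content, plus nssdcftp for the SPDF subset."""
    if not ds.unique_science_content:
        return set()  # not selected by this rule
    targets = {"DLT"}
    if ds.spdf_dissemination:
        targets.add("nssdcftp")  # assumed subset of the data sets going to DLT
    return targets
```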

 

In addition to MPGA, new software and databases are being developed to facilitate the migration.

The Offline Transition To Online database table (OTTO) contains dataset-specific information obtained from dataset catalogs, the JEDS database, and acquisition scientists.  Among the thirty information items in OTTO are data format items such as file types and machine representations, anticipated media characteristics such as maximum block sizes, and information needed by DIOnAS to identify appropriate nssdcftp destination directories.

A body of software to retrieve data from offline media and perform initial checks is being developed.  This software includes procedures to identify individual tape volumes and their contents from information in the IDA database, copy the contents of offline tapes to magnetic disk, and compare the anticipated media characteristics stored in OTTO with the characteristics actually encountered during the copy process.
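The comparison step can be pictured with a short Python sketch.  It assumes, hypothetically, that the tape read software reports the size of every block it read, grouped by tape file, and that OTTO supplies only a maximum block size for the dataset; the record and function names are invented, and the real checks may cover more characteristics.

```python
from dataclasses import dataclass


@dataclass
class AnticipatedMedia:
    """Hypothetical subset of the OTTO items describing what a dataset's tapes should look like."""
    dataset_id: str
    max_block_size: int  # bytes


def check_tape_copy(expected: AnticipatedMedia, observed_blocks: list[list[int]]) -> list[str]:
    """Compare block sizes seen while copying one tape against the OTTO expectation.

    observed_blocks[i] holds the sizes (in bytes) of the blocks read for tape file i+1.
    Returns discrepancy messages; an empty list means the copy looked as expected.
    """
    problems = []
    for file_no, blocks in enumerate(observed_blocks, start=1):
        if not blocks:
            problems.append(f"file {file_no}: no data blocks read")
            continue
        largest = max(blocks)
        if largest > expected.max_block_size:
            problems.append(
                f"file {file_no}: block of {largest} bytes exceeds expected maximum "
                f"{expected.max_block_size}"
            )
    return problems
```

Returning a list of messages rather than stopping at the first problem lets an operator review every discrepancy on a tape at once.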

The Offline Media Information Table (OMIT) contains information about individual tape volumes.  Every tape volume that the tape read software identifies and processes will have a corresponding OMIT record.  OMIT information includes volume-specific information returned by the tape read software as well as some dataset characteristics imported from OTTO.  Each OMIT record contains an Archive Storage Identifier, which is the unique identifier of the AIP containing the tape volume’s data.

A database procedure uses information from OMIT and OTTO to generate MPGA “list files.”  These list files identify and characterize the data files to be packaged by MPGA in a given processing run and specify the names of the destination AIPs.
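The actual generator is a database procedure, and the list file format is not given here.  The following Python sketch only mirrors the join it describes: OMIT rows (one per tape volume, carrying the Archive Storage Identifier) are matched to OTTO rows on a dataset identifier, and one line is written per data file.  All field names and the tab-separated layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OmitRecord:
    """Hypothetical OMIT row: one tape volume and the AIP its data goes into."""
    volume_id: str
    dataset_id: str
    archive_storage_id: str
    disk_files: list[str]  # disk copies of the tape's files, in tape order


@dataclass
class OttoRecord:
    """Hypothetical OTTO row: dataset-level format information."""
    dataset_id: str
    file_type: str
    machine_representation: str


def build_list_file(omit_rows: list[OmitRecord], otto_rows: list[OttoRecord]) -> str:
    """Join OMIT and OTTO on dataset ID and emit one tab-separated line per data file."""
    otto_by_dataset = {row.dataset_id: row for row in otto_rows}
    lines = []
    for omit in omit_rows:
        otto = otto_by_dataset[omit.dataset_id]  # KeyError if OTTO lacks this dataset
        for path in omit.disk_files:
            lines.append("\t".join(
                [omit.archive_storage_id, path, otto.file_type, otto.machine_representation]
            ))
    return "".join(line + "\n" for line in lines)
```

Because every file line carries its destination AIP name, all files from one tape volume end up grouped under a single Archive Storage Identifier.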

Upon delivery to DIOnAS, these externally produced AIPs are assigned appropriate processing parameters: the nssdcftp directory to hold the AIP’s canonical science data file and the destination DLT on which to store the AIP.  The DIOnAS mechanism for assigning processing parameters to externally produced AIPs is being modified to facilitate the offline media migration.  Functions stored in the DIOnAS database will be used to construct nssdcftp directory specifications.  For a given dataset, information imported from OTTO will identify the destination DLT and the appropriate nssdcftp directory-naming function, and will supply the information needed to run that function.
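The naming-function mechanism can be sketched as a lookup table of functions keyed by name.  In this Python illustration, the OTTO-derived record names a function and supplies its arguments, and the result pairs the destination DLT with the nssdcftp directory.  The function names, argument keys, DLT label, and directory layouts are all invented; the real functions live in the DIOnAS database.

```python
from dataclasses import dataclass


# Hypothetical directory-naming functions; names and layouts are illustrative only.
def by_mission_instrument(args: dict[str, str]) -> str:
    return f"{args['mission']}/{args['instrument']}"


def by_mission_year(args: dict[str, str]) -> str:
    return f"{args['mission']}/{args['year']}"


NAMING_FUNCTIONS = {
    "mission_instrument": by_mission_instrument,
    "mission_year": by_mission_year,
}


@dataclass
class OttoProcessingInfo:
    """Hypothetical OTTO-derived items DIOnAS would need for one dataset."""
    dataset_id: str
    destination_dlt: str
    naming_function: str
    naming_args: dict[str, str]


def processing_parameters(info: OttoProcessingInfo) -> tuple[str, str]:
    """Return (destination DLT, nssdcftp directory) for an externally produced AIP."""
    build_dir = NAMING_FUNCTIONS[info.naming_function]
    return info.destination_dlt, build_dir(info.naming_args)
```

Storing the naming rule once per function, and only its arguments per dataset, means a change to a directory convention touches one function rather than many dataset records.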

It is expected that the migration of tape data will be in full production by the end of January 2003.
