Data Preparation and Transfer Operations

By Kent Hills, Barbara Rowland, and Joseph King

This article is intended to give readers a high-level understanding of the operational processes and procedures NSSDC followed in moving data from the VMS-based optical disk jukebox environment (NDADS) through DIOnAS to two destinations: a UNIX-based DLT jukebox for permanent archiving and a UNIX-based RAID magnetic disk environment for online user access.

The data sets ("data types" in NDADS terminology) to be moved from NDADS to DIOnAS come in a variety of formats. Each data set has a record in the NDADS information environment with a number of higher-level attributes (e.g., project, data set name, data file name format, proprietary status, data mode) common to all files of that data set. In addition, an extended version of each record was generated, adding further attributes needed to transfer the data set through DIOnAS (e.g., new directory and file name structure, binary or ASCII mode, NSSDC ID numbers, documentation pointer IDs). These extended records are called generalized dataset attribute records (GDARs). In many cases much of the GDAR information was retrieved automatically from existing NSSDC information, but acquisition scientists still had to complete and review each record.
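
As a rough illustration of how a GDAR extends the NDADS data-type record, the sketch below models one as a Python dataclass. The field names, their grouping, and the `missing_fields` helper are hypothetical; the article does not describe the actual GDAR layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GDAR:
    """Hypothetical generalized dataset attribute record (GDAR).

    Field names are illustrative assumptions, not NSSDC's actual schema.
    """
    # Attributes carried over from the NDADS data-type record
    project: str
    data_set_name: str
    file_name_format: str
    proprietary: bool
    data_mode: str
    # Extended attributes added for the transfer through DIOnAS
    directory_template: str
    binary: bool                          # True for binary mode, False for ASCII
    nssdc_id: str
    documentation_ids: tuple[str, ...] = ()

    def missing_fields(self) -> list[str]:
        """Return the names of required text fields that are still blank."""
        required = ("project", "data_set_name", "file_name_format",
                    "data_mode", "directory_template", "nssdc_id")
        return [name for name in required if not getattr(self, name).strip()]
```

Listing blank required fields is one simple way to represent a record that was partly filled in automatically and still needs an acquisition scientist's attention.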

At the beginning of the planning process, before the GDARs were created, a series of meetings was held to address and standardize user-optimal conventions for the magnetic disk environment, covering directory hierarchies, the number of files per directory, file naming, and the like. As a result, the new directories generally share a common structure: spacecraft_series/spacecraft/investigation/data_set/time_interval/data_file. Variants were adopted in individual cases where they were judged more effective for users. Time_interval subdirectories (e.g., 1997, 1998) are used when needed to limit the number of files per directory to about 400 or fewer.
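
The directory convention can be expressed as a small path-building function. This is a minimal sketch: the function names and the example directory and file names are hypothetical, and the approximate 400-file limit is applied here as a strict cutoff.

```python
from pathlib import PurePosixPath
from typing import Optional

# Approximate per-directory ceiling described in the article.
MAX_FILES_PER_DIR = 400


def needs_time_subdirs(file_count: int, limit: int = MAX_FILES_PER_DIR) -> bool:
    """Return True when a data set has too many files for a single directory."""
    return file_count > limit


def build_path(spacecraft_series: str, spacecraft: str, investigation: str,
               data_set: str, data_file: str,
               time_interval: Optional[str] = None) -> PurePosixPath:
    """Assemble spacecraft_series/spacecraft/investigation/data_set/
    [time_interval/]data_file; the time_interval level is omitted when None."""
    parts = [spacecraft_series, spacecraft, investigation, data_set]
    if time_interval is not None:
        parts.append(time_interval)
    parts.append(data_file)
    return PurePosixPath(*parts)


# Hypothetical example: a 1,200-file data set gets year subdirectories.
file_count = 1200
year = "1997" if needs_time_subdirs(file_count) else None
print(build_path("imp", "imp8", "mag", "hires", "19970209_example.asc", year))
```

Keeping the time_interval level optional matches the text: it is inserted only when a data set is large enough to need it.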

Most file names begin with the start time of the data (YYYY...) so that multiple files downloaded by a user are grouped by time, on the assumption that accumulating many files for individual time-defined events will be the most common mode of use. In general, file names are long and descriptive, on the expectation that virtually all users will select files by point-and-click rather than by typing their names.
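
Putting the start time first works because zero-padded date fields ordered from most to least significant sort correctly as plain text. The sketch below assumes a `YYYYMMDD_HHMMSS` prefix followed by a descriptor; the article specifies only that names begin with the year, so the exact layout, the `make_file_name` helper, and the "example" descriptor are illustrative.

```python
from datetime import datetime


def make_file_name(start: datetime, descriptor: str, extension: str) -> str:
    """Build a file name that begins with the data start time.

    Because every date field is zero-padded, sorting these names as
    strings also sorts them chronologically.
    """
    return f"{start:%Y%m%d_%H%M%S}_{descriptor}.{extension}"


# Hypothetical files collected out of order by a user.
names = [
    make_file_name(datetime(1998, 3, 1), "example", "asc"),
    make_file_name(datetime(1997, 11, 15, 6, 30), "example", "asc"),
    make_file_name(datetime(1997, 2, 9), "example", "asc"),
]
print(sorted(names))
```

A plain directory listing or a sort by name therefore lines the files up in time order with no extra metadata.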

In parallel with the preparation of the GDARs by acquisition scientists and their review by NSSDC management, the NSSDC data operations staff was staging data from optical disk platters to VMS magnetic disk to facilitate the migration of these data to DIOnAS. Approximately 100 GB of NSSDC disk capacity was committed to this purpose.

Each DIOnAS data-ingest job involves processing a large number of files from a single data set into AIPs by DIOnAS/DMU, tarring all of the job's AIPs into large files for writing to DLT, and splitting the AIPs into their constituent data files and attribute files for writing to RAID disk. Before each job, the data operations staff created a "listfile" combining the GDAR information (common to all files in the data set) with information from the NDADS database on each specific file to be migrated. The listfile information was used by various algorithms to generate specific values (e.g., file names) for the individual AIPs and files produced. An automated procedure to generate listfiles for future incoming data files is currently being prepared.
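
Building a listfile amounts to a join: each per-file record from NDADS is combined with the same set of data-set-level GDAR attributes. The sketch below shows that idea with CSV output; the record fields, the CSV layout, and the rule used to derive `new_name` are assumptions for illustration, not the format NSSDC actually used.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class NdadsFileEntry:
    """Hypothetical per-file record retrieved from the NDADS database."""
    ndads_name: str   # file name as stored in NDADS
    start_time: str   # data start time as YYYYMMDD (assumed format)
    size_bytes: int


def build_listfile(gdar_fields: dict, files: list) -> list:
    """Return one row per file: the GDAR attributes shared by the whole
    data set, plus that file's own attributes and a derived new name."""
    rows = []
    for entry in files:
        row = dict(gdar_fields)  # copy so rows do not share one dict
        row["ndads_name"] = entry.ndads_name
        row["size_bytes"] = entry.size_bytes
        # Assumed naming rule: start time first, then the data set descriptor.
        row["new_name"] = (f"{entry.start_time}_{gdar_fields['descriptor']}"
                           f".{gdar_fields['extension']}")
        rows.append(row)
    return rows


def listfile_to_csv(rows: list) -> str:
    """Serialize non-empty listfile rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical usage with placeholder values.
gdar = {"project": "IMP 8", "descriptor": "example", "extension": "asc"}
files = [NdadsFileEntry("EXAMPLE_001.DAT", "19970209", 86_400)]
print(listfile_to_csv(build_listfile(gdar, files)))
```

Copying the GDAR fields into every row keeps each listfile line self-contained, so a later step that generates AIP or file names can work on one row at a time.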

In general, migrating data from optical platters is relatively slow: in the first three months of DIOnAS operations, NSSDC averaged 75 KB/s in staging data from platter to magnetic disk. During the first three weeks of operation, five IMP 8 data sets, totaling 7.4 GB, were migrated from NDADS to DIOnAS. Following the IMP 8 move, hundreds of gigabytes of ISIS data are now being moved. In the coming months, data from the remaining 13 space physics missions, plus IRAS, will be moved.
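
The reported staging rate translates into transfer times with simple arithmetic. This sketch assumes decimal units (1 GB = 1,000,000 KB) and a sustained rate with no idle time; with binary units the times would be roughly 5% longer.

```python
SECONDS_PER_DAY = 86_400
KB_PER_GB = 1_000_000  # decimal units (assumption)


def transfer_hours(volume_gb: float, rate_kb_per_s: float) -> float:
    """Hours of continuous transfer needed to move volume_gb at rate_kb_per_s."""
    return volume_gb * KB_PER_GB / rate_kb_per_s / 3600


def daily_volume_gb(rate_kb_per_s: float) -> float:
    """GB moved in one day of continuous transfer at rate_kb_per_s."""
    return rate_kb_per_s * SECONDS_PER_DAY / KB_PER_GB


rate = 75  # KB/s, the average platter-to-disk staging rate
print(f"{daily_volume_gb(rate):.2f} GB/day")                    # 6.48 GB/day
print(f"{transfer_hours(7.4, rate):.1f} h for 7.4 GB")          # 27.4 h for 7.4 GB
print(f"{transfer_hours(100, rate) / 24:.1f} days for 100 GB")  # 15.4 days for 100 GB
```

At this rate, one day of continuous staging moves about 6.5 GB, so filling the roughly 100 GB of committed staging disk from platters would take about two weeks.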

When the migration from NDADS is complete, the NDADS VMS optical disk jukebox system will be retired. NSSDC's data operations staff will then turn to other data not yet passed through DIOnAS, including data that had been FTP-accessible from NSSDC magnetic disk (in particular, from nssdc.gsfc.nasa.gov/spacecraft_data) and data that are still held only in the NSSDC offline archives.

Curator: Natalie Barnes
Responsible Official: Dr. Joseph H. King, Code 633
Last Revised: [NAB]