NSSDC Finishes First Phase of Document Scanning

Volume 13, Number 2, June 1997
By H. Kent Hills and Ralph Post

Over its 30-year history, NSSDC has accumulated a great deal of paper. Some of it contains the documentation needed to understand and correctly use many of the data sets in NSSDC's archive; other paper holdings are more relevant to internal NSSDC operations. This accumulation has led to two significant problems. First, it is clearly impractical to keep data set documentation off line as paper that has to be photocopied and mailed to customers who are using older data made newly network-accessible. Second, storage space constraints were squeezing NSSDC. NSSDC has therefore undertaken a program to scan much of its off-line paper material, both to save space and to make the content network-accessible to customers and internal staff. The output of this effort is a set of CD-ROMs containing TIFF-formatted page images.

In late 1996 NSSDC was able to bring some of its contractor personnel back on site, so that all staff members are now located in Buildings 26 or 28. However, there was not enough room on site for all of their older but still useful information files. This provided the impetus for an action that had been under consideration in various forms for some time: digital scanning of the paper files to reduce the storage volume and to make the material machine-readable. The vendor was selected after NSSDC viewed demonstration images made from the actual paper material, some of which was faint and difficult to read. The vendor has completed the scanning, and delivery of the digital images on CD-ROMs was expected near the end of May 1997. In addition to the off-site materials, on-site data catalogs were added to the scanning task, resulting in three categories of hard copy materials to convert to digital images: Data Set Catalogs, CDAW materials, and Acquisition files.

Data Set Catalogs (DSCs) consist of documentation to be sent to a requester along with data. Documentation for most older data sets is on paper, even where the data themselves are on digital tapes. Newly arriving data are put into a near-line network-accessible system, and staff will soon be moving the off-line digital tape archive to near line as well. Digitizing the paper documentation converts it to a machine-readable form that can accompany the actual data in that system. Some of these documents pertain to multiple data sets and are identified as part of the Technical Reference File (TRF). Thus, a document of significant size is digitized only once but may have reference pointers to it from several DSCs.
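The "digitize once, point from many DSCs" arrangement can be pictured with a small sketch. This is only an illustration under stated assumptions, not a description of NSSDC's actual catalog software: the class names, identifiers, and TIFF file names below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScannedDocument:
    """One shared TRF document, scanned once as a set of TIFF page images."""
    doc_id: str              # hypothetical identifier
    title: str
    page_files: List[str]    # hypothetical TIFF file names


@dataclass
class DataSetCatalog:
    """Documentation sent to a requester along with one data set."""
    data_set_id: str                                    # hypothetical identifier
    own_pages: List[str] = field(default_factory=list)  # pages unique to this DSC
    trf_refs: List[str] = field(default_factory=list)   # pointers into the TRF


def pages_for(dsc: DataSetCatalog, trf: Dict[str, ScannedDocument]) -> List[str]:
    """Collect the DSC's own pages plus the pages of every TRF document it points to."""
    pages = list(dsc.own_pages)
    for doc_id in dsc.trf_refs:
        pages.extend(trf[doc_id].page_files)
    return pages


# Hypothetical example: one shared document referenced from two catalogs.
trf = {
    "TRF-0001": ScannedDocument(
        "TRF-0001", "Instrument description",
        ["trf0001_p1.tif", "trf0001_p2.tif"],
    ),
}
dsc_a = DataSetCatalog("DS-A", own_pages=["dsa_p1.tif"], trf_refs=["TRF-0001"])
dsc_b = DataSetCatalog("DS-B", trf_refs=["TRF-0001"])
```

In this sketch the shared document's page images are stored a single time in `trf`, while `pages_for(dsc_a, trf)` and `pages_for(dsc_b, trf)` both resolve to them through their reference lists.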

The NSSDC has supported a series of ten Coordinated Data Analysis Workshops (CDAWs) since 1978. By putting all of the subject data into a common database with retrieval, analysis, and display software, the CDAWs allowed easier intercomparison of data from multiple experiments on multiple spacecraft and ground stations for studies of selected events in magnetospheric physics. The data and pertinent descriptive summaries (spacecraft, experimenter, units, etc.) were inserted into common digital data files (Common Data Format [CDF] for recent CDAWs). Detailed documentation, calibration information, and verification plots were all on paper, and maintaining organized files of these materials became a storage problem.

NSSDC acquisition personnel kept a large volume of so-called "Acquisition" files: information about spacecraft and experiments that is not usually needed as documentation to accompany data sets. Some of these materials were in file cabinets, but others were stored in boxes and were not readily accessible.

NSSDC needed to

All five goals were met by having a vendor make digital scans of the material and put the results on CD-ROMs. The page images are in TIFF format, and the CDs are formatted for fast user access from a terminal using the Alchemy (Trademark) software package. A separate copy is formatted differently, tailored specifically to facilitate automated transfer into the existing near-line system. Although this digitization process is a first step in that direction, the actual transfer to near-line status for this older documentation is still a future project.

The volume of material scanned corresponded to approximately eight five-drawer file cabinets or, in terms of paper delivered to the vendor, 12 boxes of Data Set Catalogs and TRF documents, 12 boxes of CDAW materials, and 16 boxes of Acquisition materials. All of this will be condensed onto a handful of CD-ROMs (about seven) that can be used on an existing PC. Additional CDs will provide working copies and safe backup copies. The returned original hard copy will be kept on file, put into deep storage, or discarded, depending on the specific material.



Miranda Beall, beall@nssdca.gsfc.nasa.gov, (301) 286-0162
Hughes STX, Code 633, NASA Goddard Space Flight Center
Greenbelt, MD 20771, U.S.A.

Erin D. Gardner, gardner@nssdca.gsfc.nasa.gov, (301) 286-0163
Hughes STX, Code 633, NASA Goddard Space Flight Center
Greenbelt, MD 20771, U.S.A.




Author: Miranda Beall
Curators: Erin Gardner and Miranda Beall
Responsible Official: Dr. Joseph H. King, Code 633
Last Revised: 26 JUNE 1997 [EDG]