Numeric Listings on Archived Microfilm Converted to ASCII Files

Volume 16, Number 2, June 2000

By Joseph King, George Fleming, and Richard Chu

During its early years NSSDC acquired many non-computer-readable (ncr) data sets (e.g., on microfilm and microfiche) in addition to the many data sets acquired on digital magnetic tape. Most of these ncr data sets are planetary or other images, spectra, line plots, etc., but a significant minority are images of pages of numeric listings, for example, computer printouts. Several of these data sets of numeric listings are of potential scientific value today.

Scanning and optical character recognition technologies have advanced in recent years to the point that it is reasonable to assess and possibly implement the conversion of data sets from numeric listings on microfilm frames to computer-accessible data files.

NSSDC has moved into this area with a previously microfilm-only IMP 8 1970s set of Los Alamos National Laboratory (LANL) magnetotail data. This data set consisted of 54 reels of 16-mm film containing both plots and listings of ion and electron densities, flow speeds and directions, fluxes, average energies, pressure anisotropies and directions, all at about 30-sec resolution.

The 54 reels and their duplicates were of variable quality (character sharpness, etc.), so three high-quality reels were selected and sent to a local vendor for the scanning of all frames and for the optical-character-reading of the 2,589 frames of numeric listings. After about two weeks the film and a CD were provided to NSSDC. The three reels covered the periods 12/13/73-02/02/74, 02/27/79-04/10/79, and 03/22/80-04/20/80. Data for 45 days in these intervals were digitized.

Display of the new ASCII files revealed that most characters on the film were correctly read and converted. However, a number of errors were readily recognized at NSSDC and were fixed. The incidence of errors not readily recognized appears to be very small (< 0.1%) but is difficult to quantify. Fortunately for this data set, certain columns are redundant and can serve as checks; for example, particle density, average energy per particle, and energy density must be mutually consistent, as must particle density, flow speed, and flux.
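A redundancy check of this kind might be sketched as follows. This is only an illustration, not the procedure NSSDC used: it assumes that energy density should equal density times average energy per particle and that flux should equal density times flow speed, with all columns in mutually consistent units. The function names, argument order, and the 5% tolerance are hypothetical.

```python
import math


def products_agree(a, b, product, rel_tol=0.05):
    """Return True if a * b matches product to within rel_tol (relative)."""
    return math.isclose(a * b, product, rel_tol=rel_tol)


def check_record(density, avg_energy, energy_density, flow_speed, flux,
                 rel_tol=0.05):
    """Return the names of the redundancy checks that a record fails.

    Assumed relationships (not taken from the data set documentation):
        energy_density ~ density * avg_energy
        flux           ~ density * flow_speed
    """
    failures = []
    if not products_agree(density, avg_energy, energy_density, rel_tol):
        failures.append("energy_density")
    if not products_agree(density, flow_speed, flux, rel_tol):
        failures.append("flux")
    return failures
```

A record that fails both checks points to a misread density, since density is the only column the two checks share; a record that fails only one points to one of the other columns in that check.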

The ASCII files returned by the vendor mapped to the input microfilm frames on a one-to-one basis. Each typically had about 13 minutes of data plus blank lines and column headers reflecting the original computer printout. NSSDC has created a user-effective set of one-day files without blank lines or column headings. In the process, NSSDC also moved the statistical uncertainties, which on the film frames appear in parentheses immediately after each relevant parameter value, to the ends of the records. These new files are FTP-accessible from NSSDC.
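The reformatting described above could look roughly like the sketch below. The record layout is hypothetical (whitespace-separated values, each optionally followed by an uncertainty in parentheses), as is the rule that a data line begins with a digit; the actual LANL listing format may differ.

```python
import re

# Matches one parenthesized uncertainty, e.g. "(0.03)" or "( 15. )".
UNCERTAINTY = re.compile(r"\(\s*([^()\s]+)\s*\)")


def move_uncertainties(line):
    """Move parenthesized uncertainties to the end of the record."""
    uncertainties = UNCERTAINTY.findall(line)
    values = UNCERTAINTY.sub(" ", line).split()
    return " ".join(values + uncertainties)


def looks_like_header(line):
    """Assumption: data lines begin with a digit, header lines do not."""
    return not line[:1].isdigit()


def clean_frame(lines):
    """Drop blank and header lines, then reorder each data record."""
    cleaned = []
    for line in lines:
        stripped = line.strip()
        if not stripped or looks_like_header(stripped):
            continue
        cleaned.append(move_uncertainties(stripped))
    return cleaned
```

With this layout, a line such as `12:03:30 0.52(0.03) 210.(15.)` becomes `12:03:30 0.52 210. 0.03 15.`, so the parameter values occupy the leading columns and their uncertainties follow in the same order.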

This Los Alamos magnetotail electron and proton data set, taken from the IMP 8 spacecraft at ~35 Re, is unique and valuable. It complements the IMP 8 Low Energy Proton and Electron Differential Energy Analyzer (LEPEDEA) distribution function and moments data set whose FTP-accessibility was announced in the last NSSDC News. There are 11 days in 1979 and 1980 when LEPEDEA data and presently digitized LANL data are concurrently available. (LEPEDEA data have been subsequently made accessible through FTP Helper; interest in the LANL data could lead to its receiving the same treatment.)

Users interested in having NSSDC attempt the conversion of the LANL microfilm data for other time intervals or the conversion of any other NSSDC-held listings-on-microfilm data sets (see the NSSDC Master Catalog) should contact NSSDC.

