NSSDC/WDC-SI 2002-01


2001 ANNUAL STATISTICS AND HIGHLIGHTS REPORT FOR THE NATIONAL SPACE SCIENCE DATA CENTER

Joseph H. King
National Space Science Data Center
Greenbelt, Maryland 20771
 



 

Table of Contents

PREFACE

1. INTRODUCTION

2. HIGHLIGHTS

3. SOME SELECTED STATISTICS

4. DATA MANAGED AT NSSDC, AND 2001 INFLOW AND OUTFLOW

4.1. Data Inflow

4.2. Data Outflow

5. ADDITIONAL NSSDC SERVICES

5.1. NSSDC Information Management System

5.2. NASA/Science Office of Standards and Technology (NOST)

5.3. Astronomical Data Center

5.4. Common Data Format

Glossary

Tables 1 - 18

Appendix - Listing of 2001-Published, NSSDC-Acknowledged Papers


2001 Annual Report of the National Space Science Data Center

 

PREFACE

The National Space Science Data Center serves as the permanent archive for most NASA/Space Science mission data to ensure future data accessibility and usability.  NSSDC also provides current data access, complementary to the efforts of other NASA/OSS "active archives," in support of the NASA and international astrophysics and space physics research enterprises.  Finally, NSSDC is a conduit for the general public to acquire NASA space science data (primarily imagery) of interest to them.

NSSDC is pleased to issue this 2001 Annual Report describing (1) the 2001 growth and evolution of NSSDC's data archives, access pathways, and other tools and services, and (2) the 2001 access to those data and services by NSSDC's customer communities.  This report has been made WWW-accessible in the hope that readers will avail themselves of the opportunity to link to the services reported herein.

The scope of this report is that of the traditional NSSDC as defined by the NSSDC budget.  It should be noted that some of the activities thereby supported are the responsibilities of the Astrophysics and Space Physics Data Facilities, organizational peers of the formal NSSDC within Goddard's Space Science Data Operations Office.

I welcome suggestions for user-benefiting improvements to this Annual Report and to NSSDC services.

Joseph H. King

Head, National Space Science Data Center


1. INTRODUCTION

This report characterizes NSSDC's data holdings, metadata holdings, access pathways, and value-added data products, tools, and services at the end of 2001, with a focus on the 2001 activities leading to that end-of-year state.  In addition, this report characterizes the nature and amount of 2001 access to NSSDC's data and services by its many users from various communities.  It is assumed the reader will have a general familiarity with NSSDC and its mission.  The top NSSDC web page is at http://nssdc.gsfc.nasa.gov/ .


2.  HIGHLIGHTS

The most important result of NSSDC’s 2001 activities is the continuing preservation of growing space science data volumes, ensuring their continuing and future accessibility to the space science, education, and general public communities.  The statistics to follow reveal that NSSDC’s archive has now grown to 19.2 TB of space science data and an additional 3.3 TB of Earth science data.  During 2001, 3.3 TB of data were added to the NSSDC archive that now holds data from 1,301 experiments flown on 373 spacecraft.

Next, NSSDC continues to distribute large amounts of data by network to the space science community and general public, and by offline mailings to the general public. 

Again, the statistics that follow detail the data volumes disseminated via various pathways to various communities.  We note here that during 2001, NSSDC’s customers downloaded over 3.5 million data files via network (a 13% increase over 2000) and received about 1.3 TB of data on mailed media.

NSSDC’s data dissemination is leading to the publication of significant new science.  The Appendix of this Annual Report lists 96 science papers acknowledging NSSDC data or services as contributing to their analyses.  These are papers that have come to the attention of our staff.  Most science journals in which NSSDC data or services may have been used are not routinely reviewed by our staff, and some papers that use NSSDC data or services do not cite such use, so the list represents a lower limit on papers enabled or benefited by NSSDC.

The CDAWeb system that provides access to multi-source data needed in analyses of magnetospheric processes and of solar wind-magnetosphere coupling continues to grow in popularity and usage.  Especially noteworthy in 2001 was the creation of a number of ingest pipelines enabling data from many non-core-ISTP sources to flow directly into CDAWeb rather than flowing from those sources to CDAWeb through the ISTP Central Data Handling Facility which was significantly descoped in late 2001.

The OMNIWeb system had its functionality significantly enhanced.  An option was provided enabling users to generate scatter plots for user-selected science parameters for user-selected time spans, with the ability to filter (include or exclude points) based on user-specified ranges of any OMNI science parameters.  The new option also computes linear regression fit parameters.  A separate new option allows users to link to many of the individual data sets contributing to the multi-source OMNI solar wind data set.
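The filter-then-fit idea behind the new scatter plot option can be illustrated with a minimal sketch.  This is not OMNIWeb's code; the parameter names, the sample values, and the helper functions below are all hypothetical, and an ordinary least-squares fit is assumed for the linear regression.

```python
from statistics import mean

def filter_points(records, ranges):
    """Keep records whose values fall inside every (low, high) range given."""
    kept = []
    for rec in records:
        if all(low <= rec[name] <= high for name, (low, high) in ranges.items()):
            kept.append(rec)
    return kept

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical records; the names and values are made up for illustration.
records = [
    {"speed": 400.0, "density": 5.0, "bz": -1.0},
    {"speed": 500.0, "density": 4.0, "bz": 2.0},
    {"speed": 600.0, "density": 3.0, "bz": 0.5},
    {"speed": 900.0, "density": 9.0, "bz": -12.0},  # excluded by the bz filter
]

# Include only points whose bz value lies in the chosen range, then fit.
selected = filter_points(records, {"bz": (-5.0, 5.0)})
slope, intercept = linear_fit(
    [r["speed"] for r in selected],
    [r["density"] for r in selected],
)
```

Filtering on a parameter other than the two being plotted, as done here, mirrors the description above of filtering on ranges of any OMNI science parameter.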

The year 2001 marked the retirement of the NSSDC Data Archive and Dissemination System (NDADS).  NDADS was a VMS-based optical disk jukebox pair which, over its ~10-year life, served over 2.4 million astrophysics (IUE, etc.) and space physics (IMP 8, etc.) data files in response to over 100,000 distinct user requests.

Although the effort started in the second half of 2000, NSSDC's reengineered data management truly went into production in 2001.  Archive Information Packages (AIPs; bundles of data files and companion attribute files as prescribed by the ISO/CCSDS Open Archival Information System reference model) were defined, created and written to a DLT jukebox.  During 2001, 696,000 such AIPs, containing 660 GB, were created from newly arriving data and from data formerly on NDADS.  At the same time, the AIPs’ constituent data and attribute files were written to a unix-based RAID magnetic disk environment for external user access.  During 2002 we will begin creating AIPs from the offline digital archive and ingesting them to the nearline DLT jukebox.
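As a rough illustration of the bundling idea, an AIP can be pictured as a container holding data files alongside their companion attribute files.  The sketch below is not NSSDC's actual package layout; the class, field names, and file names are hypothetical.  It also works out the mean package size implied by the 2001 totals quoted above, assuming decimal gigabytes.

```python
from dataclasses import dataclass, field

@dataclass
class ArchiveInformationPackage:
    """Illustrative AIP: data files bundled with companion attribute files."""
    package_id: str                                      # hypothetical identifier
    data_files: dict = field(default_factory=dict)       # file name -> bytes
    attribute_files: dict = field(default_factory=dict)  # file name -> bytes

    def size_bytes(self) -> int:
        """Total size of all data and attribute files in the package."""
        data = sum(len(b) for b in self.data_files.values())
        attrs = sum(len(b) for b in self.attribute_files.values())
        return data + attrs

# A made-up package with one data file and one attribute file.
aip = ArchiveInformationPackage(
    package_id="example-0001",
    data_files={"example_data.dat": b"\x00" * 1024},
    attribute_files={"example_data.attr": b"format=binary\n"},
)

# Mean AIP size implied by 660 GB spread over 696,000 packages
# (assumes 1 GB = 1e9 bytes).
mean_aip_bytes = 660e9 / 696_000
```

Under that assumption the mean package is a little under 1 MB, which suggests that a typical AIP holds a small number of modest-sized files.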

During 2001, the first inflow of data from a spacecraft project that used NSSDC-provided software to prepare Archive Information Packages for NSSDC submission was ingested to the DLT jukebox.  The project was IMAGE.  This facilitates NSSDC data ingest and management.  The approach will hopefully be replicated with other missions and individuals preparing data for NSSDC submission.

A new version of the International Reference Ionosphere (IRI) model was released by NSSDC early in 2001 with many improvements, including ionospheric storm effects and a significantly improved equatorial, bottomside electron density profile.  There are several additional output parameters, including vertical ion drift at the magnetic equator.  The IRI, evolved by a worldwide community led by an NSSDC staff member, remains one of NSSDC's most often requested software packages and is at http://nssdc.gsfc.nasa.gov/space/model/ionos/iri.html .

To aid users in the data preview and selection process, NSSDC developed in 2000 a new family of graphical-display-and-subset interfaces for selected ftp-accessible ASCII and gzipped-ASCII data sets.  This family, called Ftphelper, is at http://nssdc.gsfc.nasa.gov/ftphelper/ .   The number of space physics spacecraft some of whose data are Ftphelper-browsable was brought from six to 14 during 2001.

A new software tool was developed which extracts photometry information (flux densities) from the time-ordered data of COBE's Diffuse Infrared Background Experiment (DIRBE).  The interface allows users to specify the direction to the source of interest and the time span (up to the full 10-month cryogen life) over which source crossing data are to be analyzed.  To date, this tool has been accessed 2,007 times.  One resultant publication involved the extraction of Mira variable star IR light curves of unprecedented quality and of unprecedented time and wavelength coverage.

To support possible (but unlikely) future use of old data in a multiplicity of vendor-specific binary formats, NSSDC created a series of web pages documenting the binary representations of words (bit patterns) for 18 vendor-specific formats.  This set of pages is at http://nssdc.gsfc.nasa.gov/nssdc/formats/ .

NSSDC's NASA/Science Office of Standards and Technology (NOST) organized and hosted a 5-day XML workshop under the sponsorship of the Consultative Committee for Space Data Systems (CCSDS).  The 24 expert participants made good progress in the technical area of packaging data using XML and in identifying the objectives of, and need for, a continuing working group.  See http://www.ccsds.org/meetings/xml2001summer/papers/ReportOfXMLWG.doc

In 2001, the NASA Sun-Earth Connection Education Forum (SECEF) team, with major NSSDC participation, orchestrated Sun-Earth Day, held in April 2001.  Ten thousand packets of information were sent to teachers, scientists, etc. for Sun-Earth Day programs, reaching hundreds of thousands of people.  For programs like this, SECEF received a NASA group achievement award in August 2001.

Readers are encouraged to exercise the multiple options on the hierarchical array of WWW pages starting with NSSDC's home page.  There are several more functionalities beyond those called out in the preceding paragraphs.


3. SOME SELECTED STATISTICS

(As of 12/31/01 or for 2001)

Volume of data at NSSDC: 22.5 TB
Sources of data: 1301 experiments flown on 373 spacecraft
Distinct data sets: 4,359
Distinct digital media volumes: 66,692
Data volume network-accessible to customers: 1.15 TB
Data volume reaching NSSDC during 2001: 3.3 TB
Data sets with 2001-arriving data: 132

Files downloaded from NSSDC via ftp: 3,534,000
From Photo_Gallery specifically: 2,888,000
Plots made by special space physics systems: 116,000
By CDAWeb specifically: 100,000
Files downloaded from same systems: 71,000
 From CDAWeb specifically: 60,000
Executions of geophysical models: 99,000
Plots/files from orbit services: 47,000
From SSCWeb specifically: 39,000
Number of offline requests satisfied: 919
Numbers of items mailed: 2,241 CD-ROM's, 2,494 photoprints, etc.
Also, 8,700 Milky Way and COBE posters mailed.

Number of hits to NSSDC's web pages: 12.9 million
To NSSDC Master Catalog specifically: 2.6 million

Number of refereed papers published citing NSSDC: >96

4.  DATA MANAGED AT NSSDC, AND 2001 INFLOW AND OUTFLOW

There are several ways to characterize the multi-disciplinary NSSDC archive.  Byte counts are a common metric for modern archives, and will be reported herein.  Numbers of distinct data sets and numbers and diversity of media volumes managed are also very important.  (In NSSDC's terminology, a data set is typically all the data from a given source at a given processing level in a given format.)  The diversity of data sets and of media types relate to the intellectual heterogeneity and technical heterogeneity of the archive, respectively, and we shall report on these also.

At the end of 2001, NSSDC had 4,359 distinct data sets and accompanying documentation packages being managed.  Table 1 indicates the disciplines from which these data sets come and whether the data sets are digital or non-digital (film, etc.).  The table shows that these data sets come from 1,301 experiments that have flown on 373 mostly-NASA spacecraft.  By data set count, space physics is the dominant discipline, accounting for nearly half of NSSDC's data sets.  This reflects the fact that in its early years, NASA launched a preponderance of space physics missions and also that space physics spacecraft typically carry more independent experiments than do astrophysics missions. 

[Astute readers will notice that these counts of data sets and of source spacecraft and experiments are typically smaller than for last year.  This is not because NSSDC has released significant data.  Rather, in this period of heightened sensitivity to good accounting practice, NSSDC developed this year's numbers by a "zero-base" analysis of its information files rather than merely incrementing the prior year's numbers with current-year inflows.  Also contributing to the decrement in data set counts is a partially completed new approach to defining multi-source data sets.  For example, the ISEE 1 "data pool" microfilm, to which 10 experiments contribute, is newly counted as one rather than 10 data sets.]

Note from the table that NSSDC manages almost as many non-digital (mostly film) data sets as digital data sets, although it should be noted that NSSDC has been acquiring almost no non-digital data in recent years and has been gradually converting parts of its film archive to a digital form.

Table 2 is a different characterization of the NSSDC archive, by byte counts and media volume counts.  The table shows 22.5 TB of total data, a 1.15 TB subset that is network-accessible, and 66,692 digital media at NSSDC.  The byte counts are estimates, involving for some data sets assumptions about the mean numbers of bytes on various media types.  The number of media has decreased by 13% since last year as data are moved from low-capacity old media to newer high-capacity media.

Data are also being moved from NSSDC's traditional offline archive to a nearline archive based on a DLT jukebox attached to a unix server.  As described at http://nssdc.gsfc.nasa.gov/nssdc_news/dec00/dec00_toc.html, data are newly archived in "archive information packages" which hold data files and companion attribute files as per the specification of the ISO/CCSDS Open Archival Information System reference model.  Table 3 shows the volumes of data ingested to this new archive, by mission, in 2000 and 2001 (177 GB and 660 GB, respectively).  Much of the data were formerly network-accessible from the retired NDADS system and other data are currently inflowing to NSSDC.  Most of the data were made ftp-accessible in addition.

From the research community's perspective, only astrophysics data and space physics data are network-accessible from NSSDC.  That planetary data are not network-accessible from NSSDC is the result of the Planetary Data System’s making most or all its planetary data accessible via the network or via CD-ROM creation and dissemination.  NSSDC's photo gallery and image catalog which are WWW-accessible from http://nssdc.gsfc.nasa.gov/planetary/  contain much planetary image data but these are largely oriented towards the general public.

Tables 4 and 5 better characterize NSSDC's network-accessible astrophysics and space physics data, by project.  In space physics, NSSDC holds a large volume of CDF-formatted data underlying CDAWeb and a comparably sized separate holding of data in other formats, mostly plain ASCII, and we report these separately in Table 5.  There is very little overlap.  All the data are ftp-accessible.  All the CDF-formatted data are CDAWeb-accessible.  Some of the ASCII data are accessible via Ftphelper, ATMOWeb, or, for the long-wavelength astrophysics data, through mission-specific web pages.

The volume of data network-accessible from NSSDC is seen in Tables 4 and 5 to be 1.15 TB.  This is down from the 2.39 TB reported a year ago.  This drop is associated with the retirement of the nearline NDADS system.  The current magdisk-accessible 1.15 TB may be compared with 0.44 TB of data that were magdisk-accessible a year ago.  Most data that were NDADS-resident a year ago that are not part of the current 1.15 TB are now network-accessible from NASA/astrophysics "active archives," specifically HEASARC at Goddard and MAST at STScI.

Table 6 characterizes the digital media types managed at NSSDC, not including backup copies.  This table is an expansion of Table 2 in which total numbers of unique digital media volumes were given.  It should be noted that most volumes are replicable and have one backup volume.  However, for "CD-ROM (Titles)" which are not locally replicable, NSSDC typically holds between 20 and 200 copies of each title.  For these, NSSDC must replenish stock through a commercial vendor as request activity drives NSSDC stock down.  DLT and DVD are expected to become increasingly important at NSSDC.

Table 7 characterizes NSSDC's non-digital archive by discipline and form factor.  This is unchanged from a year ago.  Note that NSSDC has large volumes of non-digital data for each of the discipline areas it supports.  It should be noted, however, that very few new data have been arriving at NSSDC in non-digital form in recent years.  NSSDC has recently begun an effort to systematically convert this film archive to computer-readable form.  During 2001, NSSDC scanned 6,159 film frames from such spacecraft as the Mariner and Gemini series and Magellan, thereby producing about 65 GB of new digital data.

4.1  Data Inflow

Tables 8 and 9 characterize the inflow of digital data to NSSDC during 2001.  In particular, Table 8 shows that NSSDC received approximately 3.3 TB of new data in 2001, via a combination of networks and hard media.  Table 8 shows data volumes by project, with the astrophysics and space physics subsets of ISTP/Wind data attributed to their respective disciplines.  Dominating the counts are media-based HEASARC-provided data, Level-0 data from the FAST, ISTP and IMAGE missions plus data from the FUSE, Mars Global Surveyor and EUVE missions.  Table 9 characterizes the inflowing media types by discipline.  As in recent years, CD-WO media continue as the dominant input media type overall. 

During 2001, NSSDC received approximately 211 GB of data electronically, in addition to the data arriving on the media reported in Table 9.  This 211 GB is included in the Table 8 counts.  The electronic inflow was dominated by ISTP Key Parameter data (109 GB), IMAGE data (49 GB) and data from the ISIS ionogram digitization effort (48 GB), with lesser amounts from a number of spaceflight projects.

By data set count, which as noted earlier marks the intellectual heterogeneity of NSSDC, entireties or parts of 132 data sets arrived at NSSDC during 2001.  Of these, 46 were new data sets, a further subset of which were the first data from 17 experiments.

4.2  Data Outflow

NSSDC provides user access to its data holdings through multiple electronic interfaces and, in addition, through a user support infrastructure for the mailing of offline digital and non-digital data volumes.  Most electronic interfaces are accessible through NSSDC's WWW home page and include (1) special WWW-based interfaces to specific data sets or groups thereof and (2) ftp pathways to a range of data files maintained permanently on NSSDC magnetic disk.  The CDF-formatted data underlying CDAWeb are at ftp://cdaweb.gsfc.nasa.gov/ while all other data are at ftp://nssdcftp.gsfc.nasa.gov/ .

The dominant special WWW-based data access interfaces that NSSDC offers to the research community relate to: ISTP key parameter and a growing range of other space physics data (CDAWeb); the OMNI and uniformized-COHO solar wind data sets (through OMNIWeb and COHOWeb, respectively); various atmospheric and ionospheric data (ATMOWeb); IRAS, COBE and SWAS long-wavelength astrophysics data; and the Astronomical Data Center astronomical source catalogs and journal tables.  In addition, Ftphelper provides a browse/preview functionality for selected ASCII data sets otherwise only ftp-accessible.

The space physics-supportive systems are described at and accessible through http://nssdc.gsfc.nasa.gov/space/ while the astrophysics-supportive systems are accessible through http://nssdc.gsfc.nasa.gov/astro/.  CDAWeb and the below-addressed SSCWeb system are primarily services of the Space Physics Data Facility, while the astrophysics services are provided by the Astrophysics Data Facility.  

The OMNI data set is an NSSDC-created, 38+ year compilation of cross-normalized, multi-spacecraft near-Earth solar wind magnetic field and plasma data and energetic particle data, while the COHOWeb database is a uniformized set of files of NSSDC-merged magnetic field, plasma, and position data for each of many deep space spacecraft.  Table 10 shows annual statistics for the CDAWeb, OMNIWeb, COHOWeb and ATMOWeb systems.  Note the remarkable growth in usage of these systems.  In 2001, they were used by NSSDC’s customers to produce over 700 plots, listings and data files every working day.

Table 11 reports statistics on the usage of NSSDC’s executable geophysical models services and its services for magnetospheric and heliospheric orbits.  The models service lets users specify a model, a spatial point of interest, and any other parameters on which the model depends, and have the model parameters computed at the point or along a profile through the point.  Table 11 shows that there were about 99,000 such computations done by NSSDC customers in 2001, with geomagnetic, ionospheric and atmospheric models dominating.  This almost doubles the 52,000 model computations reported for 2000.  Ftp access to models’ software (95,000 file downloads in 2001) is included in ftp access statistics in Table 12, not in Table 11.
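The point-or-profile pattern described above can be sketched in a few lines.  The example below uses a toy exponential atmosphere as a stand-in; it is not one of the models NSSDC hosts, and the function names, default values, and altitude grid are assumptions made for illustration only.

```python
import math

def toy_density(altitude_km, scale_height_km=8.0, surface_density=1.225):
    """Toy exponential atmosphere (kg/m^3); a stand-in, not an NSSDC-hosted model."""
    return surface_density * math.exp(-altitude_km / scale_height_km)

def evaluate_profile(model, start_km, stop_km, step_km, **params):
    """Evaluate `model` at evenly spaced altitudes from start_km to stop_km inclusive."""
    n_steps = int(round((stop_km - start_km) / step_km))
    altitudes = [start_km + i * step_km for i in range(n_steps + 1)]
    return [(alt, model(alt, **params)) for alt in altitudes]

# Single-point evaluation: the model value at one location.
point_value = toy_density(0.0)

# Profile evaluation: the same model along a vertical line through the point.
profile = evaluate_profile(toy_density, 0.0, 40.0, 10.0, scale_height_km=8.0)
```

Passing the model as a function argument keeps the profile logic independent of any particular model, which is one plausible way a single service could front many models.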

Table 11 also reveals 47,000 orbit computations, a 68% increase over 2000.  Of these, about 84% use the primarily magnetospheric SSCWeb service and the balance use the Heliocentric Ephemerides page. 

A great many NSSDC data sets and other information services are held permanently on magnetic disk for ftp access.  The reader is invited to review all these services from the ftp link on the NSSDC home page.  Table 12 gives the annual counts of files downloaded, both overall (over 3.5 million files in 2001, up by 43% from 2000) and for selected directories with high activity.  Note that the Photo Gallery, of high public interest, dominates the statistics with 87% of the total downloads from nssdcftp.  Researchers' ftp downloads of 201,000 CDF-formatted files from CDAWeb and 155,000 data files from the spacecraft_data subdirectory (more than double the year-2000 number!) show the high interest in and great value of these services.

WWW access statistics are frequently misleading, insofar as they usually individually count the many files (buttons, etc.) that make up a page.  Nevertheless, growth in WWW accesses is indicative of continuing and growing use of the WWW-provided services.  In 2001, there was an average of 12.9 million hits monthly to NSSDC’s web pages, up by 16% over 2000! 

While the dominant mode of dissemination of data to the astrophysics and space physics research communities is via the internet, NSSDC continues to provide a high level of offline data dissemination.  Table 13 shows that NSSDC responded to over 900 distinct requests for “traditional” products and that NSSDC provided over 8,700 Milky Way and COBE posters to requesters.   This poster count is up by a factor of three relative to 2000! Table 13 also characterizes the user community of NSSDC’s offline services.  To a very large extent it is the U.S. and international general public, the education enterprise, publishers, etc. and their desire for NASA imagery on CD-ROM and as film products that account for most of NSSDC’s offline request activity.

Table 14 gives the counts of requests for offline data sets from various disciplines in 2001, and as integrated over NSSDC's history.  (A small fraction of requests that are multi-disciplinary are double counted in this table.)  Note particularly the dominance of planetary data over both time scales.  This is largely associated with lunar and planetary image data that are widely requested by the general public.  The high level of astrophysical offline activity to a large extent reflects requests by the amateur and professional astronomical communities for ADC catalogs on CD-ROM.  Most offline space physics request activity was for copies of the IMAGE-based “Solar Storms” video tape.

Table 15 shows the most recent 5-year history of NSSDC's offline satellite data request activity by media type.  Several points are noteworthy.  The dominant mode of offline digital data dissemination continues to be by CD-ROM.  It is of interest to note that every working day of 2001, NSSDC mailed about 9 CD-ROMs to 2 requesters.  These numbers are down somewhat from 2000 as more members of the general public are able to access NSSDC’s data electronically.  Also significant in Table 15 is the fact that while requests to NSSDC for film data declined somewhat over the past year, the number of film products mailed was steady (excluding effects of one anomalously large request satisfied in 2000).  Finally, for this report we drop the reporting of magnetic tape dissemination statistics and initiate the reporting of dissemination of videotapes.  The videotapes were created within the GSFC/Space Science Data Operations Office.
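The two "every working day" figures quoted in this section can be checked against the annual totals in Section 3.  The sketch below assumes about 250 working days per year; that figure and the helper function are assumptions, not something stated in this report.  It also assumes that the plot and file counts for the special space physics systems in Section 3 correspond to the four systems of Table 10.

```python
WORKING_DAYS_PER_YEAR = 250  # assumption: roughly 50 five-day weeks

def per_working_day(annual_count, working_days=WORKING_DAYS_PER_YEAR):
    """Convert an annual count into an average per working day."""
    return annual_count / working_days

# Plots (116,000) plus files (71,000) from the special space physics systems.
products_per_day = per_working_day(116_000 + 71_000)

# CD-ROMs mailed in 2001 (2,241).
cdroms_per_day = per_working_day(2_241)
```

Under these assumptions the daily averages come to 748 plots and files and roughly 9 CD-ROMs, in line with the "over 700" and "about 9" quoted in the text.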


5.  ADDITIONAL NSSDC SERVICES

In addition to its archive of scientific data and the variety of data interfaces characterized in the preceding part of this Annual Report, NSSDC offers a number of additional services, which are described in this Section.

5.1 NSSDC Information Management System

The NSSDC Information Management System (NIMS) database now encompasses many of the separate databases that NSSDC has used to track data and information through the years.  The combining of these databases, on an Oracle-dedicated host computer, has yielded improved performance and reliability for NSSDC users and staff.  To aid readers through a transition in terminology, we use a mix of old and new terminology below.

The Automated Internal Management (AIM) database identified virtually all launched spacecraft, the experiments carried by many of those spacecraft, and data sets from those spacecraft primarily as archived at NSSDC.  This database served as the source of information for many of NSSDC's WWW information pages. The NSSDC Master Catalog (NMC) and a number of discipline and project pages retrieved information from AIM and built WWW pages "on the fly" so that the latest information is presented to the user. 

The NSSDC Supplementary Data File (NSDF) was similar to AIM, but tracked non-spacecraft data, multi-source spacecraft or other data, models and programs, and other NSSDC-held data sets that did not fit the AIM spacecraft/experiment/data set hierarchy.

The AIM and NSDF databases have recently been merged into a single JEDS (Java Experiments, Data sets, and Spacecraft) "database" within NIMS that continues as the database underlying the unchanged NMC web page.  JEDS content statistics are reported in Table 16.  Note that JEDS knows over 5800 spacecraft, 5000 experiments and 5000 data sets.  Of these data sets, only a small minority were formerly NSDF data sets; the remainder are from AIM.  Not all NSDF data sets had been moved to JEDS as of 12/31/01.

During 2001, there were 2.56 million accesses to JEDS through the NMC web pages, a 56% increase over 2000.  Of these, 70% and 12% were to spacecraft- and experiment-descriptive material, respectively.  Most of these were likely from the general public.

The Technical Reference File (TRF) tracks individual published papers associated with space flight experiments.  The NSSDC ID for the experiment is attached to the reference information so lists of papers relevant to a particular experiment can be reported, and/or provided to persons accessing data from a given experiment from NSSDC.  Table 17 shows that 971 papers were newly identified in TRF during 2001, mainly as the result of NSSDC staffers reviewing the Journal of Geophysical Research and Geophysical Research Letters.  The TRF was used to generate the Appendix listing 96 NSSDC-acknowledged papers published in 2001.

The Java-based Request and Name Directory (JRAND; formerly IRAND) tracks people who have interacted with NSSDC over the years. It includes full names, one or more addresses, telephone and email information, and what NSSDC distribution lists they are on. The database contains approximately 57,000 entries.  This information is also accessed and made available through the PIMS interface on the NSSDC WWW Home Page. Further JRAND statistics are available as Table 18.  JRAND also tracks individual staff-involved requests for satellite and non-satellite data, now more than 82,500 over the years.

The Interactive Data Archive (IDA) is another database of interest. IDA tracks the inventory of NSSDC's digital data volumes (tapes, disks, etc.). IDA had 167,807 records at the end of 2001,  with 3,383 records having been added during 2001.

5.2 NASA/Science Office of Standards and Technology (NOST) at NSSDC

NOST's mission is to facilitate the recognition and use of standards to reduce cost/benefit ratios in the exchange and management of scientific data among NASA entities and the scientific communities they serve.  NOST's Web Home Page is at http://ssdoo.gsfc.nasa.gov/nost/ . The NOST strategy is to play a coordinating role in helping the science disciplines identify new standards requirements. NOST participates in partnerships with them, other agencies, and industry on facilitating the adoption of leading-edge technologies with national or international visibility that can be tailored to meet NASA science information management and exchange requirements, and it assists in the process of moving these technologies toward standards with commercial support.

5.2.1 Consultative Committee for Space Data Systems (CCSDS)

NOST operates NASA's highest level Control Authority office in accordance with the applicable Consultative Committee for Space Data Systems (CCSDS) and ISO standards to formally archive data descriptions for interchange and long term preservation. New registrations were started, and identifiers assigned, for some 26 ISIS data sets being migrated into archival packages on new media.  As of 12/31/01, there were 438 registered identifiers, up from 411 a year ago.

NOST participated in the development of draft CCSDS/ISO standards applicable to multi-discipline and sub-discipline information interchange.  The primary standards and their usage categories were:

Data Entity Dictionary Specification Language (DEDSL overview in pdf format): This standard addresses the problem of providing a standard way to document and exchange the various attributes needed to fully define data elements.  It has been harmonized with the conceptual data element standard from ISO known as ISO 11179 and the ANSI X3.L8 standard known as X3.285.  The DEDSL is split into two components: one addressing the conceptual model and one addressing interchange forms.  The conceptual model and an interchange form using the ISO Parameter Value Language (PVL) were completed as CCSDS standards in 2000.  An interchange form using XML with DTDs was completed during 2001 and is in the process of being published by CCSDS.  All three of these standards have also been submitted to ISO for approval.  These standards support the publication and exchange of data elements, and groups of data elements, and should lead to more automated access to and understanding of data across science disciplines and among organizations.

Reference Model for an Open Archival Information System (OAIS description in pdf format):  This standard provides a conceptual model of a digital archive, including a functional view and an information view, and it provides a framework for discussing migration issues and interactions among archives. The model establishes initial criteria for recognition of a true archival function and should lead to improved archival implementations, provide a basis for further standardization, and provide more cost-effective vendor support.  It has been adopted as a starting point, in addressing digital preservation issues, by an ever growing variety of organizations around the world. The reference model draft underwent formal agency review as both a CCSDS standard and an ISO standard, and was updated in accordance with the comments.  It has undergone a second CCSDS review and has not yet emerged from the second ISO review.  As very few additional comments are anticipated it is expected to become a full ISO standard in the Spring of 2002.  It can be found at http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf.

5.2.2   XML Workshop

During 2001, NOST organized and hosted a 5-day XML workshop under the sponsorship of the CCSDS.  This workshop brought together some 24 experts from the US and international space agencies, academia, and industry to address the application and harmonization of XML efforts in the space domain, leading to viable standards.  Good progress was made in the technical area of packaging data using XML and in identifying the objectives of, and need for, a continuing working group.  It was recommended that a formal group be established under the CCSDS to further the harmonization effort.  This action has gone to a vote by the members of the CCSDS Management Council and appears likely to be accepted.  The workshop report is available at http://www.ccsds.org/meetings/xml2001summer/papers/ReportOfXMLWG.doc

5.3    Astronomical Data Center

In the Astronomical Data Center (ADC) (http://adc.gsfc.nasa.gov/ ), over 3700 astronomical source catalogs and journal tables are maintained online for easy access. Some 700 new data sets were acquired during the past year. Entire catalogs and tables can be retrieved via FTP. Web-based visualization tools (http://tarantella.gsfc.nasa.gov/adf/visualization/design.html ) are available for browsing, plotting, and subsetting the contents of the catalogs and tables before download. Users can query interactively for information on individual plotted data points and search for observations made by NASA missions.

ADC staff members conduct research using the eXtensible Markup Language (XML), and ADC users are benefiting from this effort. During 2001 the ADC established a public XML-based repository of 500 ADC data sets and unveiled the first data access services for this repository. An example of the new services is a form-based metadata search capability that takes advantage of various browse indices (author, keyword, etc.). The XML repository and services were shown to attendees of the American Astronomical Society Meeting in Washington, D.C. during the week of January 7, 2002. Access to the ADC's XML Public Archive is available on the Web at http://xml.gsfc.nasa.gov/archive/.
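To illustrate how browse indices can support a metadata search of this kind, the following Python sketch builds inverted indices over author and keyword fields and intersects them to answer a query. It is a hypothetical illustration only: the records, the field names, and the functions build_index and search are invented for this example and do not represent the ADC's actual repository or software.

```python
from collections import defaultdict

# Hypothetical metadata records; invented placeholders, not ADC holdings.
RECORDS = [
    {"id": "cat-001", "authors": ["Smith", "Jones"], "keywords": ["photometry", "stars"]},
    {"id": "cat-002", "authors": ["Jones"], "keywords": ["galaxies", "redshift"]},
    {"id": "cat-003", "authors": ["Lee"], "keywords": ["stars", "spectroscopy"]},
]


def build_index(records, field):
    """Map each lower-cased value of `field` to the set of record ids containing it."""
    index = defaultdict(set)
    for record in records:
        for value in record[field]:
            index[value.lower()].add(record["id"])
    return index


def search(indices, **criteria):
    """Return the sorted ids of records that match every field=value criterion."""
    result = None
    for field, value in criteria.items():
        matches = indices[field].get(value.lower(), set())
        result = matches if result is None else result & matches
    return sorted(result or [])


# One browse index per searchable field (assumed fields: authors, keywords).
INDICES = {field: build_index(RECORDS, field) for field in ("authors", "keywords")}
```

A query such as search(INDICES, authors="Jones", keywords="stars") consults each index once and intersects the results, so the cost depends on the number of matching entries rather than on the size of the whole repository.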

The ADC’s research activities and data holdings are highly relevant to the development of a National Virtual Observatory. During 2001 ADC staff participated in the Aspen NVO workshop and the NSF-funded NVO Framework project, and gave talks at four universities.

During 2001 the ADC created a comprehensive on-line resource guide called "Data Mining Resources for Space Sciences" (see http://adc.gsfc.nasa.gov/adc/adc_datamining.html ).

ADC staff members are developing, as Project AstroData, a pilot series of on-line science education tutorials and exercises for K-12 students. These products are intended to demonstrate the connections between new HST observations and existing astronomical data at the ADC that students can easily access. The AstroData web site is at http://adc.gsfc.nasa.gov/adc/education/astrodata/ .

5.4  Common Data Format

The NSSDC Common Data Format (CDF) is a self-describing data format for the storage and manipulation of multidimensional data in a discipline-independent fashion.  CDF comprises three parts: the CDF data files, which contain both the actual data values and metadata; the CDF software library, which is used to create, access, manage, and manipulate CDF files; and a well-defined Applications Programming Interface (known as the CDF Interface) that provides transparent access to the underlying software and data.  The NASA ISTP and IMAGE missions and the ESA Cluster mission use CDF extensively. We also note that CDF underlies NSSDC’s OMNIWeb, COHOWeb, CDAWeb and SSCWeb services.
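The idea of a self-describing format, in which data values and their metadata travel together and are reached through a single interface, can be sketched in Python as follows. This illustrates the concept only and is not the CDF library or the CDF Interface; the class names (Variable, SelfDescribingFile), the method names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A named list of values carrying its own descriptive attributes."""
    name: str
    values: list
    attributes: dict = field(default_factory=dict)


@dataclass
class SelfDescribingFile:
    """Hypothetical container holding global metadata and variables together."""
    global_attributes: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)

    def add_variable(self, name, values, **attributes):
        """Store the values and their metadata under one variable name."""
        self.variables[name] = Variable(name, list(values), dict(attributes))

    def describe(self, name):
        """Return a summary built only from what the container itself stores."""
        var = self.variables[name]
        units = var.attributes.get("units", "unknown units")
        return f"{var.name}: {len(var.values)} values in {units}"


# Invented sample content, for illustration only.
container = SelfDescribingFile(global_attributes={"title": "Example data set"})
container.add_variable("field_magnitude", [5.1, 4.8, 5.3], units="nT")
print(container.describe("field_magnitude"))
```

Because the units and other attributes are stored alongside the values, a reader of the container needs no external documentation to interpret them, which is the property the term "self-describing" refers to.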

During 2001, NSSDC's CDF office released CDF 2.7.1 and the CDF Perl Applications Programming Interfaces (APIs).  CDF 2.7.1 contains a more robust CDF library; built-in installation support for Solaris on PC, Mac OS X on Macintosh, and Linux on DEC Alpha; and a complete suite of the 7 CDF text-based and Graphical User Interface (GUI)-based tools.  In addition to the existing C, Fortran, and Java APIs, the CDF Perl APIs enable a CDF application to be written as a Perl script and run on any Perl-supported platform without modification.

To facilitate and promote data sharing with other data formats, the CDF office developed an HDF5-to-CDF translator and adopted the Extensible Markup Language (XML) as a basis for establishing interoperability with other scientific data formats.  The adoption of XML resulted in the creation of the CDF Markup Language (CDFML), which employs some of the basic building blocks/objects defined in the eXtensible Data Format (XDF) within CDF tags to describe CDF data and metadata.  XDF is an XML-based scientific data format, and it is considered by many to be the most mature Web-based scientific data format available today.
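As a rough sketch of how data and metadata might be expressed in XML for exchange between formats, the Python example below serializes one variable with the standard xml.etree.ElementTree module and then reads the values back. The element and attribute names (dataset, variable, attribute, record) and the sample values are invented for illustration and are not the actual CDFML or XDF vocabulary.

```python
import xml.etree.ElementTree as ET


def variable_to_xml(name, values, attributes):
    """Build an XML element holding a variable's metadata and data records."""
    dataset = ET.Element("dataset")
    variable = ET.SubElement(dataset, "variable", {"name": name})
    for key, value in attributes.items():
        attr = ET.SubElement(variable, "attribute", {"name": key})
        attr.text = str(value)
    for value in values:
        record = ET.SubElement(variable, "record")
        record.text = str(value)
    return dataset


def xml_to_values(dataset):
    """Recover the numeric records from an element built by variable_to_xml."""
    return [float(record.text) for record in dataset.iter("record")]


# Invented sample content, for illustration only.
element = variable_to_xml("field_magnitude", [5.1, 4.8], {"units": "nT"})
text = ET.tostring(element, encoding="unicode")
print(text)
```

Any tool that can parse XML can read such a document without linking against a format-specific library, which is the kind of interoperability an XML representation is intended to provide.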

A web page at http://nssdc.gsfc.nasa.gov/cdf/ provides a description of CDF, access to the software distribution, documentation, papers, and a list of Frequently Asked Questions, and it facilitates interaction with the CDF support group at the NSSDC.

Approximately 14,000 files were FTP-downloaded from the CDF directory of NSSDC’s anonymous FTP account during 2001.  These were mostly files describing CDF, software tools from the CDF library, etc.  In addition, a great many users browsed the CDF web pages identified above.


Curator:
Nathan L. James, nate.james@gsfc.nasa.gov, +1-301-286-9789
Code 633, NASA Goddard Space Flight Center
Greenbelt, MD 20771, USA

NASA Official: Donald M. Sawyer, Acting Head, NSSDC
Version 4.0, February 2002
Last Updated: 05-May-2003 NLJ