
Science Archives in the 21st Century

Summary Document for
Science Archives in the 21st Century

A workshop held at the University of Maryland University College Inn and Conference Center, on April 25 - 26, 2007


Rapporteur reports on the posters were prepared by Lou Reich, Kathy Fontaine, and Steve Joy.

Rapporteur: Lou Reich - CSC (NASA/GSFC)

Provenance, Production, and Planning - Bruce Barkstrom

  • Framework Areas - Producer, Ingest
  • Maturity - Research Results
  • Key Results - Mathematically complete solution to the problem of tracking the complete production history

FRBR in a Scientific Data Context - Joe Hourcle

  • Framework Areas - Finding Aids, Access
  • Maturity - Research Questions, Technology Transfer
  • Key Results - Investigation of integrating DigLib concepts into Science Archives

The Application of Semantic Technologies to Scientific Archives - J. Steven Hughes

  • Framework Areas - Finding Aids, Data Management
  • Maturity - Active Research, Technology Transfer
  • Key Results - Use of Ontologies to describe and enhance a well established data model

Implementing a Virtual Observatory: Models, F/ws and Tools - Todd King

  • Framework Areas - Finding Aids, Access, Federated Architectures
  • Maturity - Framework for comparing PDS and VO architectures and needs
  • Key Results - A good look at Federated architectures

Whither Physical Media? - Mike Martin

  • Framework Areas - Archival Storage, Dissemination
  • Maturity - Testing Results
  • Key Result - What will be our long-term storage system

Use of AIPs at the NSSDC - Patrick McCaslin

  • Framework Areas - Ingest, Information Model
  • Maturity - Mature Prototype to operational
  • Key Result - Use of AIPs in an operational architecture

Data Preservation Reuse in Archive Design and Implement - Tom McGlynn

  • Framework Areas - Archival Storage, Dissemination
  • Maturity - Survey and Analysis of several archives
  • Key Result - The best analysis of the store or produce-on-demand issue I have seen

Rapporteur: Kathy Fontaine - NASA/GSFC


These posters cross themes - nothing fits solely within one theme
  • Which demonstrates how interconnected the themes actually are.
  • Most touch on some aspect of user needs, though.
All address the spectrum of concerns facing data systems
  • From creating one, to running one, to ensuring preservation of the data base.
  • All are willing to share their lessons learned.
All recognize, at some level, the role of both the data provider (PI) and the user in the survival of the data.

Poster Messages

Crichton: International Planetary Data Alliance
  • Lessons learned in process, data management, and technology are valuable at the national and international level, and should be shared to further data interoperability within the planetary community.
Ebisawa: Scientific Satellite Data Archives at JAXA
  • JAXA has a broad collection of astronomical databases available, two of which (plasma physics and planetary science) will become key to showing interoperability in those two fields with the BepiColombo mission.
Grayzeck: The Role of NSSDC
  • Resident Archives address the need for data to be usable, supportable, and complete when they come in [sometimes to NSSDC] from missions slated for termination.
Guinness: Approaches for Archiving and Distributing Science Data from Planetary Missions
  • It is never too early to get the instrument people involved in the archive process.
McDonald: Replication Policies for Distributed Digital Preservation Environments
  • For selected large, high-value, irreplaceable collections of data, replication may be preferred; a possible method for replication involves Storage Resource Brokers over the TeraGrid network.
James: Show Me the Data
  • In order to stay relevant and useful to your customers (however they are defined), you must understand who they are, what they need, and how they need it.
Zender: Science Archives Over the Past 125 Generations
  • Awareness of the need for long-term preservation is linked to a society's inherent belief that its citizens have a right to information; the ability to do this long-term preservation is relatively new (only over the past 1.5 generations).

Rapporteur: Steve Joy - UCLA

LSST: Preparing for the Data Avalanche through Partitioning, Parallelization, and Provenance (Kirk Borne)
Theme: User Interactions, Standards and Technologies

Kirk presents some of the challenges that face the Large Synoptic Survey Telescope (Chile). This project will generate data on the order of 30 TB/day (65 PB/life). The data will be processed and parameterized in real time, such that the metadata associated with each 6 GB (3 gigapixel) image is actually larger than the image itself. All of this data must be moved from its acquisition site to the science center in San Diego. In addition, since much of the analysis and parameterization occurs in real time, the provenance of the images (telescope configurations, software versions, policies, etc.) needs to be tracked along with the data. Kirk discusses how the LSST data management team has addressed these challenges and the database principles that they have employed.
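
The figures quoted above can be cross-checked with some rough arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of the poster: it assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB), and the function name `lsst_estimates` and the constants are hypothetical names chosen for this example. The images-per-day figure is an upper bound that assumes the whole daily volume consists of 6 GB images.

```python
# Rough cross-check of the LSST numbers quoted in the report.
# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).
TB_PER_DAY = 30        # quoted daily data volume
LIFETIME_PB = 65       # quoted lifetime data volume
IMAGE_GB = 6           # quoted size of one image
IMAGE_GIGAPIXELS = 3   # quoted pixel count of one image


def lsst_estimates():
    """Return derived quantities implied by the quoted figures."""
    # Days of data-taking at the full quoted rate needed to reach 65 PB.
    observing_days = LIFETIME_PB * 1000 / TB_PER_DAY
    # Upper bound: treats the entire daily volume as 6 GB images.
    images_per_day = TB_PER_DAY * 1000 / IMAGE_GB
    # Storage per pixel implied by a 6 GB, 3 gigapixel image.
    bytes_per_pixel = IMAGE_GB / IMAGE_GIGAPIXELS
    return {
        "observing_days": observing_days,
        "observing_years": observing_days / 365.25,
        "images_per_day": images_per_day,
        "bytes_per_pixel": bytes_per_pixel,
    }


if __name__ == "__main__":
    for name, value in lsst_estimates().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the quoted lifetime volume corresponds to roughly six years of data-taking at the full daily rate, and each pixel occupies two bytes. The actual survey length may differ, since a telescope does not collect data at the full rate every day.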

Science Archives in the 21st Century: a NASA LAMBDA report (Paul Butterworth)

Theme: Archival Policies and Implementation

Paul runs a small data center (2 FTEs) that serves a small user community, the cosmic microwave background (CMB) scientists. Because resources are limited, the data center focuses on the specific needs of its user community. If NASA mandates that his system must comply with various information system methodologies, or that he provide complex metadata for Virtual Observatory access, it may bankrupt or cripple his data center. He makes a strong case for developing the Virtual Observatories and other inter-disciplinary data systems in such a way as to have minimal impact on current data providers and small data centers.

AND Archives: Freeing ourselves from the "Tyranny of the OR" (Ted Habermann)

Theme: Archival Policies and Implementation

Ted presents a case for including both GIS (geospatial) and intrinsically scientific data representations. In the NOAA system, data are initially ingested into a GIS database, and then written back out to the file system in more standard forms. This approach allows users to search, retrieve, and analyze the data using standard geospatial techniques AND to access the data files themselves through standard search and retrieve functions.

An application of CCSDS archival standards to meet both submitter and archive needs during data ingest (Kent Hills et al.)

Theme: Archival Standards and Technologies

Kent describes the past, present, and near future of the NSSDC data ingestion process. The process has transitioned from one that relied on human interaction to one that is standards- and tools-based, with only a human review step. This transition will impact both the Center and its users; however, the end result will be a more usable, more reliable system.

NASA Datasets Management Using Process Libraries and Electronic Handbook [Where Shakespeare meets Freud] (Barry Jacobs)

Theme: N/A

Barry presented a real-time demonstration rather than a poster. The demonstration showed a database of project processes that he has assembled in order to assist new projects with the development of their own processes. In the past, this information was placed in project databases and was not available to other projects. As a result, each project invents new processes to solve problems that have already been solved by others. By gathering the process information and making it available to new projects, the workload for new projects should be reduced, and eventually many of the common processes may become standardized.

Guidance for Science Data Centers through Understanding Metrics (John Mosses et al.)

Theme: Archival Policies and Implementation???

John presents a poster that describes the metrics that the EOSDIS project has been acquiring. His presentation includes a discussion of some of the pitfalls that arise in the presentation of metrics. In some cases the raw metrics can easily be misinterpreted.

Tradeoffs in the Development in the SPASE Data Model (Jim Thieman et al.)

Theme: Archival Standards and Technologies

Jim presents a discussion of many of the tradeoffs that were considered and implemented during the development of the SPASE data model. These include, but are not limited to:

       data accessibility:    search/retrieve    vs.   analyze/transform
       documentation level:   file collections   vs.   files
       metadata level:        rich, manual       vs.   minimal, automated
       metadata language:     cross-discipline   vs.   intra-discipline
       metadata content:      conceptual         vs.   structural
                              describes objects  vs.   describes bytes
