Science Archives in the 21st Century
Astronomers are producing and analyzing data at ever more prodigious rates. NASA's Great Observatories, ground-based national observatories, and major survey projects have archive and data distribution systems in place to manage their standard data products, and these are now interlinked through the protocols and metadata standards agreed upon in the Virtual Observatory (VO). However, the digital data associated with peer-reviewed publications are only rarely archived. Most often, astronomers publish graphical representations of their data but not the data themselves. Other astronomers therefore cannot readily inspect the data, either to confirm the interpretation presented in a paper or to extend the analysis. Highly processed data sets reside on departmental servers and the personal computers of astronomers, and may or may not be available a few years hence. Descriptive metadata, adequate at best for archival collections and associated data discovery services, are often inaccurate or lost once data move out of pipeline systems into scientists' hand-crafted software.
We are investigating ways to preserve and curate the digital data associated with peer-reviewed journals in astronomy. The technology and standards of the VO provide one component of the necessary infrastructure. A variety of underlying systems can be used to physically host a data repository, and indeed this repository need not be centralized. The repository, however, must be managed, and the data must be documented through high-quality, curated metadata. This curation effort can be only partially automated. Multiple access portals can be available: the original journal, the host data center, the Virtual Observatory, or any number of topically oriented data services utilizing VO-aware access mechanisms.
I will also briefly discuss metadata management challenges encountered thus far in the implementation of the Virtual Observatory.