Science Archives in the 21st Century
Topic: Role and implementation of persistent identifiers
Under the auspices of the Astrophysics Datacenters Executive Committee (ADEC), a standard was developed that allows professional papers to link to the original datasets they use, and archived datasets to link back to the professional papers based on them. The agreement was brokered among the datacenters, the Astrophysics Data System (ADS), and the editors of the AAS journals. It consisted of a standard for defining persistent identifiers for the archives' datasets (AIPs) and for pairing these identifiers with the "bibcodes" already in use in the ADS.
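As a rough illustration of this two-way linking, the following Python sketch indexes harvested (dataset identifier, bibcode) pairs in both directions, so that a paper can be mapped to its datasets and a dataset to its papers. The identifier strings, the bibcode, and the function name `build_link_indexes` are hypothetical; the only check applied to bibcodes is their fixed 19-character length, and no actual ADS harvesting interface is implied.

```python
from collections import defaultdict

BIBCODE_LENGTH = 19  # ADS bibcodes are fixed-length, 19-character strings


def build_link_indexes(pairs):
    """Index (dataset_id, bibcode) pairs in both directions.

    Returns two dicts: dataset_id -> set of bibcodes (dataset-to-paper links)
    and bibcode -> set of dataset_ids (paper-to-dataset links).
    """
    papers_for_dataset = defaultdict(set)
    datasets_for_paper = defaultdict(set)
    for dataset_id, bibcode in pairs:
        if len(bibcode) != BIBCODE_LENGTH:
            raise ValueError(f"not a 19-character bibcode: {bibcode!r}")
        papers_for_dataset[dataset_id].add(bibcode)
        datasets_for_paper[bibcode].add(dataset_id)
    return dict(papers_for_dataset), dict(datasets_for_paper)


# Hypothetical harvested pairs: one paper based on two archived datasets.
pairs = [
    ("ivo://ADS/Sa.CXO#obs/01234", "2007ApJ...660..123S"),
    ("ivo://ADS/Sa.HST#obs-A", "2007ApJ...660..123S"),
]
papers, datasets = build_link_indexes(pairs)
print(sorted(datasets["2007ApJ...660..123S"]))
```

Both indexes are built from the same list of pairs, so a link recorded once serves both the reader of the paper and the user of the archive.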
The identifiers conform to the International Virtual Observatory Alliance (IVOA) standard for identifiers; i.e., they are placed in the IVO domain, under the authority of the ADS, and use the ADS's publisher designations. The precise definition and meaning of the identifiers is left to the publisher, but there are two main requirements: each identifier must lead the user unambiguously to an AIP from which a DIP can be retrieved, and it must be maintained in perpetuity, even if the repository moves. The identifiers are currently being inserted into manuscripts on a modest scale, and a harvesting tool is still under development.
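A minimal sketch of how such an identifier might be split into its parts and resolved is given below. It assumes identifiers of the form ivo://ADS/<publisher designation>#<local id>; the designation "Sa.CXO", the local ID, the resolver table, and the archive URL are all hypothetical. The point it illustrates is the persistence requirement: the identifier printed in a paper never changes, and if a repository moves, only the resolver entry is updated.

```python
from dataclasses import dataclass

IVO_PREFIX = "ivo://"


@dataclass(frozen=True)
class DatasetIdentifier:
    authority: str    # naming authority, assumed here to be "ADS"
    designation: str  # publisher designation (hypothetical example: "Sa.CXO")
    local_id: str     # publisher-defined part after "#"

    def __str__(self) -> str:
        return f"{IVO_PREFIX}{self.authority}/{self.designation}#{self.local_id}"


def parse_dataset_id(text: str) -> DatasetIdentifier:
    """Split an identifier of the assumed form ivo://authority/designation#local_id."""
    if not text.startswith(IVO_PREFIX):
        raise ValueError(f"not an ivo:// identifier: {text!r}")
    authority, sep, remainder = text[len(IVO_PREFIX):].partition("/")
    if not sep or not authority:
        raise ValueError(f"missing authority: {text!r}")
    designation, sep, local_id = remainder.partition("#")
    if not sep or not designation or not local_id:
        raise ValueError(f"missing designation or local id: {text!r}")
    return DatasetIdentifier(authority, designation, local_id)


# Hypothetical resolver table: only this mapping changes if an archive moves;
# identifiers already printed in papers stay valid.
RESOLVER = {"Sa.CXO": "https://archive.example.org/aip/{local_id}"}


def resolve(ident: DatasetIdentifier) -> str:
    """Return the current location of the AIP named by the identifier."""
    return RESOLVER[ident.designation].format(local_id=ident.local_id)


ident = parse_dataset_id("ivo://ADS/Sa.CXO#obs/01234")
print(resolve(ident))
```

Separating the fixed name from its current location is what lets the second requirement (maintenance in perpetuity) hold independently of where the data physically reside.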
Two future developments are of great interest in connection with this standard: certification and association.
The OCLC recently published the Criteria and Checklist for Trustworthy Repositories Audit & Certification (TRAC). It would be valuable for the datacenters represented in ADEC to pursue TRAC certification, as this would provide a basis for guaranteeing the indefinite persistence of the dataset identifiers.
It would be natural for the ADS's bibcodes also to be cast as IVOA identifiers. The association between papers and datasets would then become a simple association between identifiers, and the harvesting of dataset identifier/bibcode pairs would become a generalized harvesting of associated identifier pairs. This simple generalization would immediately offer a mechanism to link datasets across repositories and to harvest such associations. Coordinated or associated observations at multiple observatories (HST, Chandra, Spitzer, XMM, Integral, VLA, MMT, Gemini, AT, etc.) have become increasingly common, but there is no good mechanism to tie the different components together. Being able to harvest dataset identifier pairs in the manner described here would provide this functionality and thus close a significant gap in information management.
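The generalized harvesting described above can be sketched as follows: if every harvested pair simply links two identifiers (datasets or papers), then the components of a coordinated observing program are the groups of identifiers connected by chains of pairs. This Python sketch computes those groups with a union-find structure. All identifiers are hypothetical, including the `ivo://ADS/bib#...` form used for a bibcode, and union-find is merely one convenient choice, not something the standard prescribes.

```python
from collections import defaultdict


def group_associations(pairs):
    """Group identifiers connected by any chain of harvested associations."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups = defaultdict(set)
    for ident in list(parent):
        groups[find(ident)].add(ident)
    return list(groups.values())


# Hypothetical associations: an HST/Chandra campaign with a paper based on it,
# and a separate VLA/Gemini pairing.
pairs = [
    ("ivo://ADS/Sa.HST#obs-A", "ivo://ADS/Sa.CXO#obs-B"),
    ("ivo://ADS/Sa.CXO#obs-B", "ivo://ADS/bib#2007ApJ...660..123S"),
    ("ivo://ADS/Sa.VLA#obs-C", "ivo://ADS/Sa.Gemini#obs-D"),
]
groups = group_associations(pairs)
for group in groups:
    print(sorted(group))
```

Because papers and datasets are treated uniformly as identifiers, the same grouping yields both the observations that belong together and the papers that draw on them.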