Science Archives in the 21st Century
In data archiving, the typical sequence of events is that a mission is flown; the data is returned to and analyzed by the team; the data is cleaned up; and finally, the data is archived. We believe this chain of events is fundamentally flawed, since archiving is often not an integral part of the mission. By the time archiving begins, the people most knowledgeable about the data have generally moved on to the next mission, and archiving is an afterthought, if it is considered at all. Cleaning and validating data is cost-prohibitive as well as resource intensive in terms of labor and computing hardware. If this work is left to the end of a mission, money is often so tight that there is little incentive to continue with an archive. Furthermore, space physicists have their own ways of handling data, which makes the data difficult to use for interested observers outside the mission.
These dilemmas can be resolved by making the data used by the instrument team identical to the data that is to be archived. To achieve this, we must develop an ideal data format that is usable by an instrument team, usable by an archival site, and usable by future scientists analyzing the data. In our framework terminology, what the provider uses for their own needs should be identical to what is packaged as a submission information package (SIP) and submitted to archival storage.
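As a minimal sketch of this idea, the Python below packages a team's working data file, byte for byte, together with a small manifest into a single archive that plays the role of a SIP. Everything here is an assumption made for illustration: the function names, the file name, the manifest fields, and the choice of a zip container are hypothetical and do not represent the SwRI format, the SPASE metadata model, or any agreed standard.

```python
import hashlib
import io
import json
import zipfile


def build_sip(data_name: str, data_bytes: bytes, metadata: dict) -> bytes:
    """Package the working data file, unchanged, with its metadata (hypothetical layout)."""
    manifest = {
        "data_file": data_name,
        # The checksum lets an archive confirm it holds exactly what the team used.
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "metadata": metadata,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(data_name, data_bytes)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify_sip(sip_bytes: bytes) -> bool:
    """Check that the packaged data still matches the checksum in the manifest."""
    with zipfile.ZipFile(io.BytesIO(sip_bytes)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        data_bytes = archive.read(manifest["data_file"])
    return hashlib.sha256(data_bytes).hexdigest() == manifest["sha256"]


# Illustrative values only: a tiny made-up data file and made-up metadata fields.
working_file = b"time,flux\n0,1.5\n1,1.7\n"
sip = build_sip(
    "example_instrument.csv",
    working_file,
    {"instrument": "example", "units": {"flux": "arbitrary"}},
)
print(verify_sip(sip))
```

The point of the sketch is that no conversion step sits between the team's file and the archived file, so the checksum recorded at packaging time describes the same bytes the team analyzed.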
Standardizing on a single data format can be achieved by first agreeing on a metadata standard, as the SPASE group is doing. The next step is for the space science community to develop a new standard data storage format that others can agree to use.
This approach has two advantages. It allows others to develop value-added services, as the virtual observatories are doing, and it allows instrument teams to continue creating valuable data products throughout the life of the mission that can be archived immediately. With this approach, the data is immediately usable by both the instrument team and the general space science community, and will continue to be useful past the life of the mission.
Every instrument group is already defining and distributing data products. With a standards-based approach, however, more value-added services will become available to the community, which strengthens the incentive to work with standards. As a result, there will be a rich set of tools available to aid the data investigator and data manager alike.
The real challenge is to define what this ideal data storage format would look like. At SwRI, we have developed a data storage format and have learned some of the challenges of developing a generic one. We now know what we would do differently and what we feel was done correctly. We plan to discuss what steps can be taken in the future to ensure quality data archives for the entire space science community.