Dear Colleague:

Enclosed are the proceedings from the Open Meeting on Space Science Data
Systems held last week here at NASA Headquarters.  I appreciate your
participation and interest in this very important topic.  I look forward to
your continued involvement as we proceed toward a more coherent federation
of space science data systems.  In the meantime, I welcome any comments and
thoughts, either via e-mail to joe.bredekamp@hq.nasa.gov or by phone at 202-358-2348.

Thanks again.

Joe Bredekamp


================================================================
Joseph H. Bredekamp                   joe.bredekamp@hq.nasa.gov
Senior Science Program Executive/     Voice: 202-358-2348
     Information Systems              Sec:   202-358-1588
NASA Office of Space Science          Fax:   202-358-3097

Open Meeting on Space Science Data System

NASA Headquarters Auditorium
March 27, 1997

Agenda

 9:00   Introduction                            Joe Bredekamp
 9:15   Space Science Vision                    Henry Brinton
 9:45   Synopsis of Current Data Environment    John Nousek/Penn State
                                                Ray Arvidson/Washington U
                                                Tim Killeen/U. Michigan
10:45   Data Management Task Group Report       Jeff Linsky/U. Colorado
11:15   SSDS Concept and Approach               Joe Bredekamp
12:00   Lunch
 1:00   Plenary Discussion
               Management Issues
               Technical Issues
               Transition Issues
 4:00   Summary Statements
 4:30   Adjourn

Meeting Proceedings

Introduction

Mr. Joe Bredekamp welcomed the attendees to the open meeting on the Space Science Data System. The purpose of this meeting was to involve government, academic, and industry partners in the planning process for improving the data archive system for space science. The meeting was structured to be both informative and interactive.

Space Science Vision

Dr. Henry Brinton, Director of the Research Program Management Division in the Office of Space Science (OSS), presented the vision of NASA's Space Science Enterprise. The technical challenges and opportunities for space science have never been greater. A confluence of events over the past few years, and particularly the last twelve months, has changed the prospects for the future of the space science mission. This mission is supported by the NASA Administrator, the White House, and Congress. The unified mission of the Space Science Enterprise is to explore the solar system and universe, searching for evidence of life beyond Earth, looking for planets around other stars, and examining the origin, evolution and destiny of the universe, galaxies, and stars. The OSS is organized along four basic themes, with an overarching theme of the origin and distribution of life in the universe. This overarching theme follows the long chain of events from the birth of the universe at the Big Bang, through the formation of galaxies, stars, and planets, to the evolution of life itself. Dr. Brinton described some of the recent scientific discoveries that have contributed to this theme. Within the next 20 years, a space interferometer, such as the Planet Finder, may be able to separate light to determine the existence of other planets, and the atmospheric and surface conditions of those planets. The Origins Initiative is a set of augmentations in the FY 1998 President's Budget that is consistent with the President's new Civil Space Policy. In response to a question, Dr. Brinton noted that there has never been a time when there has been such universal support for space science. The NASA Administrator and the White House are in full support of this program, and there is bipartisan support in Congress for the space science budget.

Synopsis of Current Data Environment

The data structure is a critical part of the space science program. Several individuals from the science community gave their perspectives on the current environment.

Dr. John Nousek from Pennsylvania State University described what a science user expects from astrophysics data today: free; available on request in any volume; convenient-to-use indexing, browse, and location tools; prompt data delivery; usable and understandable support analysis tools, especially the software to process and interact with the data; freedom to select alternative forms; exchange standards; user-supplied software extensions; re-processing/re-calibration available on demand; immediate correction of software/data errors; and availability of detailed expertise. An important concept underlying astrophysics data is the data/software life cycle. It starts out before launch with calibration data. After launch there is an early (performance verification) phase, where the data is proprietary. After this first testing phase, there are general observations, still proprietary, but guest investigators start to use the data. This evolves into the open time, where there is a mixture of proprietary data and data going into the archive. When the mission becomes mature, there is new data coming in and data going into the archive, and re-processing of the data set occurs. This is usually very time consuming and expensive. After the end of the mission (approximately one year later), all of the proprietary rights have disappeared, all data is archival, and archival tools are essential. Somewhat later there is final reprocessing of the data, which goes into the archive. After this time is the extended archive phase (science archive research center) and a new set of people take over responsibility for the data. After the computer genre has evolved, data reclamation and reformatting is necessary. Finally, there is a historical role to preserve data integrity. Dr. Nousek presented some lessons learned from his experience: dramatic change can occur suddenly and even the best projects can become irrelevant; agility is essential to minimize the planning/managerial overhead; no progress occurs until the first use of the system; technology is always astounding, and no project is ever technology bound. The point of data/software is scientific discovery, and development/resources must follow the user patterns. The data environment empowers scientists to make discoveries.

Dr. Ray Arvidson from Washington University continued the discussion from a planetary sciences perspective, as both data user and producer. In response to Committee on Data Management and Computation (CODMAC) reports in 1982 and 1984, the Pilot Planetary Data System (PDS) was formed to prototype ideas and begin serving the community. The best data systems are those that have direct science involvement. In 1991, the PDS became operational, and it serves the community today. It has archived data from about 20 missions and currently works with 11 active missions. The PDS publishes and distributes peer-reviewed data from completed missions; establishes planetary archive standards; works with flight projects to ensure generation of PDS-compatible products; and provides science expertise on the archived data. There is a central node at JPL for system management, but the PDS is fully distributed into the community. A process has been established for archiving that directly involves the PDS with the mission or the community members. A set of standards or approaches has been developed that are accepted by the community and are used by the missions and communities. Peer review is an important part of the entire process. The current system is discipline-oriented, and it is not set up to cut across the traditional discipline boundaries. The SSDS, properly done, will enable theme-based research to be accomplished. However, in moving to SSDS, Dr. Arvidson emphasized that the successful activities should not be unraveled. The challenge is for a distributed system that keeps what works, meets existing obligations, and enables coordination of activities and seamless access to OSS-produced information and data. In response to a question, Dr. Arvidson described how PDS is organized. Funding goes to JPL, and the nodes (competitively selected) are operated under JPL contract. Most of the archiving and the work is done at the distributed sites. It can be thought of as a "virtual institute." One of the challenges of the SSDS will be management. With respect to interacting with the programs, many of those involved in PDS are also involved in the missions; also, there is a Planetary Data System Management Council which has worked very well.

Dr. Tim Killeen from the University of Michigan completed the discussions from the perspective of space physics (Sun-Earth Connections). There appears to be a five-year transforming cycle in data systems, and the pace of change should be driving the development of the SSDS. In the early 1980's, everything was PI driven (the data belonged to the individual). By the mid-1980's, coordinated multifaceted data sets were seen. NASA invented SPAN, which transformed the way science was being done. In the 1990's, the Web is transforming everything, and is becoming a way of publishing in a fast-moving field. Another aspect was the move by the community to data-theory closure. The science is no longer in disciplines, but is in connections and understanding processes: a unified system of mass, energy, and momentum. What is now important for the science are the connections. The SSDS must include the next transforming cycle in data systems. Some questions are: How does the SSDS fit into emerging technologies? How does it fit into the larger picture of science in the future? Increasingly, science will require modules (data and others) that can be readily combined. Collaboration technologies are improving rapidly and there is an insatiable demand. Any new system must track the "agents of change." The current SPDS is an excellent, well-constructed web presence, with multiple capabilities. It is comprehensive and well maintained. Permanent archive should be a NASA responsibility. The use of the SPDS capabilities seems to have been less than what one might expect. The "growing pains" have put some off permanently, and there is a one-dimensional nature to the products. The system requires significant investment to optimize personal involvement. Also, there have been some unrealistic expectations and insufficient appreciation of the magnitude of the task. Of importance to the science environment are: model/data source "brokers"; high performance visualizers; digital libraries; data base agents; electronic workshops and campaigns with replay capabilities; and Space Weather science testbeds. The SSDS must have collaboration tools, digital libraries, electronic workshops, and on-line help. The deployed user nodes should be competitively selected; have value-added expertise; collective oversight responsibility; overlapping specialization spanning the full field; and natural alliances with international and interdisciplinary partners. There should be regular evaluation reviews, and progress towards objectives must be tracked. The system should use the university analogy: strong deans with direct financial responsibility, and a provost with institution-wide oversight and responsibility. A suggestion was made that the SSDS provide full references/citations, easily available for users.

Data Management Task Group Report

Dr. Jeff Linsky from the University of Colorado reported on the findings and recommendations of the Data Management Task Group, which he chaired. The Group was formed in February 1996 at the request of the Associate Administrator for Space Sciences. The Final Report was submitted in October 1996, and can be found on the Web at http://adc.gsfc.nasa.gov/~gass/linsky/report_available.html. The main objective was to suggest improvement of NASA's data holdings in space science and Earth science (coordination, architecture, access, tools, and usage). The Group was also chartered to address which of the data management functions could be outsourced. There were many other issues discussed by the Group: reduction of costs, increase in the role of interdisciplinary science, the usefulness of the NSSDC, the availability and proper documentation of some data sets, a coherent data system covering the space sciences, and a vision for the future. After the report was written and submitted, two other issues were noted: NASA's desire to single source the SSDS, and NASA's increasing desire to outsource. A number of data systems were examined, and "lessons learned" were identified. Everyone benefits from maximum use of data. The communities which have been most successful in deriving maximum value are the communities in which there is an active effort to promote a culture of free and open exchange. Community pressure is the most effective means to ensure that this happens. NASA needs to play a proactive and continuing role in the maintenance and dissemination of data. Technology must not be forced. NASA is a small player in the information system technology world, and should take advantage of what is there. The Group confirmed the basic principles of successful science data management originally described by the CODMAC in their 1982 and 1986 reports: scientific involvement; scientific oversight; data availability; structured transportable software; permanent and retrievable data storage; and adequate funding. The Group added two additional principles: data ownership and curatorship, and publishing. NASA owns the data and has responsibility for it, and assigns data curatorship to a project and a selected data center.

Recommended building blocks for the SSDS include: selection (competitive for a fixed period of time); a coordination office (does not hold any data, but sets standards and advises projects); distributed data (to "data lovers" in the user communities); data nodes (distributed centers that curate, maintain, validate, and distribute space data); periodic peer review (for each component of the SSDS); an advisory structure (Science Information Systems and Operations MOWG); a permanent archive; existing data nodes (keep the ones that are working well); user communities; education; and NSSDC (disestablish after transition to SSDS is complete). The current environment is characterized by scarce resources, exploding data bases, changing research styles, rapidly changing technology, and distributed resources. In addition, decreasing manpower at NASA Headquarters and a desire to outsource indicate that NASA's role should change from Manager to Partner with the user communities in the management of the SSDS. A change in role from Manager to Sponsor would be dangerous in the long term.

Dr. Linsky addressed the question of whether or not the Coordination Office (CO) should be outsourced. Arguments in favor: competition brings out the best ideas; an outside group is better able to fight for proper funding; and universities are highly innovative environments. Arguments against: minimize the cost and disruption of changing the host and the location during the transition period; a NASA Center is better able to withstand "glitches" in funding; a very long term commitment is essential; NASA must remain an active player; and a NASA Center entity can enforce NASA's data requirements better than a non-NASA entity.

The Task Group recommended assigning the CO to GSFC. However, a minority of the group would like to see the CO competed. Any assignment to GSFC should be done with a clear charter and an active Management Council that represents the user communities. If NASA decides to compete the CO, then NASA centers should be allowed to compete with other entities. The CO should be small, and the functions should be strictly limited to: project management (with the input of the Management Council); leadership in the setting of standards; leadership of system engineering for SSDS; pro-active coordination and planning with active missions and instrument teams, including advice on writing and implementing Project Data Management Plans; and participation in the periodic reviews of the data nodes and permanent archive. The CO will have no in-house science data sets. The Management Council represents the user communities and acts as a "Board of Directors" and a source of expertise for the CO. The SSDS is funded by NASA, through GSFC (or other NASA Center) contracts.

There were a number of comments and questions associated with the Coordination Office. A concern was expressed over how the CO could maintain credibility and expertise without some direct connection with the data. Dr. Linsky noted this concern. Also, if a small management office is formed, how can it ensure that the right elements get into the Project Data Management Plans (PDMPs)? It is envisioned that the Information Systems and Operations Working Group will provide policy advice on PDMPs. There is a need for financial stability of the CO, but there are alternatives in between the two extreme positions described in the presentation. The existing discipline nodes have different cultures, e.g., planetary has a central coordinating node, but astrophysics does not. However, all nodes should be organized in a way that enables interdisciplinary science. Every discipline node should provide data for permanent archive. Permanent archive should be linked with the CO. NASA needs to think about how this role will be played by the working data sets residing in the nodes. The question of a "world data center" has not been addressed. It may be outmoded given the Web and a permanent archive. The ISOMOWG should ensure that NASA maintains the position of stakeholder. One of the functions of the CO would be to direct queries in the right direction on the Web site. The structure being implemented for OSS education and public outreach includes four forums (one for each science theme) as well as a set of broker/facilitators. This data system must interoperate with those locations.

SSDS Concept and Approach

Mr. Joe Bredekamp described the SSDS purpose and rationale. The purpose of the SSDS is to establish a coherent and coordinated space science data environment to improve the quality, accessibility and usability of NASA's space science data holdings for scientists, educators, and the general public. This is part of a continuing development of a Space Science Strategic Plan. Both technology programs and the research infrastructure need to be adjusted to support the Strategic Plan. The SSDS is a critical part of the research infrastructure, and improvements must be made to support the Strategic Plan. The SSDS is a coordinated federation building on a sound data tradition. It is not a large "system" development. It is interoperative and integrated to support sharing, commonality, and broader access. NASA has a stewardship responsibility for space science data as a national resource, and it will not be abdicated. The Task Group has provided a sound set of principles and recommendations to build on. No new funding for SSDS is expected; it will be funded within existing budgets for archiving and data management efforts (about $18-$20 million per year).

There are three principal elements in the architecture: permanent archive; distributed domain nodes; and a management node. The management node is vested in the community (envisioned as an "institute without walls" and an opportunity for partnership). The intent would be to select this node through a competitive process, and have it be the principal funding entity. The management node will conduct the competitive selection of domain nodes. The SSDS becomes an organic part of the infrastructure for science. It must have well-defined and coherent interaction with the OSS Education and Public Outreach structure, and must maintain strong international cooperation. In planning and implementing the space science data environment, OSS seeks participation across broad community segments: science users, data system engineers and technologists, academia and government, private sector and technology innovators, and educators and the general public. Next steps in the process are: further study of the architecture and design; refinement of the implementation plan; and development of the solicitation for the management node. There could be a follow-up workshop (or series of workshops) leading to the development of the solicitation.

Comments from the audience:

Plenary Discussion Session

Afternoon ad hoc speakers

Bill Mish (GSFC) - ISTP Concerns: The ISTP Program has some serious concerns associated with infrastructure issues. There are a number of Code 630 products that the Program is dependent upon, and it is not clear how these products will be supported under the SSDS. In the outsourcing scenario, there will be serious problems if there is discontinuity. Mr. Bredekamp noted that one of the design principles is continuity, and transition issues like this will be addressed.

Tom Garrard (Caltech) - Community Input Needs: The Space Physics Data System (SPDS) is a parallel organization to the PDS. There are substantial cultural differences between the space physics community and the astrophysics community. The SP issues are associated with data complexity and calibration. SPDS has been a volunteer/community driven organization with very limited funding. Emphasis has been on mechanisms for input from the community, and keeping the community informed. Recommendations are: stronger mechanisms for community feedback, including more town meetings and workshops, electronic and/or geographically distributed; publicity via AGU etc., and via WWW; domain coordination teams with large memberships and a clear (hierarchical?) path to the "top". These recommendations are addressed to OSS and the ISOMOWG, as well as to the coordination office when it is established.

Nick White (HEASARC-GSFC) - Lessons Learned and Future Prospects: Using catalogs and data at all levels requires access to a multitude of sites. NASA created ADS and ESA created ESIS to address this problem. Both projects failed to achieve their objectives, and have either been reduced in scope or canceled. Their mistakes should not be repeated. ADS put a burden on the data providers, and also conflicted with existing user interfaces from data providers. Ultimately, the system was overtaken by the advent of the WWW. Requirements for a distributed data system are: do not put a burden on the data providers; use public domain access methods (e.g., ftp, http); do not try to provide a do-it-all user interface (concentrate on defining a few critical standards and enable the community to build their own user interfaces). Rich sources of survey data are already network accessible. A seamless integration of services using URLs can be expected. A few simple standards should be agreed upon. In response to a question, Dr. White noted that user community involvement in ADS was probably insufficient, particularly during implementation.

There is an effort underway to review all of the active space physics data sets. The ones that are exceptionally well done will be recognized; those that are not will be improved with the participation of the community. How can users be motivated to interact in the development phase of the system? One good example of a system that works well is CDAW Web.

Roger Brissenden (SAO) - High Energy Data Provider: From the astrophysics community point of view, it is very important to build on the existing system throughout the community. The infrastructure should be "light-weight" and have low overhead. Look at systems that use the Web. Rapid evolution of systems at the point of expertise should be enabled.

Michael Kurtz (SAO) - Astrophysics Digital Library: A system already exists among text-based astronomy systems. The data providers are providing it based upon the final product. The journals maintain tables of data in machine readable form. The nomenclature is agreed upon, and objects have names. The Abstract Service covers planetary data as well.

Bob Hanisch (STScI) - Technical Opportunities and Scientific Rationale: Be very careful not to force from the top down. Look at what level of integration is valuable scientifically and achievable technically. Nomenclature may have very little overlap. The most important role is finding data that is relevant to the scientific problem. Once data is found, the important function is getting it in a form in which it can be used. This can be done by building upon the excellent facilities already in place, and integrating them. ADS drew upon proprietary systems and put burdens upon data providers. Related services will want to participate in SSDS. The technology is now in hand, and prototypes have been developed. In terms of public outreach, data services must support the educational outreach community, but it is a mistake to make the archive systems fully accessible as a public resource: the data can be too complicated and voluminous to be useful to them.

Kevin Gamiel (NCSA, U. Ill) - Call for Collaboration: There are already a lot of systems in which investments have been made. NCSA will offer a free software package that drops into existing network accessible data repositories, and a standard set of queries with full support of concepts for searching different types of astronomical resources.

David Book (U Md.) - Email Archive: Today, the intellectual archive (notes) should also be for the public and historians. E-mail could be flagged as "archive" and it would be sent to a repository. [The sense of the group was that this would not be very useful.]

Joe Mazzarella (IPAC) - Reality Checks: Scope and Resources: The SSDS should have a well-thought-out cost-benefit analysis early on. There are limitations to use of a system for interdisciplinary research. There are common needs, such as an image browsing tool for the Web. How does the SSDS differ from using the Web with common sense? The cost of hardware and people should be factored into the $20 million budget. Problems have been encountered relative to large data transfer over the Web. Take advantage of what people in other fields know, and leverage technology developments.

Bob Fowler (NSF) - For a scientist to maintain a data archive that will be generally useful to a large segment of the community in a technologically changing environment will be a challenge.

The evolution of Web technology is exciting. There has to be some confidence that the data obtained is trustworthy.

Don Sawyer (GSFC) - Technology Comments: There are new technologies coming along which may significantly impact how data is used. Do not become so enamored of any one technology that others are excluded.

Dennis Gallagher (NASA/MSFC) - The scientists are still thinking in terms of holding the data. Data does not have to be held (e.g., a "data mine") in order to be accessible. What could be more easily competed is a "data miner" service, which is people based. The information technologists who are managing the data for science must work in close collaboration with the people who are doing the science from the data.

Discussion Topics

Management Issues

Technical Issues

Transition Issues

Comments

Summary Perspectives on the Meeting

Jim Green (NSSDC): Community Discussion

Other comments

Recommendations

Another model: Use a successful model (e.g., PDS)
Plan -> Pilot -> Plan -> Comp. -> Plan -> Comp.
Put together a pilot activity (an appointed coordination office) to define what is in the permanent archive, the services, etc., and to develop what the new nodes should look like. As a function of time, the new CO would be put in place, without interrupting services that have begun. Assess how this is working, and continue to iterate until the goal is reached.

Morris Aizenman (NSF): What is it that the customer wants? Where are we going with this new structure and why? What is it that we don't have that we need? What is the CO going to do? What is the value added?

Is there an added service that is better than what is provided today? If not, don't waste time or money. Funds are limited, and NASA should be sure that what it gains will be more valuable than what it loses.

Ray Tatum: It sounds like a paradigm is being laid down. Because of the diverse cultures, some distribution is required. The critical part is the transition phase. Do not move too fast; spend some time studying. Industry wants to be involved in the process. The things the science community does well, it should continue to do; the things that industry does well, it should do in partnership with the science community.

Ray Walker: Space Science Data Coordination

Other Comments

Barry Madore: The question is not how NASA should implement these solutions, but whether it should be implementing solutions. Ultimately, the most important thing is knowledge. Enforcing a system is not going to lead to answers; scientists are self-motivated without this structure.

Nick White: One of the biggest impacts of the Linsky Report is to dissolve the NSSDC. The NSSDC has been around for a long time, and it would be wise to let those who are running NSSDC develop a plan on how it should be dissolved.

Next Steps

Before adjourning the meeting, Mr. Bredekamp noted that the next level of planning will include defining the structure and functions, and developing the implementation plan as well as the transition plan. These will be posted as they are developed to keep this group involved in the process. Written comments for the record were invited.