ISO Archiving Standards - Third US Workshop - Minutes
National Archives - Center for Electronic Records
College Park, MD 20740-6001 USA
March 19-20, 1996
(NOTE: We invite all participants to critique these minutes and
to offer updates on any significant points they feel are missing or
inadequately reflected.)
The Third US Workshop on Data Archive Standards was held at
the National Archives and Records Administration's (NARA) Archives II Complex
in College Park, MD, on March 19-20, 1996. The major items addressed at
the meeting are shown below.
Action Items
Materials Distributed
Future Meetings
Discussion Items
Action Items
This subject was not discussed as initially planned. Therefore, the list
of the December meeting actions is included for record purposes together
with the new actions from this March meeting. (The December list has a
STATUS added to each item.)
From December Meeting:
From March Meeting:
- Notify Don if you are planning to attend May meeting (Any/All)
STATUS: Completed
- Provide schedule for July 6 meeting (Linda)
STATUS: Open
- Develop functionality list for new Production entity (Lou and Randy)
STATUS: Completed
Materials Distributed/Referenced
(Item/Author/Distributed By)
- Draft Agenda / Garrett / Sawyer
- Participants List / Garrett / Garrett
- Reference Model for Archive Information Services,
Version 3 / Reich, Sawyer / Sawyer
- Reference Model with Comments/Additions / Grunberger /
Grunberger
- Z39.50 Profile for Access to Digital Collections / ZIG /
Garrett
- Preliminary Classification of Metadata / Huc / Garrett
- Comments and Responses on Huc Paper / Sawyer, Huc / Sawyer
- Data Migration Scenarios / Voels / Voels
- Reference Model V3 - Section 3 Presentation / Sawyer / Sawyer
- Figures 5-1 and 5-2 of Reference Model V3 / Reich / Reich
- Archive Scenario for the Planetary Data System / Martin /
Martin
- Management Functions / Davis / Davis
- Storage Functions / Martin / Martin
- Ingest and Data Management Functions / Sawyer / Sawyer
- Access and Dissemination Functions / Grunberger / Grunberger
Future Meetings
- April 29-30, 1996
- Second International Workshop
- at JPL, USA
- July 10-11/12, 1996
- Fourth US Workshop
- at TBD
- September 11-12/13, 1996
- Fifth US Workshop
- at TBD
- November 4-5, 1996
- Third International Workshop
- at DLR, Germany
Discussions
The meeting began at 9:15. Bruce Ambacher welcomed the participants
and discussed logistics.
Don Sawyer then distributed a meeting agenda [Item 1], available at
http://ssdoo.gsfc.nasa.gov/nost/isoas/us03/agenda.html,
which was accepted and followed during the meeting.
The list of attendees [Item 2], available at:
http://ssdoo.gsfc.nasa.gov/nost/isoas/us03/participants.html
was circulated so entries could be verified.
Review of Reference Model - Section 3
Don Sawyer gave a presentation [Item 9] on Section 3 of the Reference Model
Version 3 [Item 3]. The presentation covered what Section 3 contains as well
as perspectives, proposals, and issues.
Beyond the items in the presentation itself,
several other questions about the model were raised during the discussion.
These included:
- What is the meaning of non-digital in the scope? Don replied the intention
was to include such things as films, papers, photos, etc.
- What is the purpose of excluding non-reproducible objects? Don replied that
the purpose was to limit scope, but it is possible that the scope could
include non-reproducible objects if they fit with the overall model.
- What is important? Preserving the thing vs. preserving the info? Don
replied that some entity has to decide what is important to save, i.e.,
has to decide what level of representation needs to be maintained.
In other words, the archive needs to identify just what information,
in a given submission, it needs to preserve. (What is the significant
information?)
- Don showed an "Information Representation Graph."
- Where do you draw the line on how much information about representations
to save? It's impractical to have to describe every representation used
for each data product. You would have to archive the whole world. In
a practical sense, you go as far as needed - to a level of current common
understanding. However, be aware that in the future you will need to add
information on a representation as it moves out of the common understanding.
- Need to maintain the distinction between "knowing how" and physically
having the equipment to access information.
- Where does the meaning come from? It's not limited to a single level
but is built up from all the descriptions/representations.
- In the current environment diagram, other archives were shown but not
connected because they may not have any connections other than as consumers
or producers.
- If there are no special archive-to-archive functions, it would
be better to leave other archives off the
diagram than to show them with no connections. Words could indicate that other
archives would appear in the consumer and producer roles.
- Are there any other roles for other archives? Are bulk transfers
simply the same as producers? Would special cases of less ingest quality
checking change that? All ingests should be checked regardless of
whether they come from another archive. Is a mirroring function a special
role? Isn't that just a special dissemination method, possibly even
outside the archive?
- Is the model going to address any meta information about archives? Any
way to help people know what archives are available and what they
maintain?
- Archives are likely to have metadata relating to information in
other archives.
- There should be some policy in archives requiring consumers to provide
feedback about the data products they receive.
- Producers are simply identified by their interaction with the ingest
function and consumers by their interaction with access/dissemination.
Where does the service "provide information submission guidelines"
appear - from ingest or from access or both?
- A suggestion was made that another view of Higher Management could be
a fence around the entire archive, but it seemed better to leave it as
a box interacting with the archive. Although Higher Management controls
the entire archive box, the consumers and producers directly interact
with the archive, which carries out Higher Management's mandates; they
normally do not directly interact with Higher Management. If they do
interact, it is outside the scope of this model.
- Are the interactions being modeled as persons interacting or systems?
It really shouldn't matter. We are defining the functions and not the
form of the interaction.
Review of Reference Model Progress
Lou Reich gave a presentation [Item TBS] providing an overview of
Reference Model
progress. Items of the presentation included High Level Data Model, Z39.50
Profile for Access to Digital Collections [Item 5], and Reference Model OMT
Figures [Item 10].
Issues raised during this presentation included the following:
- At what level do we stop? How much detail do we put in model?
- Is the Storage area modeled by the IEEE Storage Model? Yes, at a base level but
other functionality may be needed on top of it.
Review of Data Migration Scenarios
Steve Voels provided an overview of the Data Migration Scenarios
[Item 8] that he
generated in response to action item at the previous meeting. Process
could be modeled two ways with
- an archive environment view - disseminate data to consumer who changes
it and then acts as a producer to re-ingest the new data.
- a view of migration being handled within archive.
Issues raised during this presentation included:
- Is migration an ingest or an archive management function?
- Policy decision may determine how migration is done - internally or
externally to archive.
- Migration function is not totally within Storage area. Would include
at least some Administration area.
- Probably don't want a change function in Storage. It should be an add
function followed by a delete of the old copy only after you know the
add was successful (see the sketch after this list).
- Common conversion modules should be created which can then be used
by ingest or dissemination or storage.
- Is bit-to-bit migration function totally within storage?
- Eventually interfaces will scope what functions are in what areas.
- Steve indicated that NDADS currently maintains 3 tables - 2 of which
would be viewed as metadata and 1 viewed as storage.
- Format changes usually require metadata changes.
- There are three different media requirements: 1) the Producer provides the product
on some medium, 2) the Archive will transfer the product to its storage
medium, and 3) the Consumer will request the product on some medium.
- Archives may maintain information in several formats. If multiple
copies are maintained, which one is visible to the outside world?
Why? First, there should be one authoritative source regardless of
the number of copies in the archive. The outside world sees granules only,
not where they are coming from. The outside world sees a set of products
which may have format differences, or access time
differences related to the different copies archives have.
- How are CDROM sources handled? Are they ingested and put on a shelf
or are they copied to other media? Local decision.
- Scope of Reference Model still needs to be clarified further.
- Need to put emphasis on data at ingest. Need to ensure input is
acceptable, at least to some statistical significance.
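As a minimal sketch of the add-then-delete migration approach noted above (all function and path names here are hypothetical; the Reference Model does not prescribe an interface), the new copy is written and verified before the old copy is removed:

    import hashlib
    import shutil
    from pathlib import Path

    def sha256(path: Path) -> str:
        # Checksum used to verify the new copy against the old one.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                digest.update(block)
        return digest.hexdigest()

    def migrate_object(old_copy: Path, new_copy: Path) -> None:
        # Add: write the new copy on the target medium/location.
        shutil.copy2(old_copy, new_copy)
        # Verify before touching the original.
        if sha256(old_copy) != sha256(new_copy):
            new_copy.unlink(missing_ok=True)   # roll back the failed add
            raise IOError(f"verification failed: {old_copy} -> {new_copy}")
        # Delete the old copy only after the add is known to be good.
        old_copy.unlink()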
Suggested Reference Model Updates
The afternoon session started with an overview of the comments and updates
to the Reference Model by Paul Grunberger. Comments and updates were based
on 1) suggestions made at the last meeting, 2) correspondence with Huc's paper,
which contained a single access module, and 3) John Rainey's presentation
at the first US Workshop (John stated his view then was that Dissemination
is a subset of the Access function). Paul also tried to route consumer
sessions through a single access mechanism, but this results in some problems
where there are 2 paths for the same information (through Access and
Dissemination) based on the type of media requested (online through Access,
CDROM through Dissemination).
Comments/Issues during this session:
- Do security and costing considerations/functions distinguish why both
Access and Dissemination are needed?
- Does metadata management box include customer info., cost info., etc?
If all types of metadata are stored together, separate views into
metadata are needed for different types of users. But most services
remain the same.
- Whatever the decision on the Access/Dissemination combination, all functions
need to be allocated either to the single box or between both. We need
to define these functions and see how they fall out, to see if the boxes
really end up looking the same.
If two boxes are used, a flow between Access and Dissemination is
needed so a single Access session can pass information through to
Dissemination. Access needs to look like a consumer to Dissemination
(see the sketch after this list).
- Processing of data for dissemination should be allowed. If allowed,
processing is an optional service but not part of baseline system.
- Optional vs. Baseline functions may be what distinguishes classes of
archives.
- Software can be treated as data related to other data and stored in
archive.
- Is anything precluded if we define processing as part of dissemination?
Seems like we include the whole world.
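One reading of the point that Access should look like a consumer to Dissemination is sketched below (the class and method names are illustrative assumptions, not part of the model); online and CDROM requests then follow a single path rather than two:

    class Dissemination:
        # Fills orders; it does not care whether the requester is an end
        # consumer or the Access entity acting on a consumer's behalf.
        def fill_order(self, consumer_id: str, product_id: str, medium: str) -> str:
            # Retrieval from Storage and any reformatting are stubbed out here.
            return f"order for {product_id} to {consumer_id} via {medium}"

    class Access:
        # Handles the consumer session and forwards delivery requests,
        # presenting itself to Dissemination exactly as a consumer would.
        def __init__(self, dissemination: Dissemination):
            self.dissemination = dissemination

        def request_product(self, consumer_id: str, product_id: str,
                            medium: str = "online") -> str:
            # Search/browse against Data Management would happen here.
            return self.dissemination.fill_order(consumer_id, product_id, medium)

    access = Access(Dissemination())
    print(access.request_product("consumer-42", "product-0001", medium="CDROM"))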
Review of PDS Archive Scenario
Mike Martin then presented his Archive Scenario for PDS [Item 11].
Issues identified were:
- Problems getting producers to provide information in the correct format
and with all required info. This was a problem everyone identified
with. The PDS approach is to have data engineers who go out and work
with projects. It is an expensive approach, but PDS estimates that doing
the job up front is 5 to 10 times easier than retrofitting the
information later. There is also reluctance from projects to work with the
archive at all, but if the archive provides a "free" resource, the
data engineer, to the project, the project is more willing to work
with the archive.
- PDS has a consistent data dictionary. Although resistant, projects
typically will adopt 90-95% of their terms from the common data
dictionary. Dictionary maintenance is also expensive (about 2 FTEs)
for PDS, but a consistent dictionary pays off down the line.
- Due to common input formats, dictionary, etc., effective validation
programs have been developed and input is consistently subjected
to the validation.
- Peer reviews are used extensively. PDS feels they are effective, since
90% result in liens for corrected or additional information to be included
before acceptance. This results in better products.
- Mechanisms in PDS for identifying/searching for what you want are
relatively primitive. Skyview and Clementine request are good
models to move toward. New ways needed for the larger community.
Current community knows how to get what it needs.
Mike Martin then provided an overview of his input for Section 7.1 of
the Reference Model. He indicated that it wasn't really clear whether
the flows on the diagram were functions or objects. Consequently, both appear
in places on his diagram.
- What is searchable metadata vs. non-searchable?
- Various sizes of images generally provided within PDS.
Thumbnails - 100xwhatever pixels, about 8K file for GIF
Browse - 640x480, full screen size
"Toenail" - 320x240
These are the standard set of ordering aids/metadata.
- More metadata extraction will be required. This could range from simple
(extracting date and time from a filename) to complex automatic
AI-style extraction of features from images (a simple sketch follows this list).
- PDS tends to be hands off for format conversions. Relies on
maintaining additional/more extensive documentation to handle
older formats.
- Two types of data checking needed: bit level (CRC) and meaning level.
- Should metadata come from producer or be generated? Should be both.
Cannot generally require producer to produce metadata, but the
more the producer can provide the better.
- Metadata checks should be provided at different levels. System level
(default) checks can be made, but there should also be data set level
checks that handle specific restrictions for those datasets.
- Any validation of data should be viewed as an optional service. The
baseline should simply be: we can provide you what we got. We
may not have the knowledge or resources to do any more in some cases.
- Metadata indicating the level to which the archive has verified information
should be maintained. Different levels should allow a distinction between
"haven't been able to check yet" and "don't know how to check."
Day 1 Wrap Up
We still need to determine the set of boxes we will use for the Reference Model.
We also need a list of functions overall and in each of the boxes.
The final Reference Model will contain a small number of scenarios, but more are
needed to prove the Reference Model to ourselves.
Policy is located in the Archival Management box which we will, for next
version, call "Administration."
A determination of what functions are included in the overall Reference
Model and in each specific box is needed. Overnight all participants
should consider this issue. The following individuals will prepare their
ideas on specific areas to lead the discussions tomorrow:
- Mike Martin - Storage and Ingest Boxes
- Don Sawyer - Metadata Management Box
- Paul Grunberger - Access and Dissemination Boxes
- Randy Davis - Administration Box
The individuals who had been assigned to identify functions in
each of the entities made presentations. These are listed below
with comments offered shown in parentheses:
Management Functions
Randy Davis presented the following functions:
- Set charges for data
- Set data media migration/destruction policy
- Establish rules for data production/dissemination
- Plan for new capacity, new technology
- Manage Hardware, Software, Network Configuration
- Acquire, insert, validate new equipment and media
- Establish standards for data
- Design internal schema
- Tune Hardware, Software, Network Performance
- Schedule Maintenance
- Monitor media performance/wear (oversight)
(Would this be part of storage?)
(Storage would decide what charges and services would be
although policy would still be a management function)
- Audit security
- Billing and collection
- Coordinate entry of new data (submission sessions, peer review)
- Authorize new users
- Set up recurring requests for data
- Collect and analyze statistics on usage, performance, availability
- Prioritize/schedule data production and delivery across all
submission sessions and request sessions
- Monitor Hardware, Software and Network Performance
- Estimate costs of data deliveries and bill
- Authenticate users and authorize request sessions
He had not analyzed his list to identify:
- linkages to other entities
- which exposed external interfaces are candidates for
discussion and standardization
The following comments were made:
- Lou would add some functions like "estimation process" which may belong
to other entities
- All administrative data would be included under management as agreed on
Tuesday
- Day-to-day cost estimating not seen as a management function
- At Elise's suggestion, "Policy for user support" was added
- Something needs to be included re "orchestrating disaster
recovery"
- Need a "Back-up Policy" relative to what data is to be backed-up
- What about "Acquisition Planning?" Should we include handling requests
for data that is elsewhere?
- What about special requests?
- The question was asked as to just what was the definition of
management: overall or day to day. It is day to day.
Storage Functions
Mike listed the following functions:
- Store data objects and volumes
- Retrieve objects, subsets, aggregates
- Maintain audit trail
- Validate data objects (should this be under Ingest?)
- It was suggested to add "demands for data integrity" (submitters must perform
in accordance with archive requirements)
- Back-up
- Migrate data objects
- Format data objects (should this be under Ingest or Dissemination?)
- Extract metadata (should this be under Ingest?)
- Handle duplicate stores
- Cache maintenance (a catch-all, for different levels of data and migrations)
- Messaging for migration
The following comments were made:
- Report available storage
- Look at NARA publication 34CFR
- Inventory media - climate control
- It was noted that Storage may have to do certain checks:
- is storage correct
- not degrading to the media
- If we look at these as automated components, what do Hierarchical
Databases versus DBMS do for you?
- Our efforts are not intended to reflect local storage activity,
but more towards large data archives
- Address migration inside and outside; this is bit-to-bit
migration across media as opposed to data format transfer
Ingest Functions
Don Sawyer suggested the following:
This entity provides the services and interfaces to acquire and prepare
information objects for the archive. The accession process involves both
describing and cataloging the information objects and securing them for
storage and access. This may include staging of information in preparation
for full acceptance, confirmation of receipt, and validation or creation of
required metadata. Ingest functions include:
- Support producers by providing mechanisms to identify a Submission Session
- Accept Data Deliveries within a Submission Session
- Extract Information Objects from a Data Delivery
- Check Data Deliveries and Information Objects for conformance to minimum
archive ingest requirements and initiate results-recording transactions
- Acknowledge receipt and verification of Data Delivery to providers, and
provide sufficient detail to allow the provider to be reasonably certain that
the Data Delivery was received as provided. Initiate recording
transactions of acknowledgement status (a sketch appears after the discussion below).
- Determine profile of information representations used in each Information
Object, including the relationships among these representations. Initiate
recording transactions of these profiles.
- Determine mapping of Information Objects to Storage Elementary Objects.
- Determine any special requirements for access and dissemination of
information carried by the Information Objects, such as the need for
Virtual Storage Elementary Objects.
- Extract, or otherwise obtain, the metadata needed to populate the archive's
internal data model implementation in support of generic and special
archive management, access, and dissemination needs.
- Perform any needed representation transformations on the information
objects to obtain Storage Elementary Objects.
- Initiate recording transactions to populate Data Management information
storage and Storage information storage.
- Initiate testing of Consumer access and dissemination services to ensure
the information from the Information Objects is archived and is
retrievable.
Some responses were as follows:
- Lou felt we should add some "messaging back" in the event that
data ingest is inadequate for acceptance
- 'Determine mapping' may be an example of remapping; expand to say
"Reformatting as necessary"; need ingest plan
- Lou has problems with terminology: There are some elementary
objects which are given to storage and are wanted back from
storage
- As to granularity, this is seen as the lowest thing (granule) you
will accept and store
- Determine any special requirements
- Have to get terms that say what we mean. For example: A
particular elementary object might go in for storage, but
one wants to receive only a portion of that. We need some kind of
processing algorithm to do this. Lou felt this is part of the
Schema
- Is this an ingest function?
- Lou felt that Ingest is to provide the metadata to make objects
visible
- To Lou, "Populate metadata" is the same as ingest
- Data Management creates the plan, e.g., a table design, with
management (Admin) oversight and Ingest populates them
- Mike felt all preliminary functions are administrative
- An "Access View" has to be present at Ingest or one will never
get the data out.
- Create ingest plan with an access view part of administration
- Add "ensure that incoming data" is similar, so that we create
another data set: the old view and the new view
- For *Extract, or otherwise obtain*, take off everything after
*data model.*
- Perform any needed representation transformations; does this
become part of the acquisition plan?
- Ingest does the transformation but the planning goes elsewhere.
May be part of administration as part of Ingest Planning. Better
to have producers have a similar view to facilitate
transformation and not pass the format problem to the producers
- EOS and EOSDIS do tell the producers how to produce the data.
- One of the reasons for the reference model is to push standards
into the user community as to what really should be done
- Initiate testing; does Ingest do this or does it audit the test
- It might be reasonable to acknowledge receipt and verification
- Consumers must clearly communicate with Data Administration
- In environment view the only path was through Ingest; need to
add path from both users to Data Administration
- May need to develop a product list; Ingest needs to know Data
Administration schedule of when and what will come from Data
Management. Will put in Data Administration
- Implement the security process. Get data from producers and
ensure a link from data to security policy and requirements and
access control
- Setting up an Ingest session involves Data Management
- Need flow chart to show what goes on during ingest session
- A Data Access session is everything from searching for information to
actually acquiring it. Divide into two parts: The first half is what
I want and the second is what goes out
- Management relates to the Ingest session and Administration is
the planning aspect
- Lou still questions what the submission session is; is it an
ongoing set of deliveries or a single delivery communication
session or media delivery set?
- One gets data from different providers - all as part of a
session. Lou feels each Ingest handles the instance as a
separate set and makes them visible
- Sessions have different meanings
- Don would like to see archives give the producer a number and
submissions are part of that session number
- Bruce asked if the accession is the ingest process: getting
something from an agency, making duplicates and returning the
original to the agency? Yes, in essence.
- Make Ingest the function that makes a duplicate for
disaster back-up and also provides transfer of legal title
together with any access restrictions.
- It can get sole rights for distribution. Generally this is not a
problem because non-archives are not usually set up to do the
distribution. This function is a *loss of Full Time Employees (FTEs)
to them*
- When you find errors, do you have to re-Ingest? Yes, and this becomes
part of the Ingest process.
- If you find errors, this information is made available to producers or a
record is made if the user no longer exists
- Problems have to be provided to the requesters
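A minimal sketch of the receipt acknowledgement mentioned above, combined with Don's suggestion of giving the producer a session number for submissions, is shown below; the class name and record fields are hypothetical, not part of the model:

    import hashlib
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class SubmissionSession:
        # The producer is given a session number; each delivery within the
        # session is acknowledged with enough detail (size, checksum) for the
        # producer to be reasonably certain it arrived as provided.
        session_id: str
        producer: str
        receipts: list = field(default_factory=list)

        def accept_delivery(self, name: str, payload: bytes) -> dict:
            receipt = {
                "session_id": self.session_id,
                "object": name,
                "size": len(payload),
                "sha256": hashlib.sha256(payload).hexdigest(),
                "received": datetime.now(timezone.utc).isoformat(),
            }
            self.receipts.append(receipt)  # recording transaction for the acknowledgement
            return receipt                 # returned to the producer as the acknowledgement

    session = SubmissionSession(session_id="1996-03-0007", producer="example project")
    print(session.accept_delivery("orbit_042.tab", b"...data delivery bytes..."))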
Data Management Functions
(formerly Metadata Management)
Don Sawyer suggested the following:
Comments on Data Management were as follows:
- Multiple pieces of representations; what are the dependencies?
Need examples. Information might be in FITS format. FITS may be
broken down into syntactic and semantic representations
- Don feels that stored objects can be both physical and digital
- Multiple syntactic representations can support same semantics
- File naming conventions may depend on type of media
- Needs more concrete examples
- Make examples of dependencies among the representation used in each
Storage Elementary Object (SEO)
- Example: If I wanted to get all the pictures from Voyager that contain
Neptune - who knows this? Ingest could set up this relationship
(see the schema sketch after these comments).
- John is proposing to extend the schema of Data Management to provide these
relationships with Ingest and Data Management adding pointers to
the data.
- Ingest has to add description records for new products, add to schema
- When adding new collections, one creates a brand new product. Example
is Galileo observations with all its granules
- Secondarily, there are other pictures of Mars (another inventory). Don't
do twice but add links among data sets
- UPDATE function was not that obvious, but needs to be there. Does this
involve a role for the data model?
- One is adding to an existing collection, or making a new collection
- Data Management has this responsibility?
- Somewhere someone has to design the schema which Lou has put under Data
Administration although Data Management may actually do this
- Is it possible that we have to have a Data Engineering function? That
is in Data Management and this is where the Data Administrator would go
to create the structure
- Randy stated that Z39.50 should not be part of the Reference Model but we
should be able to map it into the Model.
- Need to have discussion about terminology since Huc's work is being
used in a different context.
- Z39.50 is only looking at one aspect, Data Access, and Don is not fully
comfortable with all its terms
- Paul Grunberger would like to see a list of agreed-to terms. We need at
least a list of generic terms
- Mike is concerned about how things come into Administration. The system
architecture has grown over 15 years. To make some sense of it, do we need to
keep these older concepts, or will all of this break down at some future time?
- Directories need to be updated, new inventory needs to be added to the
system. He would like to see how we take data model and map these concerns
into it
- Randy feels we need to map these terms into a more general model
- We need to discuss unstructured objects for which we use
description records
- We need to do more on this topic. This will take months
- Mike has vision of the data system in which everything is packaged
nicely; data goes in, is stored, and goes out all with some consistency
- It was stated that there are rules that exist which can be put in place.
Mike wants to make sure that, if we design our system carefully, this will be the case
- This is seen as next step, the first one is the Reference Model
- The Reference Model is not binding; it does not say how you have to do
things. It is just a map of what needs to be done
- There is no high level view of standards even within the major systems of
NASA.
- Add: in interacting with Administration, do schema updates and provide
performance reports
- Data schema development is now seen under Data Management
- Where should "Status of request" go?
- Define access control and security under Administration.
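The Voyager/Neptune example above can be made concrete with a small, hypothetical schema in which Data Management holds the tables and Ingest populates them, adding the pointers that relate granules to targets (table and column names are assumptions for illustration only):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id TEXT PRIMARY KEY, mission TEXT, description TEXT);
        CREATE TABLE granule (id TEXT PRIMARY KEY, product_id TEXT REFERENCES product(id));
        CREATE TABLE granule_target (granule_id TEXT REFERENCES granule(id), target TEXT);
    """)
    # Ingest adds description records for new products and the links to targets.
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        ("VG-IMG", "Voyager", "Voyager imaging"),
        ("GLL-IMG", "Galileo", "Galileo imaging"),
    ])
    conn.executemany("INSERT INTO granule VALUES (?, ?)", [
        ("vg2_34567", "VG-IMG"), ("vg2_34568", "VG-IMG"), ("gll_00001", "GLL-IMG"),
    ])
    conn.executemany("INSERT INTO granule_target VALUES (?, ?)", [
        ("vg2_34567", "NEPTUNE"), ("vg2_34568", "TRITON"), ("gll_00001", "JUPITER"),
    ])

    # "All the pictures from Voyager that contain Neptune."
    rows = conn.execute("""
        SELECT g.id FROM granule g
        JOIN product p ON p.id = g.product_id
        JOIN granule_target t ON t.granule_id = g.id
        WHERE p.mission = 'Voyager' AND t.target = 'NEPTUNE'
    """).fetchall()
    print(rows)  # [('vg2_34567',)]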
Access Functions
Paul Grunberger suggested the following functions:
- Maintain lists of provided services, prices, products, packaging
options, etc.
- Maintain authorizations for access and dissemination
- Receive and record updates to 1) and 2) from Archive Management
- Advertise information on Access and Dissemination services
- Maintain service interface(s) with Customer (Service Desk, Server, Web
Page, Electronic Bulletin Board, etc.)
- Query Data Management to provide search and browse information to
Consumer
- Obtain archived Data from Dissemination to support Customer Requests
- Derive Metadata, such as Thumbnails and abstracts, to support Customer
Requests
- Provide copies of derived metadata to Data Management
- Obtain order status from Dissemination to support customer requests
- Provide Data Delivery within Request Session
- Maintain Access transaction records
- Maintain records of deliveries to Customers
- Collect Access and Delivery statistics; provide them to Archive
Management
- Poll Customers to obtain feedback on customer needs. Provide results
and recommendations to Archive Management
Comments made on this list included:
- Items 1 (Maintain lists) and 2 (Maintain authorizations) should be
"Request lists" from Data Management and provided on demand.
- Delete item 3
- Item 5 (Maintain service interface) changes from customer to
consumer
- Item 7 (Obtain archive data) When consumer is browsing,
information is obtained from Data Management. "Is this all we can
tell about a particular thing?" Does Access take initiative to
do searching elsewhere? Lou felt we should have a way to
transfer a request to a help center of Administration
- A Help desk or librarian was seen as needing to be incorporated
- There are questions as to whether pre-canned functions for subsetting
would be made available or would tools be made available?
- A standard case with a subset order is ALL. If you can wait on
line, the path is Access through Dissemination to Storage and back to
Access
- What Lou is saying is a form of economics. We build an archive
that assumes certain customer capability. We can serve a quick
look request and he can push a button and get it all. If he
wants a particular run he must pay for this.
- Paul is talking about putting different sets of data through
some type of filter. This is clearly an optional service for an
archive.
- Access is the place to define what gets done
- Example: Randy sees this as a waitress who takes your order
and gives it to a food producer. She gets the food from the
producer and determines that the order is filled correctly. Another
situation is where you go buy a refrigerator, but it takes two brawny
people to deliver it. Here the dissemination function determines the
correctness of the order, not the order taker. A "later delivery"
is done by the Dissemination port
- Dissemination has to get data out of an archive, but after any
reformatting, it can do the shipping directly or give it to
Access
- Randy feels we should resonate with the most people and he feels
the three-fold approach of "production, access, and dissemination" is it
- He would replace the whole notion of metadata with filters which
take the requests and match them with data in very different ways
- We have been assuming that the data comes with metadata which
gets stripped off and later re-associated
- Access provides you with different views of the data
- Does Access do the filtering? Setting up filters seems more like
a function of Dissemination or Production.
- If an hour is needed to serve a user request, we don't want to tie
up the Access function.
- Some see it differently. Building the filter is in Access; the filter is
sent somewhere and called upon by Dissemination (see the sketch after these comments).
- "How long is the response time" may affect the operation
- Paul agreed that we need to provide the bi-directional flow of
information if any reasonable response time is involved. This is
Access
- The concept of Access is that it does routine things. We still have
to go to Dissemination and notify them of this work
- All this should be a separate bubble
- Some folks are seen as proposing another box; what should it be
called? It could be called Production and shown connected with
Access and Dissemination
- Re filters, Paul does not see these as the same as Ingest which
tries to decide how many ways a consumer may want to see the
data and then picks a generic set of what they feel is most useful.
- Don does not favor a production bubble that serves both Ingest and
Dissemination - feels these are different processing functions and
is also concerned about too many bubbles.
- Randy leans towards Production as a separate bubble
- What about user training? Where should it be? Administration
- Tracking users should be done by Access.
- Re initiating service, where is the initial contact of the user? One
could call Data Administration via a Web page. Or one could come
through Access as a guest. But to register as a new user, he
has to come through Data Administration
- Step 8 (Derive metadata), Access may take some initiative to
help customer
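One reading of the filter discussion, with the filter built in Access and then applied by Dissemination so that a long-running subset job does not tie up Access, might look like this sketch (the record layout and function names are hypothetical):

    from typing import Callable, Iterable

    Record = dict
    Filter = Callable[[Record], bool]

    def build_filter(min_lat: float, max_lat: float) -> Filter:
        # Access builds the filter from the consumer's request parameters.
        return lambda rec: min_lat <= rec["lat"] <= max_lat

    def disseminate(records: Iterable[Record], keep: Filter) -> list:
        # Dissemination applies the filter while pulling data from Storage.
        return [rec for rec in records if keep(rec)]

    stored = [{"id": 1, "lat": 10.0}, {"id": 2, "lat": 55.0}, {"id": 3, "lat": 42.0}]
    subset = build_filter(min_lat=40.0, max_lat=60.0)  # built in Access
    print(disseminate(stored, subset))                 # applied by Dissemination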
Dissemination Functions
Paul suggested the following functions:
- Retrieve Requested Data from Storage
- Translate Information Objects to required representation for delivery
- Append the metadata needed to complete the Data Object for delivery
- Maintain a delivery interface with Access
- Maintain a delivery interface with Consumer (Shipping Service, Internet,
Modem, etc.)
- Provide Data Delivery to Access within a Request Session
- For physical delivery, record data on the Delivery Medium
- Collect Dissemination statistics and provide them to Archive Management
- Maintain records of standing requests (subscriptions and back-orders)
- Provide order status to Access on request
- Provide Data Delivery to Consumers outside of a Request Session (This
requires a re-definition of Request Session)
- Provide notice of all Data Deliveries to Access
Additional comments included:
- Having a separate Production bubble will identify a conflict between
subsetting and Ingest which is creating new product combinations.
Both may use same resources. This is seen as very implementation
dependent
- Lou and Randy are to itemize the functions of a Production bubble.
- There are two views, one with Ingest calling it and one which
doesn't
- Ingest interfaces with users. Dissemination works with
Production, Storage, etc.
- Don feels the production stuff is all contained within
Dissemination
- There will be six bubbles if Production is added
Wrap Up
- Lou and Don are to turn out a new version of the model within
several weeks. It is to be reviewed in a week
- We need to generate some more scenarios
- By April 14, NASA will have to send its submissions to the
internationals for the May International meeting
- The next National meeting will be in July
- If NASA can get comments back by April 7, they will try to get the
comments turned around and issue another version by April 14.
- Scenarios will not be part of this deadline, but they will be part of the
July meeting
- Scenarios to go against the next version of the Reference Model
- Section 5 of the Reference Model paper is undefined but that
section is to be condensed
- We must define what constitutes a deliverable object
- The new version is to include improved vocabulary and cleaned up
section 3
- Don is concerned about Section 4 and the verbiage relative to
the functional areas; this needs updating
- Section 5 will be written (without a lot of detail) within a
week. It will contain access hooks to the balance of the book
- Section 6 is the context diagrams
- Sections 7, 8 and 9 will become Annexes inasmuch as they contain
material that really does not fit into the model
- We need to make a list of new issues
Preliminary Classification of Metadata Proposal
Before the meeting was officially adjourned, Don reviewed this paper [Item
6] together with the comments he had sent to Claude and Claude's responses
to Don's comments [Item 7]. Don mainly covered the paper's highlights and
the interaction of his and Claude's comments. Further comments included:
- Some clean-up of the vocabulary is necessary.
- There are two definitions for SEO: Submission Elementary Object and
Storage Elementary Object
- Lou felt a collection of collections will be hard to query. One may want
to distinguish between collections of SEOs and collections of collections
- Physical objects can get both physical and logical metadata
- There can be a logical view of something smaller than a data granule. We
must distinguish among access-granule, storage-granule and
dissemination/production granule
- When one does a subset, does one modify the metadata to accurately reflect
the situation? Creating the new metadata could be extremely difficult. This
depends on how the metadata is organized. If it is separable in the same
manner as the data itself, there should be no problem.
- Don concluded that he felt our discussions today are consistent with this
paper
The Workshop was adjourned at 1645 hours, Wednesday, March 20, 1996.
URL: http://ssdoo.gsfc.nasa.gov/nost/isoas/us03/minutes.html
A service of NOST at NSSDC.
Comments and suggestions are always welcome.
Day 1 Editor: John Garrett
(John.Garrett@gsfc.nasa.gov)
+1.301.286.3575
Day 2 Editor: Robert Stephens
(stephens@us.net) +1.301.949.0965
Curator: John Garrett
(John.Garrett@gsfc.nasa.gov)
+1.301.286.3575
Responsible Official: Code 633.2 / Don Sawyer
(sawyer@ncf.gsfc.nasa.gov)
+1.301.286.2748
Last Revised: June 26, 1996, Don Sawyer (August 31, 2000, John Garrett)