Computational Modeling Systems: Computational Support for Scientific Modeling Activities
Terence R. Smith
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-2966
smithtr@cs.ucsb.edu
Divy Agrawal
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-4385
agrawal@cs.ucsb.edu
Tom Dunne
Dept of Geological Sciences
University of Washington
Seattle, WA.
206-543-7195
dunne@bigdirt.geology.washington.edu
Omer Egecioglu
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-3529
omer@cs.ucsb.edu
Amr El Abbadi
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-4239
amr@cs.ucsb.edu
Oscar Ibarra
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-4171
ibarra@cs.ucsb.edu
Jianwen Su
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-3698
su@cs.ucsb.edu
Yuan-Fang Wang
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-3866
yfwang@cs.ucsb.edu
Many areas of scientific and engineering research require comprehensive and
integrated computational support for the development, evaluation and
application of symbolic models of phenomena. Activities requiring support
range from the acquisition and manipulation of raw data to the construction
and evaluation of complex sets of mathematical equations and include:
1) creation and execution of models at a conceptual level
appropriate to scientists;
2) interoperability between external heterogeneous tools
and the ability to reuse tool-specific code;
3) accessing of geographically-distributed data, using
appropriate data abstractions and heterogeneous data
access mechanisms;
4) handling of large data objects on a demand-driven basis
and the ability to filter the data prior to its access.
A computational modeling system (CMS) provides scientific investigators with
such support. It comprises a modeling environment, which contains a knowledge
base of symbolic representations of phenomena, and a resource access
environment in the form of a "seamlessly" integrated collection of
computational modules. The modeling environment of a CMS is based on a
characterization of scientific modeling activities that is focussed on the
manner in which scientific concepts are represented, manipulated,
and evaluated in the scientific modeling process. The representation
of a concept is formalized in terms of "Representational Structures"
or "R-structures". An R-structure is a triple {D, T, I} in which
1) D is the Representational Domain or R-domain;
2) T is a set of "transformations" that may be applied in D;
3) I is a finite set of "instances" of D.
R-structures provide a significant generalization of existing constructs in
semantic and object-oriented data models. The process of scientific
modeling may be viewed as one in which
1) extensible collections of R-structures are constructed,
evaluated and applied in modeling both the phenomena
in specific application domains and the phenomena of
the modeling process itself;
2) instances of the domain elements of R-structures are created
and sequences of transformations are applied to the instances.
Mechanisms to support inheritance and distinct but equivalent representations
of the same concept have also been developed.
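The R-structure triple and the application of a sequence of transformations to an instance can be illustrated with a minimal Python sketch. This is not CML and not Amazonia's implementation: the class `RStructure`, its method names, the `FlowSeries` example, and the simplifying assumption that every transformation maps D back into D are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class RStructure:
    """Hypothetical rendering of the triple {D, T, I}."""
    name: str
    in_domain: Callable[[Any], bool]  # D, given as a membership test
    transformations: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)  # T
    instances: List[Any] = field(default_factory=list)  # I, a finite collection

    def add_instance(self, value: Any) -> Any:
        """Record a new instance after checking that it belongs to D."""
        if not self.in_domain(value):
            raise ValueError(f"value is not in the domain of {self.name}")
        self.instances.append(value)
        return value

    def apply(self, value: Any, names: List[str]) -> Any:
        """Apply named transformations in order (assumed here to stay in D)."""
        for name in names:
            value = self.transformations[name](value)
            if not self.in_domain(value):
                raise ValueError(f"{name} produced a value outside {self.name}")
        return value


def is_flow_series(v: Any) -> bool:
    # Illustrative domain: lists of non-negative floats.
    return isinstance(v, list) and all(isinstance(x, float) and x >= 0 for x in v)


flow = RStructure(
    name="FlowSeries",
    in_domain=is_flow_series,
    transformations={
        "to_litres": lambda s: [x * 1000.0 for x in s],
        "cumulative": lambda s: [sum(s[: i + 1]) for i in range(len(s))],
    },
)

series = flow.add_instance([1.0, 2.0, 0.5])  # made-up values
result = flow.apply(series, ["to_litres", "cumulative"])
print(result)
```

Representing D as a membership test keeps the sketch short. A fuller treatment would also need transformations between different R-domains, inheritance, and equivalent representations, none of which is attempted here.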
In collaboration with a team of EOS scientists from the University of Washington,
we have built a CMS, Amazonia, which supports modeling in large-scale earth
science research and, in particular, data-intensive and numerically-intensive
modeling applications. Amazonia has a model-oriented perspective, and is
currently employed to solve modeling problems relating to the flows of water,
sediment, and solutes in the Amazon drainage basin. In principle, however,
there do not appear to be restrictions on the domains of science to which such
a CMS may be tailored. For any application, sets of R-structures may be
created and manipulated using a simple high-level computational modeling
language, CML. Data abstractions needed by scientists are also handled
in Amazonia. The tool and data access systems provide transparent
interoperability between local and remote tools and services. These systems
include support for scalability and extensibility, and represent a possible
solution to the integration of legacy systems.
We have implemented R-structures using the OODBMS O2. To make the CMS
independent of O2, we have built an interface layer that specifies the
functionality Amazonia needs from the DBMS using only ODMG-mandated features.
The CMS engine has been designed to handle the dynamic definition of arbitrary
class structures and updates of objects. Transformations in Amazonia can be
written in CML or can be external code blocks, such as FORTRAN or C
executables, or package tools such as MATLAB, Mathematica, and KHORUS.
Techniques based on pseudo-terminals have been developed for the asynchronous
handling of multiple tools.
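The idea of an interface layer that isolates the engine from a particular DBMS, while still allowing classes to be defined at run time and objects to be updated, might be sketched as follows. The names `ObjectStore` and `InMemoryStore` and all method signatures are assumptions made for this illustration; they are not taken from Amazonia, from O2, or from the ODMG standard.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple


class ObjectStore(ABC):
    """Hypothetical minimal functionality the engine requires from a DBMS."""

    @abstractmethod
    def define_class(self, class_name: str, attributes: Dict[str, type]) -> None: ...

    @abstractmethod
    def create(self, class_name: str, **values: Any) -> int: ...

    @abstractmethod
    def update(self, oid: int, **values: Any) -> None: ...

    @abstractmethod
    def get(self, oid: int) -> Dict[str, Any]: ...


class InMemoryStore(ObjectStore):
    """Stand-in backend; a real one would delegate to the underlying DBMS."""

    def __init__(self) -> None:
        self._classes: Dict[str, Dict[str, type]] = {}
        self._objects: Dict[int, Tuple[str, Dict[str, Any]]] = {}
        self._next_oid = 1

    def define_class(self, class_name, attributes):
        # Classes are plain schema dictionaries, so they can be added at run time.
        self._classes[class_name] = dict(attributes)

    def _check(self, class_name, values):
        schema = self._classes[class_name]
        for key, value in values.items():
            if key not in schema:
                raise AttributeError(f"{class_name} has no attribute {key!r}")
            if not isinstance(value, schema[key]):
                raise TypeError(f"{key!r} must be of type {schema[key].__name__}")

    def create(self, class_name, **values):
        self._check(class_name, values)
        oid = self._next_oid
        self._next_oid += 1
        self._objects[oid] = (class_name, dict(values))
        return oid

    def update(self, oid, **values):
        class_name, state = self._objects[oid]
        self._check(class_name, values)
        state.update(values)

    def get(self, oid):
        return dict(self._objects[oid][1])


store: ObjectStore = InMemoryStore()
store.define_class("Basin", {"name": str, "area_km2": float})
oid = store.create("Basin", name="demo", area_km2=1.0)  # made-up values
store.update(oid, area_km2=2.0)
print(store.get(oid))
```

Because the engine in this sketch depends only on the abstract `ObjectStore`, replacing the backend would not require changes to code that defines classes or updates objects.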
A new open, layered, peer-to-peer protocol, the CMS-Resource Access Protocol
(CMS-RAP), based on the Hypertext Transfer Protocol (HTTP), has been
developed to transfer code and data to remote sites and to compile and
execute remote tools. We have extended the protocol to interface with tools
on parallel machines as computational resources. Programs can thus be
executed on parallel machines for the sake of computational efficiency, and
their outputs used subsequently in the CMS. Data can be read from remote
sites on a demand-driven basis, so that a local copy of the data need not be
stored at each client site.
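Demand-driven access to remote data can be illustrated with a small Python sketch in which chunks are fetched only when an element inside them is first requested. The class `LazyRemoteArray` and the `fetch_chunk` callable are hypothetical; `fetch_chunk` stands in for whatever remote read the protocol would perform, and no detail of CMS-RAP itself is modeled.

```python
from typing import Callable, Dict, List


class LazyRemoteArray:
    """Fetches fixed-size chunks of a remote sequence on first access."""

    def __init__(self, length: int, chunk_size: int,
                 fetch_chunk: Callable[[int], List[int]]) -> None:
        self.length = length
        self.chunk_size = chunk_size
        self._fetch = fetch_chunk          # stand-in for a remote read
        self._cache: Dict[int, List[int]] = {}
        self.fetch_count = 0               # number of simulated remote reads

    def _chunk(self, index: int) -> List[int]:
        # Only chunks that are actually touched are ever transferred.
        if index not in self._cache:
            self._cache[index] = self._fetch(index)
            self.fetch_count += 1
        return self._cache[index]

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        chunk_index, offset = divmod(i, self.chunk_size)
        return self._chunk(chunk_index)[offset]


# Simulated remote data set; in practice this would live at another site.
REMOTE = list(range(100))


def fetch_chunk(k: int) -> List[int]:
    return REMOTE[k * 10:(k + 1) * 10]


arr = LazyRemoteArray(len(REMOTE), 10, fetch_chunk)
values = [arr[3], arr[7], arr[42]]
print(values, arr.fetch_count)
```

Keeping touched chunks in a cache is a design choice of this sketch; a client that must hold no local data at all could drop the cache and fetch on every access.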
A simple GUI based on a visual representation of CML supports the easy
construction and manipulation of scientific modeling concepts in general,
and of the concept of a "model" in particular, and provides access to
the modeling environment of the CMS.