System Designs for Visualizing Large-Scale Scientific Data
Organizer:
Kwan-Liu Ma, ICASE
Presenters:
Michael B. Cox, MRJ at NASA Ames Research Center
Christopher R. Johnson, University of Utah
Kwan-Liu Ma, ICASE
William J. Schroeder, Kitware Visualization Solutions
Leading-edge scientific and engineering computations and experiments
can generate data of unprecedented size and complexity, presenting
new challenges to the scientists and engineers who must analyze and
visualize the data. Various efforts in academia, industry, the
national laboratories, and government aim to meet the pressing need
for new methods of handling massive datasets. A consensus has emerged
that an integrated, system-level approach is required. In this course,
we highlight research efforts in visualization software system design
that address the large-data problem. The following four topics will
be covered:
- Large data management for interactive visualization design
- Adapting data-flow systems to large datasets
- Parallel visualization systems
- Visual computing and interactive steering
The first lecture lays the groundwork for the rest of the course.
Michael Cox gives an overview of the problems of extremely large data
sets in scientific visualization, and a review of current solutions
and research directions with an emphasis on data management for
interactive visualization design. Michael begins with the distinction
between "big data collections" and "big data objects". Big data
collections are extremely large collections of scientific data.
Any single data set in such a collection may be small, perhaps 100
megabytes, but in aggregate the collection may comprise terabytes
or petabytes. Big data objects are just that -- extremely large
individual data objects such as vector or scalar fields output
from computational physics simulations. His lecture is concerned
with approaches for visualization of "big data objects."
Michael continues with a discussion of application characteristics
that determine which approaches are likely to be productive, under
what conditions, and which are not. These characteristics include:
- Query vs. browse.
- Direct rendering vs. algorithmic data traversal.
- Static vs. dynamic data.
- Data dimensionality and organization.
He then surveys current techniques for management of extremely
large data sets in the context of the application characteristics
above, including:
- Memory hierarchy and system solutions.
- Indexing.
- Compression.
- Multiresolution.
- Data mining and feature extraction.
For each of these approaches, examples and previous successes are
discussed, along with some of their limitations. Where techniques are
unavailable, or not yet proven, opportunities and promising research
directions are offered.
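To make the multiresolution idea concrete, here is a minimal sketch in
Python: it precomputes a downsampled pyramid of a scalar field and
browses at the finest level that fits an interactive memory budget.
All names and sizes below are hypothetical, chosen only for
illustration; they are not taken from Michael's materials.

    import numpy as np

    def build_pyramid(field, levels):
        """Precompute a multiresolution pyramid by repeated 2x downsampling;
        each coarser level averages 2x2x2 blocks of the level below it."""
        pyramid = [field]
        for _ in range(levels - 1):
            f = pyramid[-1]
            x, y, z = (d - d % 2 for d in f.shape)   # trim odd edges
            blocks = f[:x, :y, :z].reshape(x // 2, 2, y // 2, 2, z // 2, 2)
            pyramid.append(blocks.mean(axis=(1, 3, 5)))
        return pyramid

    def finest_level_within(pyramid, budget_bytes):
        """Choose the finest level whose footprint fits the interactive budget."""
        for level, f in enumerate(pyramid):
            if f.nbytes <= budget_bytes:
                return level, f
        return len(pyramid) - 1, pyramid[-1]

    # A hypothetical 256^3 scalar field, e.g. one timestep of a simulation.
    field = np.random.rand(256, 256, 256).astype(np.float32)          # 64 MB
    pyramid = build_pyramid(field, levels=4)
    level, view = finest_level_within(pyramid, budget_bytes=8 << 20)  # 8 MB
    print(f"browsing at level {level}, shape {view.shape}")

A real system would page levels from disk on demand rather than hold
the whole pyramid in memory, but the level-selection logic is the same.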
Conventional data-flow-based systems are widely used in the visualization
community. Their success is partly due to the natural fit of the
data-flow approach to the visualization process, which involves several
transformation steps that map data into sensory representations.
Unfortunately, typical implementations of these systems pass entire
datasets through the pipeline. This approach fails when the data grow
large, exhausting both physical and virtual memory.
In the second lecture, William Schroeder introduces an alternative
implementation of a data-flow visualization system: a streaming approach
that processes pieces of a dataset rather than the dataset as a whole.
William describes the issues involved in implementing such an approach,
including handling boundaries, extensions for multithreading, mapping
input to output, the effect of memory limitations, and generating
processing-order-invariant results. He also describes a successful
streaming implementation in the freely available Visualization Toolkit
(vtk) system, presents results, and performs a live demonstration.
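The streaming idea can be illustrated independently of vtk's actual
API. The sketch below (hypothetical names; not vtk code) applies a
gradient-magnitude filter to one z-slab of a volume at a time, using
one-voxel ghost layers at slab boundaries so that the output is
invariant to the number of pieces:

    import numpy as np

    def gradient_magnitude(block):
        # Per-voxel filter standing in for a real visualization filter.
        gx, gy, gz = np.gradient(block.astype(np.float32))
        return np.sqrt(gx * gx + gy * gy + gz * gz)

    def streamed(field, pieces):
        """Filter z-slabs one at a time; one-voxel ghost layers make the
        concatenated output identical to a whole-dataset pass."""
        nz = field.shape[2]
        cuts = np.linspace(0, nz, pieces + 1, dtype=int)
        out = []
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            glo, ghi = max(lo - 1, 0), min(hi + 1, nz)    # ghost extent
            piece = gradient_magnitude(field[:, :, glo:ghi])
            out.append(piece[:, :, lo - glo : hi - glo])  # strip ghosts
        return np.concatenate(out, axis=2)

    field = np.random.rand(64, 64, 64)
    print("order-invariant:",
          np.allclose(gradient_magnitude(field), streamed(field, pieces=4)))

The peak working set is one slab plus its ghost layers rather than the
whole volume, which is what lets the pipeline scale past physical memory.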
Increasingly, scientific computations with demanding memory and
processing requirements are being performed on massively parallel
supercomputers such as the Cray T3E, IBM SP2, and SGI Origin 2000;
the DOE's ASCI program is one example. To support applications that
use these MPP supercomputers, visualization tools suited to the
parallel architecture are being developed to make high-fidelity
visualization of the application data sets possible. Kwan-Liu Ma
illustrates the design issues for parallel visualization systems
used either for postprocessing of the data or for runtime monitoring
of the simulation.
Existing parallel rendering algorithms for distributed-memory
architectures scale well only to a few hundred processors; beyond
that, communication overheads tend to inhibit further performance
gains. It is also crucial that visualization calculations not compete
with the simulation calculations for parallel computing resources.
Kwan-Liu explores new algorithms and new ways of structuring a
parallel visualization system to achieve maximum performance with a
limited number of processors and limited storage.
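One well-known strategy in this space is binary-swap compositing, in
which pairs of processors repeatedly exchange and composite halves of
their partial images so that every processor remains active in every
round. The following is a minimal sequential simulation of the exchange
pattern, assuming a power-of-two processor count, an image width
divisible by that count, and premultiplied RGBA images; it is a sketch
of the idea, not production code:

    import numpy as np

    def over(front, back):
        """Porter-Duff 'over' operator for premultiplied RGBA images."""
        return front + (1.0 - front[..., 3:4]) * back

    def binary_swap(images):
        """Sequential simulation of binary-swap compositing. Rank order is
        taken as depth order (rank 0 nearest the viewer)."""
        p, width = len(images), images[0].shape[1]
        lo, hi = [0] * p, [width] * p         # each rank's current strip
        stride = 1
        while stride < p:
            snap = [img.copy() for img in images]  # stands in for messages
            for r in range(p):
                partner = r ^ stride
                mid = (lo[r] + hi[r]) // 2
                if r < partner:               # lower rank keeps lower half
                    hi[r] = mid
                else:
                    lo[r] = mid
                s = slice(lo[r], hi[r])
                f, b = min(r, partner), max(r, partner)   # depth order
                images[r][:, s] = over(snap[f][:, s], snap[b][:, s])
            stride *= 2
        # Gather: each rank now owns a fully composited width/p strip.
        final = np.zeros_like(images[0])
        for r in range(p):
            final[:, lo[r]:hi[r]] = images[r][:, lo[r]:hi[r]]
        return final

    # Check against straightforward front-to-back compositing.
    rng = np.random.default_rng(1)
    imgs = [rng.random((4, 8, 4)) * 0.5 for _ in range(4)]
    direct = imgs[0]
    for img in imgs[1:]:
        direct = over(direct, img)
    result = binary_swap([img.copy() for img in imgs])
    print("matches direct:", np.allclose(result, direct))

Because every processor both sends and receives in every round, no
processor idles, and the per-round message size halves as the strips
shrink.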
By providing immediate visual feedback from large-scale simulations,
the engineer or designer can more easily determine whether a computation
is headed toward a useful result. If not, the user has the option of
terminating the computation or adjusting simulation parameters
on the fly. By incorporating visualization directly into applications,
the need for time-consuming post-processing steps is reduced, and
unexpected behavior or anomalies in the data can be spotted more quickly
and easily. The net result is faster design cycles and higher-quality
solutions.
While Kwan-Liu's lecture covers runtime visualization, its focus is on
how visualization requirements and limited resources affect the design
of parallel rendering algorithms. In the last lecture, Chris Johnson
presents a proof of concept of an ambitious goal -- computational
steering. His lecture centers on SCIRun, a scientific programming
environment that allows the interactive construction, debugging, steering
and visualization of large-scale scientific computations. SCIRun can
be envisioned as a ``computational workbench,'' in which a scientist can
design and modify simulations interactively via a dataflow programming
model. SCIRun enables scientists to modify geometric models and
interactively change numerical parameters and boundary conditions,
as well as to modify the level of mesh adaptation needed for an
accurate numerical solution. In contrast to the typical ``off-line''
simulation mode -- in which the scientist manually sets input
parameters, computes results, visualizes the results via a separate
visualization package, and then starts again at the beginning --
SCIRun ``closes the loop'' and allows interactive steering of the
design, computation, and visualization phases of a simulation.
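SCIRun itself is far too large to excerpt here, but the essence of
``closing the loop'' can be suggested with a toy steering loop: a
solver consumes parameter updates between iterations, rather than
requiring a restart, and publishes intermediate state for
visualization. Everything below is a hypothetical sketch, not SCIRun
code:

    import queue
    import threading
    import time

    params = queue.Queue()      # steering commands from the user interface
    snapshots = queue.Queue()   # intermediate state for the visualizer

    def solver(steps=100):
        """Toy iterative solver that applies parameter changes between
        steps instead of requiring a restart -- the essence of steering."""
        omega, state = 1.0, 0.0          # hypothetical relaxation parameter
        for step in range(steps):
            try:
                omega = params.get_nowait()   # steer without restarting
            except queue.Empty:
                pass
            state += omega * (1.0 - state)    # stand-in for a real update
            snapshots.put((step, state))      # feed the visualization side
            time.sleep(0.01)

    threading.Thread(target=solver, daemon=True).start()
    params.put(0.5)                  # the user adjusts omega mid-run
    for _ in range(5):
        step, state = snapshots.get()
        print(f"step {step}: state = {state:.3f}")  # stand-in for rendering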
We propose a half-day course (3.5 hours) on the above topics, which
we believe are the most relevant given recent technology advances
and current demands from science and engineering applications.
Each presenter will have 45-55 minutes to cover enough background
material for the audience to explore each topic further through
the collection of papers included in the course notes.
A concise syllabus with an estimated timeline follows:
- 08:30 - 08:35 Opening Remarks, Ma
- 08:35 - 09:30 Large data management for interactive visualization design, Cox
- 09:30 - 10:15 Adapting data-flow systems to large datasets, Schroeder
- 10:15 - 10:30 break
- 10:30 - 11:15 Parallel visualization systems, Ma
- 11:15 - 12:00 Visual computing and interactive steering, Johnson
- 12:00 - 12:10 Open Discussion