Visualization of Large-Scale Data: Do we have a solution yet?
Organizer
Kwan-Liu Ma, ICASE
Chair
John Van Rosendale, Department of Energy
Panelists
Stephen Eick, Lucent Technologies
Bernd Hamann, University of California at Davis
Philip Heermann, Sandia National Laboratories
Christopher Johnson, University of Utah
Mike Krogh, Computational Engineering International Inc.
While we are seeing unprecedented growth in the amount of data
from both theoretical simulations and instruments/sensors,
our capability to manipulate, explore, and understand large
datasets is growing only slowly. Scientific visualization, which
transforms raw data into vivid 2D or 3D images, has been recognized
as an effective way to understand large-scale datasets.
However, most existing visualization methods do not scale well
with growing data sizes, nor with the other parts of a data analysis pipeline.
To accelerate the development of new data manipulation and
visualization methods for truly massive datasets, the National Science
Foundation and the Department of Energy have sponsored a series of workshops
on the relevant topics. As a result of these workshops, a new concept
known as "Data and Visualization Corridors" is emerging; it represents
the combination of innovations in data handling, representation,
telepresence, and visualization. In the next few years, we expect to
see more manpower and resources invested in solving the problem of
visualizing large-scale data, and at the same time more demanding applications.
In this panel, we will report the findings and results of the workshop held
in May in Salt Lake City, Utah.
In addition, we will try to answer the following questions,
with some help from the attendees:
- How large is large?
- Where do the large data sets come from?
- Can current graphics and visualization technology cope with
the volume and complexity of the data produced by tera-scale
calculations or high-resolution data collection devices?
- How much of the data do we need to see, and how do we find what
we need to see?
- What are ideal data representations that can enable more
efficient visualization?
- How much processing power, storage space, bandwidth, and
display resolution do we need?
- How much visualization computing should we do at runtime,
when the data are being created, versus at postprocessing time?
- Is computational steering a reality?
- Are there common visualization solutions for scientific, engineering,
medical, and business data?
- What can the visualization software industry offer now and in the near future?
The panelists come from educational institutions, a corporate research
laboratory, national laboratories, and industry. They represent a cross-section
of ideas and experience. Even so, we cannot cover all aspects of the
problems and solutions, and we hope that attendees of this panel
who have studied other relevant topics will
share their insights with us.
Stephen Eick
Lucent Technologies
The amount of data collected and stored electronically
is doubling every three years. With the widespread
deployment of database management systems, penetration of networks,
and adoption of standard data interface protocols,
the data access problems are being solved.
The newly emerging problem is how to make sense of all this
information. The essential problem is that
the data volumes are overwhelming existing analysis
tools.
Our approach to solving this problem involves
computer graphics. Exploiting newly available
PC graphics capabilities, our visualization
technology
- provides better, more effective, data presentation
- shows significantly more information on each screen
- includes visual analysis capability
Visualization approaches such as ours
have significant value for problems involving
change, high dimensionality, and scale.
In this space, the insights gained enable
decisions to be made faster and more accurately.
Bernd Hamann
University of California at Davis
We are now reaching the limits of interactive visualization
of large-scale data sets. This is to be interpreted in two
ways: First, the sheer amount of data to be analyzed is
overwhelming, and researchers do not have the time
required to "browse" and visually inspect an
extremely high-resolution data set. Second, the resolution
of current rendering and projection
devices is too "coarse" to visually capture the important,
small-scale features in which a researcher is interested.
Two active areas of research can help in this context:
multiresolution methods used to represent and visualize
large data sets at multiple levels of resolution, and
automatic methods for extracting features defined a priori
and identifying regions characterized by "unusual" behavior.
Multiresolution methods help in reducing the amount of time
it takes a researcher to "browse" the domain over which a
physical phenomenon has been measured or simulated, while
automatic feature extraction methods assist in steering the
visualization process to those regions in space where a
certain interesting or unusual behavior has been identified.
In summary, multiresolution and automatic feature
extraction methods both serve the same purpose: they reduce
the amount of time required to visually inspect a large data
set. We should investigate in more depth the synergy that
exists between these two approaches. For example, one could
envision a coupling of these two methodologies by applying
feature extraction methods to the various levels in a pre-
computed multiresolution data hierarchy, which would lead
naturally to the extraction and representation of
qualitatively relevant information at multiple scales.
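The coupling described above can be illustrated with a minimal sketch: precompute a multiresolution hierarchy by block averaging, then run a simple a-priori feature detector at every level. All function names, the 2x2-averaging scheme, and the threshold-based "feature" criterion are illustrative assumptions, not a specific system from the panel.

```python
import numpy as np

def build_hierarchy(data, levels):
    """Precompute a multiresolution pyramid by 2x2 block averaging."""
    pyramid = [data]
    for _ in range(levels - 1):
        d = pyramid[-1]
        # Average non-overlapping 2x2 blocks to halve each dimension.
        d = d.reshape(d.shape[0] // 2, 2, d.shape[1] // 2, 2).mean(axis=(1, 3))
        pyramid.append(d)
    return pyramid  # pyramid[0] is finest, pyramid[-1] is coarsest

def extract_features(level, threshold):
    """A stand-in feature detector: cells whose value exceeds a threshold."""
    return np.argwhere(level > threshold)

# Toy data set: mostly quiet, with one strong localized feature.
data = np.zeros((8, 8))
data[2:4, 5:7] = 10.0

pyramid = build_hierarchy(data, levels=3)
# Run the detector at every level of the hierarchy, so "unusual" regions
# found at coarse levels can steer inspection of the finer levels.
features_per_level = [extract_features(lv, threshold=1.0) for lv in pyramid]
```

Because averaging dilutes but does not erase a strong localized feature, the detector still flags the containing region at coarser levels, which is what lets the coarse levels steer a researcher toward the interesting parts of the fine data.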
Philip Heermann
Sandia National Laboratories
The push for 100 TeraFLOPS, by the U.S. Department of Energy's
Accelerated Strategic Computing Initiative (ASCI) Program, has driven
researchers to consider new paradigms for scientific visualization.
ASCI's goal of physics calculations running on 100 TeraFLOPS computers
by 2004 generates demands that severely challenge current and future
visualization software and hardware. To address the challenge,
researchers at Lawrence Livermore, Los Alamos, and Sandia National
Laboratories are investigating new techniques for exploring these massive
data sets.
The leap forward in compute technology has impacted all aspects of
visualizing simulation results. The data sets produced by ASCI machines
can greatly overwhelm common networks and storage systems. Data file
formats, networks, processing software, and rendering software and
hardware must be improved. A Systems Engineering approach is necessary
to achieve improved performance. The common approach of improving a
single component or algorithm can actually decrease performance of the
overall system.
Using a full system design approach, visualization requirements are
compared with technology trends to quantify what the visualization
system must deliver. Researchers are exploring data reduction and selection,
parallel data streaming, and run-time visualization techniques. The
system performance goals require each technique to consider a balanced
combination of hardware and software.
Christopher Johnson
University of Utah
Interaction with complex, multidimensional data is now recognized as a
critical analysis component in many areas, including computational fluid
dynamics, computational combustion, and computational mechanics.
The new generation of massively parallel computers will have speeds measured
in teraflops and will handle dataset sizes measured in terabytes to
petabytes. Although these machines offer enormous potential for solving
very large scale realistic modeling, simulation, and optimization problems,
their effectiveness will hinge upon the ability of human experts to interact
with their computations and extract useful information. Since humans
interact most naturally in a 3D world and since much of the data in
important computational problems has a fundamental 3D spatial component,
I believe the greatest potential for this human/machine partnership will
come through the use of 3D interactive technologies.
Within the Center for Scientific Computing and Imaging at the University of
Utah, we have developed a problem solving environment for steering
large-scale simulations with integrated interactive visualization called
SCIRun. SCIRun is a scientific programming environment that allows the
interactive construction, debugging and steering of large-scale scientific
computations. SCIRun can be envisioned as a ``computational workbench,''
in which a scientist can design and modify
simulations interactively via a dataflow programming model. SCIRun enables
scientists to modify geometric models and interactively change numerical
parameters and boundary conditions, as well as to modify the level of mesh
adaptation needed for an accurate numerical solution. As opposed to the
typical ``off-line'' simulation mode - in which the scientist manually sets
input parameters, computes results, visualizes the results via a separate
visualization package, then starts again at the beginning - SCIRun ``closes
the loop'' and allows interactive steering of the design, computation, and
visualization phases of a simulation.
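The "closed loop" dataflow idea can be sketched in a few lines. This is a toy model of dataflow steering in general, and an assumption on my part, not SCIRun's actual API: modules are nodes, and changing an upstream parameter automatically re-fires the downstream simulate-and-visualize chain.

```python
class Module:
    """A dataflow node: recomputes from upstream outputs and propagates."""
    def __init__(self, fn, *upstream):
        self.fn, self.upstream, self.downstream = fn, list(upstream), []
        for u in self.upstream:
            u.downstream.append(self)
        self.value = None

    def fire(self):
        # Pull current outputs from upstream, recompute, push downstream.
        self.value = self.fn(*(u.value for u in self.upstream))
        for d in self.downstream:
            d.fire()

class Parameter(Module):
    """A source node a scientist can change interactively."""
    def __init__(self, value):
        super().__init__(lambda: value)
        self.value = value

    def set(self, value):
        # Steering: changing a parameter re-runs the downstream chain
        # without restarting the whole pipeline from scratch.
        self.value = value
        for d in self.downstream:
            d.fire()

# A miniature simulate -> visualize pipeline.
dt = Parameter(0.1)
simulate = Module(lambda dt: [dt * i for i in range(5)], dt)
visualize = Module(lambda xs: f"max={max(xs):.1f}", simulate)
simulate.fire()
print(visualize.value)   # result for dt = 0.1
dt.set(0.5)              # steering: downstream recomputes automatically
print(visualize.value)   # result for dt = 0.5
```

The contrast with the "off-line" mode described above is that the visualize node never has to be re-invoked by hand; the network topology carries the change through design, computation, and visualization.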
Mike Krogh
Computational Engineering International Inc.
With the advent of terascale supercomputers, such as those of the DOE's
ASCI Program, which are orders of magnitude larger than what is typically
available in the commercial marketplace, a reasonable question is:
"Is commercial visualization software a viable option for large-data
visualization?" Is such software a viable option for supercomputer
users and their management? What features do they want? What do they
have to be willing to forgo? For the software provider, what hurdles
must be dealt with? Where are the overlaps between mainstream and
bleeding-edge requirements? I will address these issues, and others,
in the context of Computational Engineering International's EnSight Gold
visualization package.