Visualization of Large-Scale Data: Do we have a solution yet?
Organizer
Kwan-Liu Ma, ICASE
Chair
John Van Rosendale, Department of Energy
Panelists
Stephen Eick, Lucent Technologies
Bernd Hamann, University of California at Davis
Philip Heermann, Sandia National Laboratories
Christopher Johnson, University of Utah
Mike Krogh, Computational Engineering International Inc.
While we are seeing unprecedented growth in the amount of data
from both theoretical simulations and instruments/sensors,
our capability to manipulate, explore, and understand large
datasets is growing only slowly. Scientific visualization, which
transforms raw data into vivid 2D or 3D images, has been recognized
as an effective way to understand large-scale datasets.
However, most existing visualization methods do not scale well
with growing data sizes, nor with the other parts of a data analysis pipeline.
To accelerate the development of new data manipulation and
visualization methods for truly massive datasets, the National Science
Foundation and the Department of Energy have sponsored a series of workshops
on the relevant topics. As a result of these workshops, a new concept
known as "Data and Visualization Corridors" is emerging; it represents
the combination of innovations in data handling, representation,
telepresence, and visualization. In the next few years, we expect to
see more manpower and resources invested in solving the problem of
visualizing large-scale data, and at the same time more demanding applications.
In this panel, we will report the findings and results of the workshop held
in May in Salt Lake City, Utah.
In addition, we will try to answer the following questions,
with some help from the attendees:
- How large is large?
- Where do the large data sets come from?
- Can current graphics and visualization technology cope with
the volume and complexity of the data produced by tera-scale
calculations or high-resolution data collection devices?
- How much of the data do we need to see, and how do we find what
we need to see?
- What are ideal data representations that can enable more
efficient visualization?
- How much processing power, storage space, bandwidth, and
display resolution do we need?
- How much visualization computing should we do at runtime,
when the data are being created, versus at postprocessing time?
- Is computational steering a reality?
- Are there common visualization solutions for scientific, engineering,
medical, and business data?
- What can the visualization software industry offer now and in the near future?
The panelists come from educational institutions, a corporate research
laboratory, national laboratories, and industry. They represent a cross-section
of ideas and experience. Even so, we cannot cover all aspects of the
problems and solutions, and we hope that attendees of this panel
who have studied other relevant topics will
share their insights with us.
Stephen Eick
Lucent Technologies
The amount of data collected and stored electronically
is doubling every three years. With the widespread
deployment of database management systems, penetration of networks,
and adoption of standard data interface protocols,
the data access problems are being solved.
The newly emerging problem is how to make sense of all this
information. The essential problem is that
the data volumes are overwhelming existing analysis
tools.
Our approach to solving this problem involves
computer graphics. Exploiting newly available
PC graphics capabilities, our visualization
technology
- provides better, more effective, data presentation
- shows significantly more information on each screen
- includes visual analysis capability
Visualization approaches such as ours
have significant value for problems involving
change, high dimensionality, and scale.
In this space, the insights gained enable
decisions to be made faster and more accurately.
Bernd Hamann
University of California at Davis
We are now reaching the limits of interactive visualization
of large-scale data sets. This is to be interpreted in two
ways: First, the sheer amount of data to be analyzed is
overwhelming, and researchers do not have the time
required to "browse" and visually inspect an
extremely high-resolution data set. Second, the resolution
of current rendering and projection
devices is too "coarse" to visually capture the important,
small-scale features in which a researcher is interested.
Two active areas of research can help in this context:
multiresolution methods used to represent and visualize
large data sets at multiple levels of resolution, and
automatic methods for extracting features defined a priori
and identifying regions characterized by "unusual" behavior.
Multiresolution methods help in reducing the amount of time
it takes a researcher to "browse" the domain over which a
physical phenomenon has been measured or simulated, while
automatic feature extraction methods assist in steering the
visualization process to those regions in space where a
certain interesting or unusual behavior has been identified.
In summary, multiresolution and automatic feature
extraction methods both serve the same purpose: they reduce
the amount of time required to visually inspect a large data
set. We should investigate in more depth the synergy that
exists between these two approaches. For example, one could
envision a coupling of these two methodologies by applying
feature extraction methods to the various levels in a pre-
computed multiresolution data hierarchy, which would lead
naturally to the extraction and representation of
qualitatively relevant information at multiple scales.
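The coupling described above can be illustrated with a minimal sketch: precompute a multiresolution hierarchy by block averaging, then run a simple a-priori feature detector at every level. All function names, the 2x2-averaging scheme, and the threshold-based "feature" criterion are illustrative assumptions, not a specific system from the panel.

```python
import numpy as np

def build_hierarchy(data, levels):
    """Precompute a multiresolution pyramid by 2x2 block averaging."""
    pyramid = [data]
    for _ in range(levels - 1):
        d = pyramid[-1]
        # Average non-overlapping 2x2 blocks to halve each dimension.
        d = d.reshape(d.shape[0] // 2, 2, d.shape[1] // 2, 2).mean(axis=(1, 3))
        pyramid.append(d)
    return pyramid  # pyramid[0] is finest, pyramid[-1] is coarsest

def extract_features(level, threshold):
    """A stand-in feature detector: cells whose value exceeds a threshold."""
    return np.argwhere(level > threshold)

# Toy data set: mostly quiet, with one strong localized feature.
data = np.zeros((8, 8))
data[2:4, 5:7] = 10.0

pyramid = build_hierarchy(data, levels=3)
# Run the detector at every level of the hierarchy, so "unusual" regions
# found at coarse levels can steer inspection of the finer levels.
features_per_level = [extract_features(lv, threshold=1.0) for lv in pyramid]
```

Because averaging dilutes but does not erase a strong localized feature, the detector still flags the containing region at coarser levels, which is what lets the coarse levels steer a researcher toward the interesting parts of the fine data.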
Philip Heermann
Sandia National Laboratories
The push for 100 TeraFLOPS, by the U.S. Department of Energy's
Accelerated Strategic Computing Initiative (ASCI) Program, has driven
researchers to consider new paradigms for scientific visualization.
ASCI's goal of physics calculations running on 100 TeraFLOPS computers
by 2004 generates demands that severely challenge current and future
visualization software and hardware. To address the challenge,
researchers at Lawrence Livermore, Los Alamos, and Sandia National
Laboratories are investigating new techniques for exploring these massive
data sets.
The leap forward in compute technology has impacted all aspects of
visualizing simulation results. The data sets produced by ASCI machines
can greatly overwhelm common networks and storage systems. Data file
formats, networks, processing software, and rendering software and
hardware must be improved. A Systems Engineering approach is necessary
to achieve improved performance. The common approach of improving a
single component or algorithm can actually decrease performance of the
overall system.
Using a full system design approach, visualization requirements are
compared with technology trends to quantify what the visualization
system must deliver. Researchers are exploring data reduction and selection,
parallel data streaming, and run-time visualization techniques. The
system performance goals require each technique to consider a balanced
combination of hardware and software.
Christopher Johnson
University of Utah
Interaction with complex, multidimensional data is now recognized as a
critical analysis component in many areas, including computational fluid
dynamics, computational combustion, and computational mechanics.
The new generation of massively parallel computers will have speeds measured
in teraflops and will handle dataset sizes measured in terabytes to
petabytes. Although these machines offer enormous potential for solving
very large scale realistic modeling, simulation, and optimization problems,
their effectiveness will hinge upon the ability of human experts to interact
with their computations and extract useful information. Since humans
interact most naturally in a 3D world and since much of the data in
important computational problems has a fundamental 3D spatial component,
I believe the greatest potential for this human/machine partnership will
come through the use of 3D interactive technologies.
Within the Center for Scientific Computing and Imaging at the University of
Utah, we have developed a problem solving environment for steering
large-scale simulations with integrated interactive visualization called
SCIRun. SCIRun is a scientific programming environment that allows the
interactive construction, debugging and steering of large-scale scientific
computations. SCIRun can be envisioned as a ``computational workbench,''
in which a scientist can design and modify
simulations interactively via a dataflow programming model. SCIRun enables
scientists to modify geometric models and interactively change numerical
parameters and boundary conditions, as well as to modify the level of mesh
adaptation needed for an accurate numerical solution. As opposed to the
typical ``off-line'' simulation mode - in which the scientist manually sets
input parameters, computes results, visualizes the results via a separate
visualization package, then starts again at the beginning - SCIRun ``closes
the loop'' and allows interactive steering of the design, computation, and
visualization phases of a simulation.
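The "closed loop" dataflow idea can be sketched in a few lines. This is a toy model of dataflow steering in general, and an assumption on my part, not SCIRun's actual API: modules are nodes, and changing an upstream parameter automatically re-fires the downstream simulate-and-visualize chain.

```python
class Module:
    """A dataflow node: recomputes from upstream outputs and propagates."""
    def __init__(self, fn, *upstream):
        self.fn, self.upstream, self.downstream = fn, list(upstream), []
        for u in self.upstream:
            u.downstream.append(self)
        self.value = None

    def fire(self):
        # Pull current outputs from upstream, recompute, push downstream.
        self.value = self.fn(*(u.value for u in self.upstream))
        for d in self.downstream:
            d.fire()

class Parameter(Module):
    """A source node a scientist can change interactively."""
    def __init__(self, value):
        super().__init__(lambda: value)
        self.value = value

    def set(self, value):
        # Steering: changing a parameter re-runs the downstream chain
        # without restarting the whole pipeline from scratch.
        self.value = value
        for d in self.downstream:
            d.fire()

# A miniature simulate -> visualize pipeline.
dt = Parameter(0.1)
simulate = Module(lambda dt: [dt * i for i in range(5)], dt)
visualize = Module(lambda xs: f"max={max(xs):.1f}", simulate)
simulate.fire()
print(visualize.value)   # result for dt = 0.1
dt.set(0.5)              # steering: downstream recomputes automatically
print(visualize.value)   # result for dt = 0.5
```

The contrast with the "off-line" mode described above is that the visualize node never has to be re-invoked by hand; the network topology carries the change through design, computation, and visualization.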
Mike Krogh
Computational Engineering International Inc.
With the advent of terascale supercomputers, such as those of the DOE's
ASCI Program, which are orders of magnitude larger than what is typically
available in the commercial marketplace, a reasonable question is:
"Is commercial visualization software a viable option for large-data
visualization?" Is such software a viable option for supercomputer
users and their management? What features do they want? What do they
have to be willing to forgo? For the software provider, what hurdles
must be dealt with? Where are the overlaps between mainstream and
bleeding-edge requirements? I will address these issues, and others,
in the context of Computational Engineering International's EnSight Gold
visualization package.