NSF Award for Kepler/CORE to Accelerate Scientific Workflow Development

The National Science Foundation's Office of Cyberinfrastructure has awarded $1.7M over three years to a team of researchers from UC Davis, UC Santa Barbara, and UC San Diego to develop Kepler/CORE, a Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure.

In recent years, scientific workflow research and development has gained enormous momentum, driven by the needs of many scientific communities to manage and analyze their ever-increasing amounts of data more effectively.

Whether scientists are piecing together our ancestors' tale through Assembling the Tree of Life (AToL, pPod, CIPRes); deciphering the workings of our biological machinery by chasing and identifying transcription factors (ChIP2); studying the effect of invasive species on biodiversity (SEEK); observing and modeling the atmosphere and oceans to simulate and understand climate change effects on the environment (COMET, REAP); trying to understand and tame nuclear fusion through plasma edge simulations (CPES); or probing the nature and origins of the universe through observation of gravitational lensing or simulations of supernova explosions (Kepler-Astro), in all these and many other domains, science is increasingly data-driven, often requiring considerable computational resources to handle the challenging data analysis tasks. These and many other projects have employed the Kepler scientific workflow system to address their scientific workflow needs.

"Scientific workflows are the scientists' way to get more eScience done by effectively harnessing cyberinfrastructure such as data grids and compute clusters from their desktops," says Bertram Ludaescher, Associate Professor at the Dept. of Computer Science and the Genome Center at UC Davis, and principal investigator of Kepler/CORE.

Scientific workflows start where script-based data management solutions leave off. Like scripts, workflows can automate otherwise tedious and error-prone data management and application integration tasks. However, unlike custom scripts, scientific workflows can be more easily shared, reused, and adapted to new domains. Another advantage over scripts is built-in support for tracking data lineage, or "provenance," which allows scientists to better interpret their analysis results, re-run workflows with varying parameter settings and data bindings, or simply debug or confirm "strange" results.

"When we started Kepler a few years back as a grass-roots collaboration between an NSF and a DOE project, we did not fully anticipate the broad interest scientific workflows would create," says co-PI Matt Jones, from the National Center for Ecological Analysis and Synthesis at UC Santa Barbara, adding, "the different groups in the Kepler community are pushing various extensions to the base system functionality, so it is now a perfect time to move Kepler from a research prototype to a reliable and easily extensible system."

Timothy McPhillips, co-PI at the UC Davis Genome Center and chief software architect for Kepler/CORE, adds, "To serve the target user communities, the system must be independently extensible by groups not directly collaborating with the team that develops and maintains the Kepler/CORE system. Facilitating extension in turn requires that the Kepler architecture be open and that the mechanisms and interfaces provided for developing extensions be well designed and clearly articulated."

Kepler/CORE development is informed and driven by its stakeholders: the projects and individuals who employ Kepler and wish to extend or otherwise improve the system for their specific needs. Including stakeholders in the steering of the overall collaboration is intended to make future Kepler extensions more comprehensive and sustainable.

"For Kepler to be seen as a viable starting point for developing workflow-oriented applications, and as middleware for developing user-oriented scientific applications, Kepler must be reliable both as a development platform and as a run-time environment for the user," says Ilkay Altintas, Kepler/CORE co-PI at the San Diego Supercomputer Center.

While Kepler/CORE is primarily a software engineering project, many interesting computer science research problems are emerging from the application of scientific workflows: "As a computer scientist it is fascinating to see how real-world scientific workflow problems--workflow design, analysis, and optimization for example--lend themselves to exciting research problems in computer science, spanning the areas of databases, distributed and parallel computing, and programming languages," says Ludaescher.

Shawn Bowers, co-PI and project scientist at the UC Davis Genome Center, adds that "there is now a whole new research area at the intersection of scientific workflows on one hand and metadata and provenance management on the other. Science has always cared about reproducibility of results -- by 'getting provenance right' in scientific workflows we can dramatically improve the usability of workflow analyses and results."

For more information, see Kepler-Project and Kepler/CORE.