Lecture: 3 hours
Discussion: 1 hour
Prerequisite: Course 165A
Grading: Letter; project (50%), presentation (30%), homework (20%)
Catalog Description:
Scientific data integration, metadata, knowledge representation, ontologies, scientific workflow design and management. Offered in alternate years.
Expanded Course Description:
Textbook:
A selection of technical papers addressing specific topics will be used. No textbook is required.
Projects:
There are two kinds of projects: implementation projects and research projects. In implementation projects, students work with Java-based open-source systems such as the Kepler workflow system (www.kepler-project.org) and design and implement example workflows, e.g., a bioinformatics workflow that connects several "bio web services". Although students build on existing software systems, they will typically also implement project-specific extensions to that software.
For research projects, students will read one or more research papers on a topic chosen from a list (e.g., scientific data integration, ontologies and knowledge representation in scientific data management, scientific workflows). Students will then apply the results of these papers to a specific problem (e.g., applying a given query rewriting algorithm to a given integration scenario and set of queries). In general, the deliverable of a research project is a technical report that summarizes and compares the results of the studied papers and describes their application to the given problem. Depending on the topic, the presented algorithms may have to be implemented and applied to the given problem instance.
Computer Usage:
For the implementation projects (IPs), students will primarily use and extend the Java-based Kepler workflow system, which is available for Linux, Windows, and Mac OS. Homework does not require computer use.
Goals:
The course introduces the challenges and techniques of data modeling, data integration, knowledge representation, and scientific workflows, with a focus on scientific applications. Advanced topics include ontologies as formal metadata, reasoning with ontologies in description logics, semantic query rewriting and optimization using ontologies, and models of computation and provenance for scientific workflows.
Instructors: B. Ludaescher, M. Gertz
Prepared by: B. Ludaescher (September 2007)
Overlap Statement:
There is no significant overlap with any other course.