
ECS 189F Scientific Data Management (4) I

Lecture: 3 hours

Laboratory: 1 hour

Prerequisite: Programming skills at the level of course 40; MAT 21C

Grading: Letter; projects (60%), midterm/final (40%)

Catalog Description:
Relational databases, SQL, non-standard databases, XML, scientific workflows, interoperability, data analysis tools, metadata.

Goals:
This is an interdisciplinary course in data management that aims to facilitate research and application development using open source DBMS packages and large-scale scientific data sets.

Expanded Course Description:
Topics to be covered (sequence may vary):

  1. Introduction
    1. Requirements and properties of scientific databases
    2. Issues related to database support for scientific data management
    3. Types of scientific data: structured, unstructured, temporal, spatial, image, text
  2. Introduction to Relational Databases
    1. The relational data model
    2. Structured Query Language (SQL)
    3. Open source and commercial DBMS packages (e.g., PostgreSQL and MySQL)
  3. Extensible Markup Language (XML)
    1. Role of XML in scientific data management
    2. XML data model, query and transformation languages (XPath, XSLT)
    3. Standards, tools, and systems
  4. Ontologies and Metadata
    1. The role of ontologies and metadata
    2. Metadata standards (e.g., RDF)
    3. Standards, tools, and systems
  5. Scientific Workflows
    1. Principles of scientific workflows
    2. From data preprocessing to data integration to data analysis
    3. The Kepler Scientific Workflow System
    4. Web Services
    5. Examples of scientific workflows
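As a rough illustration of how topics 2 and 3 relate (not official course material), the sketch below stores the same small set of readings both as a relational table queried with SQL and as an XML document queried with a path expression. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL or MySQL, and uses ElementTree's limited XPath subset rather than a full XPath engine; the readings, table, element, and attribute names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample readings: (station, variable, value)
READINGS = [
    ("S1", "temperature", 21.5),
    ("S1", "salinity", 34.2),
    ("S2", "temperature", 19.0),
    ("S2", "salinity", 33.8),
]


def query_relational(readings):
    """Load readings into an in-memory SQLite table and average each variable with SQL."""
    # Assumption: SQLite stands in for a server DBMS such as PostgreSQL or MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (station TEXT, variable TEXT, value REAL)")
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", readings)
    rows = conn.execute(
        "SELECT variable, AVG(value) FROM reading GROUP BY variable ORDER BY variable"
    ).fetchall()
    conn.close()
    return rows


def query_xml(readings):
    """Build an XML tree from the same readings and select temperatures by path."""
    root = ET.Element("readings")
    for station, variable, value in readings:
        # Keyword arguments to SubElement become XML attributes.
        ET.SubElement(root, "reading", station=station, variable=variable).text = str(value)
    # ElementTree supports a subset of XPath, including [@attr='value'] predicates.
    return [
        (r.get("station"), float(r.text))
        for r in root.findall("./reading[@variable='temperature']")
    ]


if __name__ == "__main__":
    print(query_relational(READINGS))
    print(query_xml(READINGS))
```

The relational version answers an aggregate question declaratively, while the XML version navigates a tree by element and attribute; the same data can serve both models.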

Textbook:
Several papers and tutorials will be made available.

Computer Usage:
Students work on projects in a Linux environment, using standard Linux/UNIX tools as well as major database software packages and associated development tools.

Programming Projects:
There will be several individual and group projects. In individual projects, students use an existing scientific database (e.g., a protein database, an image database, or a spatial database of satellite data), query it, and build simple tools on top of it. In group projects, students install a DBMS package, populate the database with scientific data, and design and implement a complete scientific workflow on top of that database.
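The group-project pattern described above (populate a database, then run a workflow from preprocessing to analysis over it) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module instead of an installed DBMS package, chains ordinary functions rather than Kepler actors, and the CSV layout, column names, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a CSV export with a missing value and stray whitespace.
RAW_CSV = """site,depth_m,chlorophyll
A, 5,1.2
A,10,0.8
B, 5,
B,10,0.6
"""


def preprocess(raw_text):
    """Parse CSV text, strip whitespace, and drop rows with missing measurements."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_text)):
        value = rec["chlorophyll"].strip()
        if not value:
            continue  # skip incomplete records
        rows.append((rec["site"].strip(), int(rec["depth_m"]), float(value)))
    return rows


def load(rows, conn):
    """Populate a relational table with the cleaned rows."""
    # Assumption: an in-memory SQLite database stands in for the course's DBMS.
    conn.execute("CREATE TABLE sample (site TEXT, depth_m INTEGER, chlorophyll REAL)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    return conn


def analyze(conn):
    """Summarize with SQL: number of samples and maximum chlorophyll per site."""
    return conn.execute(
        "SELECT site, COUNT(*), MAX(chlorophyll) FROM sample GROUP BY site ORDER BY site"
    ).fetchall()


def run_workflow(raw_text):
    """Chain the steps in order: preprocess -> load -> analyze."""
    conn = sqlite3.connect(":memory:")
    try:
        return analyze(load(preprocess(raw_text), conn))
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_workflow(RAW_CSV))
```

Keeping each step a separate function mirrors the idea of a workflow as a chain of reusable components; a workflow system such as Kepler adds scheduling, provenance, and a graphical composition layer on top of this basic idea.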

Instructors: B. Ludaescher, M. Gertz

Prepared By: B. Ludaescher, M. Gertz (January 2005)

Overlap Statement:
This course offers only a very basic introduction to relational databases, a topic that is covered in detail in ECS 165A. A much shorter introduction to XML is taught in ECS 165B.

2/05