Computer Science

ECS 116 Databases for Non-Majors (4 units)

Format
Lecture: 3 hours
Discussion: 1 hour

Catalog Description:
Data modeling, ontologies, relational databases, SQL, querying and transforming XML, scripting and data analysis with Python, workflows, data provenance, data curation. Course not intended for CS or CSE majors.

Prerequisite: Programming skills at least at the level of course 10

Grading: Letter; projects (60%), midterm (17%), final (23%)

Credit restrictions, cross listings: Not open for credit to students who have completed courses 165A, 165B, or 166.

Summary of course contents

  1. Data Modeling
    1. conceptual modeling (ER, UML)
    2. relational modeling
    3. metadata, ontologies
  2. Structured Query Language (SQL)
    1. Simple queries
    2. Nested queries, aggregation
    3. Database programming
  3. Semistructured and Graph Data
    1. Semistructured data: DTDs, XML-Schema
    2. Introduction to XPath, XQuery, XSLT
    3. Graph databases and network data
  4. Scientific Workflows and Scripting
    1. Principles of scientific workflows and systems
    2. Workflow scripting in Python
    3. IPython notebook
  5. Data Provenance and Curation
    1. Provenance Models: OPM, W3C PROV
    2. Data curation

This course offers a hands-on introduction to databases and scientific data management for non-CS majors. Students should have prior programming experience at least at the level of ECS 10, and preferably ECS 30 or equivalent.

There will be several individual and group projects, all with a hands-on programming focus. In individual projects, students learn how to query relational and XML databases and write simple data processing and analysis scripts in Python. In group projects, students install a DBMS package, populate the database with scientific data, and design and implement a complete scientific workflow on top of that database. The systems and tools used for these projects resemble, to the extent possible, those found in industry, including the standard database query language SQL and technologies such as XML, RDF, and ontology description languages. Projects are graded on design, performance, correctness, and documentation. Examination questions are based on the foundational material discussed in lecture and on the projects.

Goals: Students will learn data management for the purpose of facilitating research and application development using open source DBMS packages and scientific datasets.

Illustrative reading
Papers and tutorials selected by the instructor.

Computer Usage:
Students work on projects in a Linux environment, using standard Linux/UNIX tools as well as major database software packages and associated development tools.

Programming Projects:
There will be several individual and group projects. In individual projects, students use an existing scientific database (such as a protein DB, an image DB, or a spatial DB with satellite data), query the database, and build simple tools on top of it. In group projects, students install a DBMS package, populate the database with scientific data, and design and implement a complete scientific workflow on top of that database.

Engineering Design Statement:
The projects involve the design, implementation, and verification of scientific database applications using a variety of public domain and commercial database systems, including Postgres, Oracle, GRASS, and Kepler. The systems and tools used for these projects resemble, to the extent possible, those found in industry, including the standard database query language SQL and technologies such as XML, RDF, and ontology description languages. Projects are graded on design, performance, correctness, and documentation. Examination questions are based on the scientific (meta)data models and database design techniques discussed in lecture and on the projects.

ABET Category Content:
Engineering Science: 2 units
Engineering Design: 2 units

Goals:
Students will:

  • learn data management for the purpose of facilitating research and application development using open source DBMS packages and large-scale scientific data sets

Student Outcomes:

  • An ability to apply knowledge of mathematics, science, and engineering
  • An ability to design and conduct experiments, as well as to analyze and interpret data
  • An ability to design a system, component, or process to meet desired needs within realistic constraints such as economic, environmental, social, political, ethical, health and safety, manufacturability, and sustainability
  • An ability to identify, formulate, and solve engineering problems
  • An ability to communicate effectively
  • A recognition of the need for, and an ability to engage in life-long learning
  • A knowledge of contemporary issues
  • An ability to use the techniques, skills, and modern engineering tools necessary for engineering practice

GE3
Science & Engineering
Scientific Literacy

Overlap: This course overlaps with ECS 165A (data modeling, SQL) and ECS 165B (XML).

Instructors: B. Ludäscher

History: 2012.10.20 (B. Ludäscher): Renumbered course (formerly ECS 166). Revised the title, short form of title, catalog description, course contents, and overlap statement. Prior version by M. Gertz (April 2005).
