Professor Nina Amenta, a co-PI on the new UC Davis TETRAPODS Institute of Data Science. Photo: Kevin Tong/UC Davis.

October 7, 2019

By Becky Oskin, originally posted by the College of Letters and Science.

A new institute at UC Davis will advance the fundamentals of data science and prepare students to solve data analysis and machine learning problems in diverse fields.

Focusing on research and education, the UC Davis TETRAPODS Institute of Data Science (UCD4IDS) will serve as a hub for faculty, scholars, and students with interests and expertise in data science. The institute will promote interdisciplinary collaborations among 35 faculty members from four departments: computer science, electrical and computer engineering, mathematics, and statistics.

“At UC Davis, we already have a lot of collaborations in data science. We are putting all these activities together to make a cohesive program,” said Naoki Saito, professor in the Department of Mathematics, who will direct the data science institute.

Researchers with UCD4IDS will also closely collaborate with the recently announced IMPACT Data Science Center — led by Professor of Mathematics Thomas Strohmer — which will engage with business, industry, and government agencies.

The institute is funded by a $1.5 million award from the National Science Foundation’s “Harnessing the Data Revolution: Transdisciplinary Research In Principles Of Data Science (TRIPODS)” program. As a TRIPODS Phase I institute, UCD4IDS will run for three years, with potential for extension and expansion in the future.

In addition to Saito, other principal investigators on the NSF grant include Nina Amenta, professor of computer science; Chen-Nee Chuah, professor of electrical and computer engineering; and Thomas Lee, professor of statistics.

Research at UCD4IDS will focus on three broad themes:

  1. Fundamentals of machine learning directed toward biological and medical applications.
  2. Optimization theory and algorithms for machine learning, including numerical solvers for large-scale, nontrivial learning problems.
  3. High-dimensional data analysis on graphs and networks.

Training the next generation of data scientists

By facilitating collaboration across disciplines, UCD4IDS will enhance the research, development, and application of cutting-edge data science techniques and play a key role in rethinking data science education at UC Davis, Saito said.

“There is a serious shortage of properly trained data scientists who really understand the theories and algorithms being used to analyze and interpret large and complex data,” Saito said. “We do not want data scientists to simply use the latest data science theory and tools as black boxes without deeply understanding the phenomena underlying the data they deal with.”

A “black box” typically refers to a system for which only the inputs and outputs can be seen, not the internal workings. For example, if an algorithm analyzes an image and predicts that a cell is malignant, we need to understand why the algorithm reached that conclusion, Saito said.