"Expressive Speech-driven Facial Animation"

Yong (Tom) Cao
UCLA

Thursday, April 28
1065 Kemper Hall
3:10 - 4:00 pm

Refreshments/reception to follow in 1131 Kemper


Abstract:

Automatically synthesizing realistic facial animation remains a very challenging problem in Computer Graphics. Although speech-driven facial motion synthesis has become a well-explored research topic, little has been done on the following two issues: real-time lip-syncing and modeling of expressive visual speech. In our work, we propose data-driven approaches to address these issues.

Most motion graph-based lip-syncing algorithms are notorious for their exponential complexity. We present a greedy graph search algorithm that is substantially faster and allows real-time motion synthesis from a large database of motions. The time complexity of the algorithm is linear in the length of the input utterance. In our experiments, the synthesis time for an input sentence of average length is under a second.
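
As a rough illustration of why a greedy search can run in time linear in the utterance length, the sketch below picks one motion clip per phoneme by minimizing only the transition cost from the previously chosen clip. It is not the algorithm presented in the talk: the Clip type, the one-dimensional pose values, the cost function, and the toy database are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, NamedTuple


class Clip(NamedTuple):
    """Hypothetical motion clip, reduced to a single pose value at each end."""
    name: str
    start_pose: float
    end_pose: float


def greedy_lip_sync(phonemes: List[str], database: Dict[str, List[Clip]]) -> List[Clip]:
    """Choose one clip per phoneme, making a single pass over the utterance.

    Each step looks only at the candidates for the current phoneme, so the
    work grows linearly with the number of phonemes (assuming a bounded
    number of candidates per phoneme).
    """
    path: List[Clip] = []
    for phoneme in phonemes:
        candidates = database[phoneme]
        if not path:
            # No previous clip to match against: take the first candidate.
            best = candidates[0]
        else:
            prev_end = path[-1].end_pose
            # Assumed transition cost: pose discontinuity between clips.
            best = min(candidates, key=lambda c: abs(c.start_pose - prev_end))
        path.append(best)
    return path


# Toy database (invented values) mapping phonemes to candidate clips.
database = {
    "HH": [Clip("hh_a", 0.0, 0.2), Clip("hh_b", 0.5, 0.6)],
    "AY": [Clip("ay_a", 0.9, 0.4), Clip("ay_b", 0.25, 0.8)],
}
result = greedy_lip_sync(["HH", "AY"], database)
print([clip.name for clip in result])
```

Because each choice is made locally and never revisited, the result is not guaranteed to be globally optimal; that is the usual trade-off a greedy search makes in exchange for speed.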

To model expressive visual behavior during speech, we propose a machine learning approach based on Independent Component Analysis. This approach can extract a set of meaningful components from recorded facial motions. The components are classified into two sources: content (speech) and style (emotion). Using the emotion components, we derive a generative model of expressive facial motion that can control the emotional state of a speech motion. The emotional state of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.