
Friday, December 3, 2004
1065 Kemper Hall
3:10-4:00 p.m.
The problem of characterizing and detecting recurrent sequence
patterns, such as substrings or motifs, and related associations or rules
is variously pursued in order to compress data, unveil structure, infer
succinct descriptions, extract and classify features, etc. In molecular
biology, such regularities have been implicated in various facets of biological
function and structure. The discovery, particularly on a massive scale,
of significant patterns and correlations thereof poses interesting methodological
and algorithmic problems, and often exposes scenarios in which tables
and descriptors grow faster and bigger than the phenomena they are meant
to encapsulate.
This talk reviews some results at the crossroads of statistics, pattern
matching and combinatorics on words that enable us to control such paradoxes,
and presents some related constructions, implementations and empirical
results.