Location:
Loews Lake Las Vegas Resort



Data Mining with Constraints

Half-day workshop co-located with KDD 2008

The Workshop Has Been Cancelled


Unfortunately, we have had to cancel the workshop due to unforeseen circumstances.

In its most general formulation, the task of data mining is vastly underspecified. To make the task more precise, we first have to specify the type of patterns considered, such as frequent patterns, a clustering, a predictive model, or other regularities in the data. However, the discovered patterns may not be novel or actionable in fields where domain expertise already exists or users have strong expectations. We therefore also have to specify what conditions the patterns must satisfy to count as solutions to the data mining task at hand. These conditions can be elegantly specified as constraints, stated explicitly and under the direct control of the user/data miner.

Over the last decade, mining with constraints has emerged as a distinct and important research area in data mining. Constraints enable more efficient data mining and focus the search on patterns likely to be of interest to the end user. The ability to express and exploit constraints allows the data miner to inject knowledge into the process of data mining and knowledge discovery.

Several sub-communities have explored the use of constraints in data mining. These include the communities concerned with clustering with constraints, finding frequent patterns under constraints, and inductive databases/queries. Clustering with constraints typically involves instance-level constraints, which specify which instances should or should not be put in the same cluster. Typical constraints in finding frequent patterns, besides frequency, include closedness and maximality. We can conceive of running a mining algorithm with several constraints as running a query on a database that stores patterns (in addition to data). Such a database is called an inductive database, and such queries are called inductive queries. Inductive databases are therefore closely related to constraint-based mining. Constraint-based mining of frequent patterns, predictive models and clusterings has been considered in this research area. Tutorials by each sub-community have been presented at leading data mining conferences (ECML/PKDD 2002, ICDM 2005 and KDD 2006).
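As a minimal illustration of frequent pattern mining under constraints, the sketch below treats the frequency, closedness and maximality constraints as successive filters over a set of itemsets, loosely in the spirit of an inductive query. The toy transactions, the function names and the brute-force enumeration are all assumptions made for illustration; they are not taken from any particular system or from the work of the communities mentioned above, and practical algorithms prune the search space instead of enumerating it.

```python
from itertools import combinations

# Hypothetical toy transaction database, for illustration only.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b"},
    {"b", "c"},
    {"c"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Frequency constraint: keep itemsets with support >= min_support.

    Brute-force enumeration over all non-empty itemsets; fine for a toy
    example, but exponential in the number of distinct items.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_patterns(patterns):
    """Closedness constraint: no proper superset has the same support."""
    return {
        p: s
        for p, s in patterns.items()
        if not any(p < q and s == patterns[q] for q in patterns)
    }


def maximal_patterns(patterns):
    """Maximality constraint: no proper superset is frequent at all."""
    return {
        p: s
        for p, s in patterns.items()
        if not any(p < q for q in patterns)
    }


if __name__ == "__main__":
    frequent = frequent_itemsets(TRANSACTIONS, min_support=2)
    for name, result in [
        ("frequent", frequent),
        ("closed", closed_patterns(frequent)),
        ("maximal", maximal_patterns(frequent)),
    ]:
        print(name, sorted("".join(sorted(p)) for p in result))
```

On this toy data with a minimum support of 2, the itemset {a} is frequent but not closed, because its superset {a, b} occurs in exactly the same transactions, and only {a, b} and {b, c} are maximal. Each added constraint narrows the answer set of the query.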

A major goal of this workshop is to bring together researchers from the above research areas, namely clustering with constraints, finding frequent patterns under constraints, and inductive databases/queries. We believe it will be important to profit from each field's expertise to further the aim of practical data mining with constraints.

Why is the topic of interest?

The topics of clustering with constraints, finding frequent patterns under constraints, and inductive databases/queries are closely related and share a number of common issues. Yet the respective communities have largely existed in parallel without much interaction. Hence, a lot can be gained from interaction among the topics and communities through the proposed workshop. Relevant recent developments include renewed interest in learning predictive models, such as equations, under constraints. Further, interest in learning probabilistic models, such as Bayesian networks, is on the rise. There has also been some interest in the use of constraints in learning paradigms at the intersection of prediction and clustering, such as predictive clustering (e.g., the use of instance-level constraints for learning predictive clustering trees), semi-supervised learning, and learning mixture models. Recently, the use of complex inductive queries involving both constraint-based mining of frequent patterns and the learning of predictive models has been studied. Applications of this approach have emerged in several areas of practical interest, most notably predicting gene function and drug design.

We believe a significant body of ongoing research relates to the above topics, which are of central interest to the workshop. A non-exhaustive list of topics of interest is given below:

  • (Types of) Constraints in data mining
    • primitive constraints
    • complex constraints
    • language constraints
    • evaluation constraints

  • (Novel) Algorithms for mining with constraints
    • clusterings
    • frequent patterns
    • predictive models
    • probabilistic models

  • Data mining/inductive query languages
  • Declarative data mining
  • Inductive querying systems
  • Applications of constraints in mining and inductive databases

Contact information of organizers

  • Ian Davidson (davidson@cs.ucdavis.edu)
    University of California - Davis, Department of Computer Science,
    1 Shields Avenue, Davis, CA, 95616, USA.
    Phone: +1 530 601 0385, fax: +1 530 752 4767

  • Saso Dzeroski (Saso.Dzeroski@ijs.si)
    Jozef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia.
    Phone: +386 1 477 3217, fax: +386 1 425 1083

Program Committee Members (confirmed)

  • Sugato Basu - Google
  • Jean-Francois Boulicaut - INSA Lyon
  • Martin Ester - SFU
  • Bart Goethals - University of Antwerp
  • David Gondek - IBM Watson
  • Rong Ge - SFU
  • Fosca Giannotti - University of Pisa
  • Celine Robardet - INSA Lyon
  • James Bailey - University of Melbourne
  • Sanjay Chawla - University of Sydney
  • Satoshi Oyama - Kyoto University
  • Jan Struyf - KU Leuven
  • Taneli Mielikainen - Nokia
  • S.S. Ravi - SUNY - Albany
  • Siegfried Nijssen - KU Leuven

Important Dates

  • Submission: May 27, 2008
  • Notification: June 26, 2008
  • Camera ready: July 12, 2008
  • Workshop day: August 24, 2008




Last modified: May 18th, 2008