CAREER: Knowledge Enhanced Clustering Using Constraints

Project Description:

This project develops general, principled methods for efficiently incorporating domain knowledge, expressed as constraints, into clustering algorithms. Doing so not only improves clustering quality and algorithm performance but also helps uncover insights that are novel and useful with respect to existing domain expertise.
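As a rough illustration of what constraints can look like in practice, instance-level constraints are commonly written as must-link pairs (two points should share a cluster) and cannot-link pairs (two points should not). The Python sketch below counts how many such constraints a given cluster labeling violates. The function name `count_violations` and the toy data are hypothetical, chosen only for this example; this is not the project's software.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data points


def count_violations(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> int:
    """Count constraints violated by a labeling (hypothetical helper)."""
    # A must-link pair is violated when the two points are in different clusters.
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when the two points are in the same cluster.
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml + cl


if __name__ == "__main__":
    labels = [0, 0, 1, 1]            # cluster id for each of four points
    must_link = [(0, 1), (1, 2)]     # toy must-link constraints
    cannot_link = [(2, 3)]           # toy cannot-link constraint
    print(count_violations(labels, must_link, cannot_link))
```

A hard-constraint method would reject any assignment for which this count is nonzero, while a soft-constraint method would instead trade such a count off against the usual clustering objective.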

Award Number: IIS-0801528. I gratefully acknowledge the NSF's support of this work via this award.

Start-End Date: 01/10/07 - 12/31/11

PI: Ian Davidson

Students:

Dr. Ke Yin (SUNY), Ph.D., graduated August 2007

Mr. Evan Lord (SUNY), M.S., graduated August 2008

Ms. Zijie Qi (Davis), Ph.D. candidate, 3rd year

Software:

Software to:

a) Generate constraints.

b) Implement k-means, COP-kmeans (Wagstaff), CVQE (Davidson-Ravi) and LCVQE (Pelleg and Baras).

c) Implement the methods from the KDD 2009 and IJCAI 2009 papers on finding alternative clusterings and dimension reduction.

Instructions: Download the tarball, extract it, and type make all.

Please cite: Davidson and Ravi, SIAM Data Mining 2005; Davidson and Ravi, DMKD 2007; Qi and Davidson, KDD 2009; Davidson, IJCAI 2009.

TAR-BALL

Books

Constrained Clustering: Advances in Algorithms, Theory, and Applications, co-edited with Sugato Basu and Kiri Wagstaff. CRC Press, August 2008.

Journal Publications

Davidson I. and Ravi, S.S., "Using Instance-Level Constraints in Agglomerative Hierarchical Clustering: Theoretical and Empirical Results", Data Mining and Knowledge Discovery, vol. 18 (2009), pages 28. PDF

Davidson I. and Ravi, S.S., "The Complexity of Non-Hierarchical Clustering with Instance and Cluster Level Constraints", Data Mining and Knowledge Discovery, vol. 14 (2007), pages 37. PDF

Davidson I. and Basu S., "Clustering with Constraints: A Survey", Journal Under Revision. TechRep-PDF

Conference Publications

Davidson I., "Knowledge Driven Dimension Reduction", IJCAI 2009 Conference (Proceedings), (2009). PDF

Qi Zijie and Davidson I., "A Principled and Flexible Framework for Finding Alternative Clusterings", ACM KDD Conference (Proceedings), (2009). PDF

Davidson I. and Qi Zijie, "Finding Alternative Clusterings Using Constraints", IEEE International Conference on Data Mining (Proceedings), (2008). PDF

Davidson I. and Ravi, S.S., "Intractability and Clustering with Constraints", International Conference on Machine Learning 2007 (Proceedings), (2007). PDF

Davidson I., Ester M. and Ravi, S.S., "Efficient Incremental Clustering with Constraints", 13th ACM Knowledge Discovery and Data Mining Conference 2007 (Proceedings), (2007). PDF

Rong Ge, Martin Ester, Wen Jin, Ian Davidson, "Constraint-driven clustering", 13th ACM Knowledge Discovery and Data Mining Conference (Proceedings), (2007). PDF

Ke Yin, "Informed Clustering Using the Minimum Description Length Principle", Computer Science Ph.D. Dissertation, State University of New York - Albany, (2007). Thesis Published PDF