CAREER: Knowledge Enhanced Clustering Using Constraints
Project Description:
This project develops general, principled methods for efficiently incorporating domain knowledge, expressed as constraints, into clustering algorithms. This not only improves clustering quality and algorithm performance but also helps uncover insights that are novel and useful relative to existing domain expertise.
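As a minimal illustration of what "domain knowledge expressed as constraints" can look like, the Python sketch below uses the common instance-level form. Must-link pairs say two points should share a cluster, and cannot-link pairs say they should be separated. It counts how many constraints a given clustering breaks. The function name and the index-pair representation are hypothetical choices for this example, not part of the project's software.

```python
from typing import List, Tuple

# Hypothetical representation: a constraint is a pair of point indices.
Pair = Tuple[int, int]


def count_violations(labels: List[int],
                     must_link: List[Pair],
                     cannot_link: List[Pair]) -> int:
    """Count the must-link and cannot-link constraints a clustering breaks."""
    violated = 0
    for a, b in must_link:
        # A must-link pair should be placed in the same cluster.
        if labels[a] != labels[b]:
            violated += 1
    for a, b in cannot_link:
        # A cannot-link pair should be placed in different clusters.
        if labels[a] == labels[b]:
            violated += 1
    return violated


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    ml = [(0, 1), (1, 2)]   # (1, 2) is violated: points 1 and 2 are split
    cl = [(0, 3), (2, 3)]   # (2, 3) is violated: points 2 and 3 share cluster 1
    print(count_violations(labels, ml, cl))  # prints 2
```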
Award Number: IIS-0801528. I gratefully acknowledge support of this work from the NSF via award IIS-0801528.
Start-End Date: 01/10/07 - 12/31/11
PI: Ian Davidson
Students:
Dr. Ke Yin, SUNY (Ph.D., graduated August 2007)
Mr. Evan Lord, SUNY (M.S., graduated August 2008)
Ms. Zijie Qi, Davis (Ph.D. candidate, 3rd year)
Software:
Software to:
a) generate constraints;
b) run k-means, COP-kmeans (Wagstaff), CVQE (Davidson-Ravi) and LCVQE (Pelleg and Baras);
c) run the methods from the KDD 2009 and IJCAI 2009 papers on finding alternative clusterings and dimension reduction.
Instructions: download the tarball, unpack it, and run "make all".
Please cite: Davidson and Ravi, SIAM Data Mining 2005; Davidson and Ravi, DMKD 2007; Qi and Davidson, KDD 2009; Davidson, IJCAI 2009.
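To illustrate items a) and b) above, here is a minimal Python sketch. It generates pairwise constraints from labelled points and then runs a COP-kmeans-style loop (after Wagstaff), in which each point is greedily assigned to the nearest centre that violates none of its constraints with already-assigned points. The function names, the random pair-sampling scheme, and the choice of initial centres are assumptions made for this example. It is not the interface or the exact algorithm of the distributed software.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def generate_constraints(labels: Sequence[int], n: int, seed: int = 0):
    """Hypothetical generator: sample n random pairs of labelled points.
    Same label -> must-link, different label -> cannot-link."""
    rng = random.Random(seed)
    ml: List[Pair] = []
    cl: List[Pair] = []
    while len(ml) + len(cl) < n:
        a, b = rng.sample(range(len(labels)), 2)
        (ml if labels[a] == labels[b] else cl).append((a, b))
    return ml, cl


def _sqdist(p: Point, q: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def cop_kmeans(points: List[Point], k: int, must_link: List[Pair],
               cannot_link: List[Pair], max_iter: int = 100,
               seed: int = 0) -> Optional[List[int]]:
    """COP-kmeans-style clustering; returns None if some point has no legal cluster."""
    rng = random.Random(seed)
    # Assumption: initial centres are k distinct points chosen at random.
    centers = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    # Adjacency lists so each point can look up its constraint partners.
    ml: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    cl: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    for a, b in must_link:
        ml[a].append(b)
        ml[b].append(a)
    for a, b in cannot_link:
        cl[a].append(b)
        cl[b].append(a)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new: List[Optional[int]] = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest centre.
            for c in sorted(range(k), key=lambda c: _sqdist(p, centers[c])):
                if any(new[j] is not None and new[j] != c for j in ml[i]):
                    continue  # would split an already-assigned must-link partner
                if any(new[j] == c for j in cl[i]):
                    continue  # would join an already-assigned cannot-link partner
                new[i] = c
                break
            if new[i] is None:
                return None  # greedy assignment got stuck for point i
        if new == labels:
            break  # assignments unchanged: converged
        labels = new  # type: ignore[assignment]
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (5.0,), (5.1,)]
    print(cop_kmeans(pts, 2, must_link=[(0, 1)], cannot_link=[(1, 2)]))
```

Points are assigned greedily in a fixed order, so this sketch can return None even when a feasible clustering exists. Returning None makes that failure explicit instead of silently violating a constraint.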
Books
Constrained Clustering: Advances in Algorithms, Applications and Theory, co-edited with Sugato Basu and Kiri Wagstaff. CRC Press, August 2008.
Journal Publications
Davidson I., Ravi S.S., "Using Instance-Level Constraints in Agglomerative Hierarchical Clustering: Theoretical and Empirical Results", Data Mining and Knowledge Discovery, vol. 18, (2009), 28 pages.
Davidson I., Ravi S.S., "The Complexity of Non-Hierarchical Clustering with Instance and Cluster Level Constraints", Data Mining and Knowledge Discovery, vol. 14, (2007), 37 pages.
Davidson I., Basu S., "Clustering with Constraints: A Survey", journal submission under revision. Technical report.
Conference Publications
Davidson I., "Knowledge Driven Dimension Reduction", IJCAI Conference (Proceedings), (2009).
Qi Z. and Davidson I., "A Principled and Flexible Framework for Finding Alternative Clusterings", ACM KDD Conference (Proceedings), (2009).
Davidson I. and Qi Z., "Finding Alternative Clusterings Using Constraints", IEEE International Conference on Data Mining (Proceedings), (2008).
Davidson I. and Ravi S.S., "Intractability and Clustering with Constraints", International Conference on Machine Learning (Proceedings), (2007).
Davidson I., Ester M. and Ravi S.S., "Efficient Incremental Clustering with Constraints", 13th ACM Knowledge Discovery and Data Mining Conference (Proceedings), (2007).
Ge R., Ester M., Jin W. and Davidson I., "Constraint-Driven Clustering", 13th ACM Knowledge Discovery and Data Mining Conference (Proceedings), (2007).
Yin K., "Informed Clustering Using the Minimum Description Length Principle", Ph.D. Dissertation, Computer Science, State University of New York at Albany, (2007).