Abstract
Clustering is widely used in statistics, machine learning, pattern recognition, data analysis, and related fields, and it is attracting growing attention. This paper examines five issues common to all clustering algorithms: assessment of clustering results, estimation of the number of clusters, data preparation, measurement of proximity between samples, and outlier handling. Representative solutions to each issue are analyzed, and future research directions for clustering algorithms in these five areas are summarized and forecast.
Source
Journal of Circuits and Systems (《电路与系统学报》)
CSCD
2004, Issue 3, pp. 92-99 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60002003)
Keywords
clustering
assessment of results
estimation of total number of clusters
data preparation
proximity measure
outlier handling