Abstract
A clustering-quality evaluation algorithm based on the concept of gravitation is proposed to quantitatively analyze the results of clustering algorithms. The algorithm treats every data point in the data space as a particle of unit mass and evaluates the quality of a clustering result by analyzing the gravitational relations among its data points. A result in which the data points within each cluster attract one another strongly, while noise data points are only weakly attracted, is regarded as a high-quality result. Conversely, a result in which the gravitation within clusters is weak and the gravitation acting on noise data points is strong is regarded as a poor one. The validity and efficiency of the algorithm were tested on several datasets. The experimental results show that the algorithm obtains an evaluation value within a very short response time and correctly reflects whether a clustering result is good or poor. Furthermore, the proposed algorithm can guide a clustering method to find the best clustering result automatically, without manual intervention.
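The abstract describes the evaluation idea but not its exact formulas. The following is a minimal Python sketch of one possible reading, assuming unit masses, a gravitational constant of 1, squared Euclidean distance softened by a small `eps`, noise marked with the label `-1` (the convention used by DBSCAN-style algorithms), and a score defined as the mean gravitation between points of the same cluster minus the mean gravitation acting on noise points. The names `gravitation`, `evaluate_clustering`, and `select_best` and the scoring formula are hypothetical illustrations, not the paper's method.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1  # assumed label for noise points (DBSCAN-style convention)


def gravitation(p: Point, q: Point, eps: float = 1e-9) -> float:
    """Newtonian attraction between two unit-mass particles, with G = 1.

    eps softens the force so that coincident points do not divide by zero.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1.0 / (d2 + eps)


def evaluate_clustering(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Hypothetical gravity-based quality score; higher means a better result.

    score = mean gravitation over pairs of points in the same cluster
          - mean gravitation over (noise point, clustered point) pairs
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    clustered = [i for i, lab in enumerate(labels) if lab != NOISE]
    noise = [i for i, lab in enumerate(labels) if lab == NOISE]

    # Attraction inside clusters: average over all same-label pairs.
    intra = [
        gravitation(points[i], points[j])
        for i, j in combinations(clustered, 2)
        if labels[i] == labels[j]
    ]
    intra_mean = sum(intra) / len(intra) if intra else 0.0

    # Attraction acting on noise: average over (noise, clustered) pairs.
    on_noise = [gravitation(points[n], points[c]) for n in noise for c in clustered]
    noise_mean = sum(on_noise) / len(on_noise) if on_noise else 0.0

    return intra_mean - noise_mean


def select_best(points: Sequence[Point], candidates: Sequence[Sequence[int]]):
    """Return the candidate labeling with the highest (hypothetical) score."""
    return max(candidates, key=lambda labels: evaluate_clustering(points, labels))
```

For example, with two tight groups around (0, 0) and (10, 10) and an outlier at (50, 50), labeling the two groups correctly and the outlier as noise scores about 0.83, while mixing members of the two groups, or marking a group member as noise, scores lower. `select_best` therefore picks the correct labeling, which mirrors how such a score could steer a clustering method toward its best result. A difference of means is used here only for simplicity; both terms are gravitations, so they share the same units.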
Source
Journal of Northeastern University (Natural Science)
EI
CAS
CSCD
PKU Core Journals
2007, No. 8, pp. 1109-1112 (4 pages)
Funding
Project supported by the National Natural Science Foundation of China (60273079, 60473074)
Keywords
clustering
clustering result evaluation
gravitation
clustering algorithm
data mining