Abstract
Cluster analysis in data mining is important for discovering the hidden groups and distributions in data. Given the number of clusters, the classical K-Means algorithm clusters data well. It learns in batch mode, updating the cluster centers at the end of each pass (epoch) over the data. Sequential mode is an alternative learning scheme that updates the cluster centers as soon as each record is scanned. This paper proposes and implements a K-Means algorithm based on sequential mode and compares it with the K-Means algorithm that uses batch mode.
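The difference between the two modes can be shown with a minimal sketch. This is not the paper's implementation: the abstract does not give the update rule, so the sequential version below assumes a running-mean step with learning rate 1/n for the n-th record assigned to a cluster. The function names `nearest`, `batch_kmeans`, and `sequential_kmeans` are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest(centers: List[List[float]], x: Point) -> int:
    """Return the index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((c - v) ** 2 for c, v in zip(centers[j], x)),
    )


def batch_kmeans(data: List[Point], init_centers: List[Point], epochs: int = 10) -> List[List[float]]:
    """Batch mode: centers change only after a full pass over the data."""
    centers = [list(c) for c in init_centers]
    for _ in range(epochs):
        sums = [[0.0] * len(c) for c in centers]
        counts = [0] * len(centers)
        for x in data:
            j = nearest(centers, x)  # assignment uses the centers from the previous pass
            counts[j] += 1
            for d, v in enumerate(x):
                sums[j][d] += v
        for j, n in enumerate(counts):
            if n:  # an empty cluster keeps its old center
                centers[j] = [s / n for s in sums[j]]
    return centers


def sequential_kmeans(data: List[Point], init_centers: List[Point], epochs: int = 10) -> List[List[float]]:
    """Sequential mode: the winning center moves right after each record."""
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(epochs):
        for x in data:
            j = nearest(centers, x)  # assignment already sees earlier updates
            counts[j] += 1
            rate = 1.0 / counts[j]  # assumed learning rate, not taken from the paper
            centers[j] = [c + rate * (v - c) for c, v in zip(centers[j], x)]
    return centers


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    start = [(0.0, 0.0), (10.0, 10.0)]
    print(batch_kmeans(records, start))
    print(sequential_kmeans(records, start))
```

With the assumed 1/n rate, the first record assigned to a cluster replaces that cluster's initial center, so the result depends on the order in which records are scanned. Batch mode does not depend on that order.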
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2004, No. 6, pp. 156-158, 193 (4 pages)
Computer Science
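Funding
Key Science and Technology Research Project of the Ministry of Education (02038)
Supported by the Tianjin Natural Science Foundation (023600611)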