Abstract
Web log mining is a branch of Web mining. This paper describes the general process of Web log mining, examines the k-means clustering algorithm, and analyzes its shortcomings. In every iteration, k-means must compute the distance from each data object to the cluster centroids, which makes clustering inefficient. To address this problem, an improved k-means algorithm is proposed that avoids repeatedly computing the distances from data objects to cluster centroids. Both algorithms are used to cluster Web documents, and the experimental results show that the improved algorithm increases clustering efficiency.
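The abstract does not spell out how the improved algorithm avoids the repeated distance computations. As an illustration only, the sketch below assumes one well-known shortcut: each object remembers its distance to its assigned centroid, and if the updated centroid is no farther away than before, the object keeps its cluster and the distances to the other centroids are skipped. The function `kmeans`, its parameters, and the toy 2-D points are hypothetical and are not taken from the paper; note that this shortcut is a heuristic and may assign objects differently from standard k-means on less well-separated data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, init_centroids, use_shortcut=True, max_iter=100):
    """Cluster points; return (labels, centroids, number of distance calls).

    With use_shortcut=True an object keeps its cluster whenever its own
    centroid has not moved farther away (assumed heuristic, see lead-in).
    With use_shortcut=False this is plain k-means.
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = [None] * len(points)
    dists = [math.inf] * len(points)  # distance to the assigned centroid
    calls = 0

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            known = {}  # distances already computed for this object
            if use_shortcut and labels[i] is not None:
                own = labels[i]
                d_own = euclidean(p, centroids[own])
                calls += 1
                if d_own <= dists[i]:
                    dists[i] = d_own  # stays put; skip the other centroids
                    continue
                known[own] = d_own
            # Full search over all centroids, reusing any known distance.
            best_j, best_d = None, math.inf
            for j in range(k):
                if j in known:
                    d = known[j]
                else:
                    d = euclidean(p, centroids[j])
                    calls += 1
                if d < best_d:
                    best_j, best_d = j, d
            if best_j != labels[i]:
                changed = True
            labels[i], dists[i] = best_j, best_d
        if not changed:
            break
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids, calls


# Toy data: two well-separated groups of 2-D points (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
init = [(0, 0), (10, 10)]
labels_fast, centroids_fast, calls_fast = kmeans(points, init, use_shortcut=True)
labels_std, centroids_std, calls_std = kmeans(points, init, use_shortcut=False)
print(calls_fast, calls_std)
```

On this toy input both variants produce the same clusters, while the shortcut variant performs fewer distance computations. For Web documents the points would be document feature vectors rather than 2-D coordinates.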
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
PKU Core Journal (北大核心)
2008, Issue 18, pp. 4708-4710 (3 pages)