Abstract
To address current problems in webpage classification, a synonym-clustering method, the clustering by committee (CBC) algorithm, was applied to webpage clustering, with the search terms used as the main reference data. The CBC algorithm was improved by adding a limiting parameter to the clustering algorithm and increasing the weight of the search terms in the calculation of feature weights. Experiments on a data set compared the improved CBC algorithm with the traditional k-means algorithm. The results show that the improved algorithm is more accurate than traditional k-means and also has a fairly clear advantage in efficiency.
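The abstract does not give the formula used to raise the weight of the search terms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain term-frequency weights, a hypothetical multiplier `boost` applied to any term that appears in the query, and cosine similarity as the page-to-page measure; the function names and the toy pages are all illustrative.

```python
from collections import Counter
from math import sqrt


def boosted_weights(tokens, query_terms, boost=2.0):
    """Term-frequency weights; terms in the query are scaled by `boost`.

    `boost` is a hypothetical parameter, not taken from the paper.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    query = set(query_terms)
    return {
        term: (count / total) * (boost if term in query else 1.0)
        for term, count in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two toy pages returned for the same (assumed) search term.
page_a = ["jaguar", "car", "engine", "speed"]
page_b = ["jaguar", "cat", "jungle", "speed"]
query = ["jaguar"]

# boost=1.0 leaves the term-frequency weights unchanged.
plain = cosine(boosted_weights(page_a, query, boost=1.0),
               boosted_weights(page_b, query, boost=1.0))
boosted = cosine(boosted_weights(page_a, query, boost=3.0),
                 boosted_weights(page_b, query, boost=3.0))
```

In this toy case the similarity between the two pages rises from 0.5 to about 0.83 once the query term is weighted three times as heavily, which shows how strongly such a parameter can shift the similarities that a clustering step then consumes.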
Source
Journal of Beijing University of Chemical Technology (Natural Science Edition) (《北京化工大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2013, Issue B12, pp. 90-94 (5 pages)
Keywords
clustering algorithm
webpage classification
feature weight
clustering by committee (CBC) algorithm