Abstract
Extracting classification rules from the raw data set is the core of web page classifier design. This paper proposes a chaos particle swarm optimization algorithm for web page classification rule extraction that is combined with link-structure clustering. The algorithm combines clustering and classification to extract rules in two steps. First, a K-means-based clustering algorithm clusters a representative subset of unlabeled link-structure data and labels the categories automatically, forming the training set. Second, the chaos particle swarm algorithm extracts classification rules from the labeled data. Experimental results show that this scheme fully exploits the main advantage of link-based classification, namely minimal interference from human factors, avoids defects in the original data set, reduces the manual labeling workload for specialists, and improves classification accuracy and efficiency, with considerably better precision and recall.
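The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-feature link data, the interval-rule encoding, the use of the logistic map as the chaos source, and every parameter value (inertia weight, acceleration coefficients, swarm size) are assumptions made only for illustration.

```python
# Hypothetical link-structure features per page: (in-link score, out-link score).
POINTS = [(1, 8), (9, 2), (2, 9), (8, 1), (1, 9), (9, 1), (2, 8), (8, 2)]

def kmeans(points, k, iters=20):
    """Plain K-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def rule_accuracy(rule, points, labels, target):
    """Rule: IF lo1 <= x1 <= hi1 AND lo2 <= x2 <= hi2 THEN target ELSE other class."""
    lo1, hi1, lo2, hi2 = rule
    hits = 0
    for (x1, x2), y in zip(points, labels):
        inside = lo1 <= x1 <= hi1 and lo2 <= x2 <= hi2
        predicted = target if inside else 1 - target
        hits += predicted == y
    return hits / len(points)

def logistic_sequence(z):
    """Chaotic sequence in [0, 1] from the logistic map z <- 4 z (1 - z)."""
    while True:
        z = 4.0 * z * (1.0 - z)
        yield z

def chaotic_pso(points, labels, target, n_particles=10, iters=30, lo=0.0, hi=10.0):
    """PSO over interval rules; the chaotic sequence replaces the random numbers."""
    chaos = logistic_sequence(0.37)  # assumed seed value
    dim = 4
    pos = [[lo + (hi - lo) * next(chaos) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [rule_accuracy(p, points, labels, target) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = next(chaos), next(chaos)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each rule bound inside the feature range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = rule_accuracy(pos[i], points, labels, target)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Stage 1: cluster to obtain labels automatically. Stage 2: search for a rule.
labels = kmeans(POINTS, 2)
rule, fit = chaotic_pso(POINTS, labels, target=0)
print("cluster labels:", labels)
print("best rule bounds:", rule, "accuracy on labeled set:", fit)
```

In this sketch the clustering step stands in for manual labeling, and the fitness of a particle is simply the accuracy of its rule on the automatically labeled data; a fuller treatment would evaluate the rule on held-out pages.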
Source
《华中师范大学学报(自然科学版)》
CAS
CSCD
2008, No. 4, pp. 535-538 (4 pages)
Journal of Central China Normal University: Natural Sciences
Funding
National Natural Science Foundation of China (60773009)
National Key Basic Research Development Program of China ("973" Program) (2007AA012290)
Keywords
web document categorization
rule extraction
chaos particle swarm optimization
linkage clustering