Abstract
Traditional Web text clustering algorithms ignore the topic information of Web text, which leads to low accuracy when clustering multi-topic Web text. To address this problem, a topic-based Web text clustering method is proposed. The method clusters multi-topic Web text in three steps: topic extraction, feature extraction, and text clustering. Unlike traditional Web text clustering algorithms, the proposed method fully takes the topic information of the Web text into account. Experimental results show that, for multi-topic Web text clustering, the proposed method achieves higher accuracy than the text clustering method based on K-means and the text clustering method based on HowNet.
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2014, No. 11, pp. 3144-3146, 3151 (4 pages in total)
Journal of Computer Applications
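Funding
National Natural Science Foundation of China (61272111, 61202031, 61273216, 61202032)
Natural Science Foundation of Hubei Province (2013CFB002, 2013CFA115)
Wuhan Science and Technology Research Program (201210621214, 201210421132)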
Keywords
multi-topic
Web text
clustering
characteristic word
accuracy