
Research on Reorganization of Text Clustering Results

Cited by: 5
Abstract: This paper proposes a distance-oriented cluster reorganization strategy for text clustering that operates independently of the clustering process. It introduces the concept of the nearest domain, sets out the nearest-domain rules, and designs a Gaussian-weighted nearest-domain algorithm. A Gaussian function weights each sample according to its distance from the cluster center, and these weights are used to compute inter-cluster distances. The text clustering result is then reorganized on the basis of the nearest-domain weights. A split operator splits sparse clusters and adjusts abnormal samples, and a merge operator merges similar clusters. Experiments show that the reorganization mechanism effectively improves clustering accuracy and recall, increases cluster density, and yields more reasonable clustering results.
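The abstract does not give the paper's formulas, so the Python sketch below only illustrates the general idea, under several assumptions:

- Documents are term-weight vectors compared by cosine distance.
- Each sample's Gaussian weight is exp(-d²/(2σ²)), where d is its distance to the plain cluster centroid.
- The Gaussian-weighted centroid, called `kernel` here, stands in for the cluster center when inter-cluster distance is computed.

All function names, and the thresholds `sigma`, `sparse_below`, `outlier_below` and `merge_below`, are hypothetical. The farthest-pair split is a simple stand-in, not the paper's nearest-domain rules.

```python
import math
from typing import List, Optional

Vector = List[float]
Cluster = List[Vector]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; a zero vector counts as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def centroid(cluster: Cluster, weights: Optional[List[float]] = None) -> Vector:
    """(Weighted) mean vector of a non-empty cluster."""
    if weights is None:
        weights = [1.0] * len(cluster)
    total = sum(weights)
    dim = len(cluster[0])
    return [sum(w * v[i] for v, w in zip(cluster, weights)) / total for i in range(dim)]


def gaussian_weights(cluster: Cluster, sigma: float) -> List[float]:
    """w = exp(-d^2 / (2 sigma^2)), d = distance of the sample to the plain centroid."""
    c = centroid(cluster)
    return [math.exp(-cosine_distance(v, c) ** 2 / (2 * sigma ** 2)) for v in cluster]


def kernel(cluster: Cluster, sigma: float) -> Vector:
    """Gaussian-weighted center: outlying samples pull it less than core samples."""
    return centroid(cluster, gaussian_weights(cluster, sigma))


def split_in_two(cluster: Cluster) -> List[Cluster]:
    """Hypothetical split rule: seed with the two farthest samples, assign the rest."""
    _, i, j = max((cosine_distance(cluster[i], cluster[j]), i, j)
                  for i in range(len(cluster)) for j in range(i + 1, len(cluster)))
    left: Cluster = []
    right: Cluster = []
    for v in cluster:
        if cosine_distance(v, cluster[i]) <= cosine_distance(v, cluster[j]):
            left.append(v)
        else:
            right.append(v)
    return [part for part in (left, right) if part]


def reorganize(clusters: List[Cluster], sigma: float = 0.3, sparse_below: float = 0.6,
               outlier_below: float = 0.3, merge_below: float = 0.05) -> List[Cluster]:
    """Split, adjust, merge. Thresholds are illustrative assumptions, not from the paper."""
    clusters = [c for c in clusters if c]

    # 1. Split operator: a cluster whose mean Gaussian weight is low counts as sparse.
    split: List[Cluster] = []
    for c in clusters:
        w = gaussian_weights(c, sigma)
        if len(c) >= 2 and sum(w) / len(w) < sparse_below:
            split.extend(split_in_two(c))
        else:
            split.append(c)

    # 2. Adjust abnormal samples: low-weight samples move to the nearest kernel.
    cores: List[Cluster] = []
    outliers: Cluster = []
    for c in split:
        w = gaussian_weights(c, sigma)
        core = [v for v, wi in zip(c, w) if wi >= outlier_below]
        if core:
            outliers.extend(v for v, wi in zip(c, w) if wi < outlier_below)
        else:
            core = list(c)  # every sample looks abnormal: keep the cluster intact
        cores.append(core)
    kernels = [kernel(c, sigma) for c in cores]
    for v in outliers:
        nearest = min(range(len(cores)), key=lambda k: cosine_distance(v, kernels[k]))
        cores[nearest].append(v)

    # 3. Merge operator: repeatedly merge the closest pair of kernels below the threshold.
    while len(cores) > 1:
        ks = [kernel(c, sigma) for c in cores]
        d, i, j = min((cosine_distance(ks[i], ks[j]), i, j)
                      for i in range(len(cores)) for j in range(i + 1, len(cores)))
        if d >= merge_below:
            break
        cores[i].extend(cores.pop(j))
    return cores
```

Consider clusters `[[a1, a2], [a3], [b1, b2]]`, where `a3` lies on the same topic as `a1` and `a2`. With the default thresholds, the first two clusters are merged and the result is two clusters. Weighting the centroid this way keeps a single stray document from dragging a cluster's center toward another topic, which is the intuition the abstract describes.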
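Source: Journal of Chinese Information Processing (《中文信息学报》), 2016, No. 2, pp. 189-195 (7 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (61362028)
Keywords: text clustering; cluster reorganization; nearest domain rule; Gaussian weighting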