
Research on a Novel Web Chinese Text Clustering Method

Cited by: 3
Abstract: Traditional text clustering lacks semantic information, represents texts as high-dimensional sparse feature vectors, and ignores the particular characteristics of Web text. To address these problems, this paper proposes a Web Chinese text clustering method. Building on a HowNet-based concept space, the method filters out non-noun terms, analyses the semantics of the important words in a text, and performs feature-set clustering on the label feature set and the body-text feature set. It then applies an improved TF-IDF algorithm to select features from the two sets and finally represents the text as the union of the selected label features and body-text features. This reduces feature dimensionality while still representing the text efficiently. Experiments verify the method's effectiveness.
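The abstract outlines a representation pipeline: keep nouns, select features separately from the label and body-text sets, take their union, then compare texts. The sketch below illustrates only that overall shape under stated assumptions and is not the authors' implementation. Tokens are assumed to arrive as (word, POS) pairs from an unspecified Chinese tagger, with tags starting with "n" treated as nouns; `concept_map` is a hypothetical dictionary standing in for the HowNet concept space; ordinary TF-IDF replaces the paper's improved TF-IDF, whose formula the abstract does not give; the feature-set clustering step is omitted; and cosine similarity is used as one common choice of text similarity. All function names are illustrative.

```python
import math
from collections import Counter

# Assumption: an unspecified Chinese segmenter/POS tagger has already produced
# (word, pos) pairs; tags beginning with "n" are treated as nouns.
NOUN_PREFIX = "n"


def nouns_only(tagged_tokens, concept_map=None):
    """Drop non-nouns; optionally map synonyms onto one concept label.

    concept_map is a hypothetical stand-in for the HowNet concept space.
    """
    concept_map = concept_map or {}
    return [concept_map.get(word, word)
            for word, pos in tagged_tokens if pos.startswith(NOUN_PREFIX)]


def tfidf_top_k(token_lists, k):
    """Select the k highest-scoring terms per document with plain TF-IDF.

    This is ordinary TF-IDF, not the paper's improved variant.
    """
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    selected = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Ties are broken alphabetically so the selection is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.append(set(ranked[:k]))
    return selected


def represent(docs, k_label, k_body, concept_map=None):
    """Map each (label_tokens, body_tokens) pair to a term-count vector
    restricted to the union of its selected label and body features."""
    labels = [nouns_only(lbl, concept_map) for lbl, _ in docs]
    bodies = [nouns_only(body, concept_map) for _, body in docs]
    label_feats = tfidf_top_k(labels, k_label)
    body_feats = tfidf_top_k(bodies, k_body)
    vectors = []
    for lbl, body, lf, bf in zip(labels, bodies, label_feats, body_feats):
        features = lf | bf  # union of the two selected feature sets
        vectors.append(Counter(t for t in lbl + body if t in features))
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(count * v[t] for t, count in u.items())  # missing keys count as 0
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy example: two phone-related pages and one football page.
docs = [
    ([("手机", "n")], [("手机", "n"), ("屏幕", "n"), ("很", "d"), ("好", "a")]),
    ([("手机", "n")], [("手机", "n"), ("电池", "n"), ("续航", "vn")]),
    ([("足球", "n")], [("足球", "n"), ("比赛", "vn"), ("球队", "n")]),
]
vecs = represent(docs, k_label=1, k_body=2)
print(round(cosine(vecs[0], vecs[1]), 2))  # 0.8
print(cosine(vecs[0], vecs[2]))            # 0.0
```

Selecting features per set before taking the union keeps each text's vector short, which mirrors the dimensionality reduction the abstract claims; the resulting similarities could then feed any standard clustering algorithm.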
Source: Computer Applications and Software (indexed in CSCD and the Peking University Core Journals list), 2013, No. 12, pp. 222-225, 287 (5 pages).
Keywords: Web text clustering; feature dimension reduction; HowNet; text similarity

References: 9

Second-level references: 72


Co-citing literature: 428

Co-cited literature: 25

Citing literature: 3

Second-level citing literature: 110
