
Research on Chinese Text Clustering Method Based on Semantic Relevancy (cited by: 9)
Abstract [Objective/Significance] In text clustering based on the vector space model, text similarity computation ignores the semantic associations between feature terms. To address this problem, this paper proposes an improved semantic text similarity computation method. [Methods/Process] The new method uses the Wikipedia knowledge base to compute semantic relevancy, combines it with the weights that represent feature terms in a text to construct a semantic weighting factor for text similarity, and is evaluated in K-means text clustering experiments. [Results/Conclusion] Compared with traditional cosine similarity, the improved semantic text similarity effectively improves clustering accuracy when applied to text clustering. [Limitations] The computation of semantic relevancy does not perform word sense disambiguation.
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, Peking University Core Journal, 2016, No. 2, pp. 129-133 (5 pages)
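Funding: An output of the National Natural Science Foundation of China project "Research on Chinese Text Semantic Similarity Based on Complex Networks" (project no. 71373200)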
Keywords: Wikipedia; semantic relevancy; text similarity; text clustering
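The idea summarized in the abstract, letting semantically related feature terms contribute to text similarity instead of treating them as independent, can be illustrated with a small sketch. The abstract does not give the paper's actual semantic weighting factor, so the code below substitutes a generic "soft cosine" form as a stand-in. The vocabulary, the relatedness matrix `rel` (which in the paper's setting would be derived from Wikipedia), and the term weights are all hypothetical values chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Classic VSM cosine: feature terms are treated as mutually independent."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_similarity(u, v, rel):
    """Soft cosine (an assumed stand-in, not the paper's formula):
    the product of weights u[i] and v[j] is scaled by rel[i][j]."""
    def bilinear(x, y):
        return sum(x[i] * rel[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    norm = math.sqrt(bilinear(u, u)) * math.sqrt(bilinear(v, v))
    return bilinear(u, v) / norm if norm else 0.0


# Hypothetical three-term vocabulary: ["computer", "laptop", "river"].
# rel[i][j] is an assumed semantic relatedness score in [0, 1].
rel = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical term-weight vectors (e.g. TF-IDF) for three short texts.
doc_a = [1.0, 0.0, 0.0]  # mentions only "computer"
doc_b = [0.0, 1.0, 0.0]  # mentions only "laptop"
doc_c = [0.0, 0.0, 1.0]  # mentions only "river"

print(cosine_similarity(doc_a, doc_b))         # no shared terms
print(semantic_similarity(doc_a, doc_b, rel))  # related terms now count
print(semantic_similarity(doc_a, doc_c, rel))  # unrelated terms still do not
```

With these made-up values, plain cosine similarity sees `doc_a` and `doc_b` as entirely dissimilar because they share no term, while the semantically weighted score reflects that "computer" and "laptop" are related. A clustering step such as the K-means experiment mentioned in the abstract would then use a score of this kind in place of plain cosine similarity.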
