Short text similarity measure based on co-occurrence distance and discrimination (融合共现距离和区分度的短文本相似度计算方法)

Cited by: 9
Abstract: Short texts are brief, and their features are sparse and high-dimensional. To address this, we propose a short text similarity measure that combines co-occurrence distance and discrimination. First, the method uses the distance between two co-occurring terms, measured over the entire short text corpus, to compute their co-occurrence distance correlation. Second, it computes a co-occurrence discrimination to improve the accuracy of that distance correlation, and then assigns each term in a text a relevance weight. Finally, the similarity of two texts is computed from the term weights and the co-occurrence distance correlation between terms. Experimental results show that the proposed method improves the accuracy of short text similarity calculation and outperforms the baseline algorithm in terms of performance and efficiency.
Authors: LIU Wen (刘文); MA Hui-fang (马慧芳); TUO Ting (脱婷); CHEN Hai-bo (陈海波). Affiliations: College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 7, pp. 1281-1286 (6 pages)
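Funding: National Natural Science Foundation of China (61762078, 61363058); Research Project of the Guangxi Key Laboratory of Trusted Software (KX201705); Northwest Normal University Student Innovation Ability Program (CX2018Y054)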
Keywords: short text; co-occurrence distance correlation; co-occurrence discrimination; term weighting; similarity calculation
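
The abstract names the steps of the method (corpus-wide co-occurrence distance correlation, co-occurrence discrimination, term weighting, text similarity) but does not give the formulas. The Python sketch below is only one possible reading of that pipeline, not the paper's definitions. Every function name (`build_relatedness`, `relatedness`, `term_weights`, `similarity`) and every formula is an assumption made for illustration: inverse positional distance as the distance correlation, a document-frequency ratio as the discrimination, a `1 + sum of relatedness` term weight, and a cosine-style normalization.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_relatedness(corpus):
    """Score term pairs over the whole corpus (all formulas are assumptions)."""
    doc_freq = defaultdict(int)        # number of texts containing each term
    inv_dist_sum = defaultdict(float)  # summed 1/distance per term pair
    co_count = defaultdict(int)        # number of texts where the pair co-occurs
    for tokens in corpus:
        for term in set(tokens):
            doc_freq[term] += 1
        closest = {}  # best (closest-occurrence) score per pair in this text
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if a != b:
                pair = frozenset((a, b))
                closest[pair] = max(closest.get(pair, 0.0), 1.0 / (j - i))
        for pair, score in closest.items():
            inv_dist_sum[pair] += score
            co_count[pair] += 1
    rel = {}
    for pair, total in inv_dist_sum.items():
        a, b = tuple(pair)
        # Assumed distance correlation: mean inverse distance over co-occurrences.
        distance_corr = total / co_count[pair]
        # Assumed discrimination: share of the rarer term's texts that hold both.
        discrimination = co_count[pair] / min(doc_freq[a], doc_freq[b])
        rel[pair] = distance_corr * discrimination
    return rel


def relatedness(rel, a, b):
    """Identical terms score 1; pairs never seen together score 0."""
    return 1.0 if a == b else rel.get(frozenset((a, b)), 0.0)


def term_weights(tokens, rel):
    """Assumed weighting: terms related to the rest of the text weigh more."""
    terms = set(tokens)
    raw = {t: 1.0 + sum(relatedness(rel, t, u) for u in terms if u != t)
           for t in terms}
    norm = sum(raw.values())
    return {t: w / norm for t, w in raw.items()}


def _raw_similarity(w1, w2, rel):
    return sum(x * y * relatedness(rel, a, b)
               for a, x in w1.items() for b, y in w2.items())


def similarity(tokens1, tokens2, rel):
    """Weighted pairwise relatedness, scaled so a text matches itself with 1."""
    if not tokens1 or not tokens2:
        return 0.0
    w1, w2 = term_weights(tokens1, rel), term_weights(tokens2, rel)
    denom = math.sqrt(_raw_similarity(w1, w1, rel) * _raw_similarity(w2, w2, rel))
    return _raw_similarity(w1, w2, rel) / denom


if __name__ == "__main__":
    # Toy corpus of pre-tokenized short texts (made up for this example).
    corpus = [
        ["apple", "iphone", "release"],
        ["apple", "iphone", "price"],
        ["banana", "fruit", "price"],
        ["banana", "fruit", "juice"],
    ]
    rel = build_relatedness(corpus)
    print(similarity(["apple", "release"], ["iphone", "price"], rel))
    print(similarity(["apple", "release"], ["banana", "juice"], rel))
```

In this toy corpus the first pair of texts shares no term yet still receives a positive score, because "apple" and "iphone" co-occur closely across the corpus; this is the kind of effect the abstract describes for sparse short texts. The normalization does not guarantee scores stay below 1 for arbitrary input.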