
Research on text similarity calculation strategy based on semantic combination of keywords (基于词项语义组合的文本相似度计算方法研究) Cited by: 4
Abstract: Similarity comparison between texts mainly considers keyword matching and lacks an in-depth analysis of the combination relationships among keywords. Addressing these combination properties, the paper holds that the more keywords appear together in order, the greater their contribution to the similarity between texts. It proposes a non-linear semantic relevance function based on the number of combined keywords and, building on LCS, extracts all keyword combination blocks from the texts. The method is applied to short-text similarity comparison on the Microsoft Research Paraphrase corpus (MSRP). Experimental results show that both accuracy and F1 improve for short-text similarity calculation, with the F1 gain being the more pronounced.
Authors: ZHOU Lijie (周丽杰); YU Weihai (于伟海); GUO Cheng (郭成). Affiliations: Electronic Teaching Center, Yantai Vocational College, Yantai, Shandong 264670, China; Yantai Normal Language Teaching Center, Yantai, Shandong 264670, China; School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116620, China
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 19, pp. 90-93 (4 pages)
Funding: National Natural Science Foundation of China (No.61401060, No.61272173); Science and Technology Program of Shandong Province Higher Education Institutions (No.J12LN73)
Keywords: combination of keywords; non-linear semantic relevance; semantic relevance function; text similarity
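
The idea in the abstract, that keywords matched as an ordered run should count for more than the same number of isolated matches, can be illustrated with a minimal sketch. This is not the paper's function, which the abstract does not give. It assumes that LCS means longest common subsequence, that whitespace tokens stand in for keywords, that walking a single LCS alignment is an acceptable simplification of "extracting all blocks", and that the non-linear weight is a power `len(block) ** alpha` with a hypothetical `alpha = 1.5`. The names `lcs_blocks` and `block_similarity` are illustrative only.

```python
from typing import List


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def lcs_blocks(a: List[str], b: List[str]) -> List[List[str]]:
    """Walk one LCS alignment and group matches that are adjacent in BOTH
    token lists into contiguous blocks (an assumed reading of
    'keyword combination blocks')."""
    dp = lcs_table(a, b)
    blocks: List[List[str]] = []
    prev = None  # positions (i, j) of the previous matched pair
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if prev == (i - 1, j - 1):
                blocks[-1].append(a[i])   # extends the current run
            else:
                blocks.append([a[i]])     # starts a new block
            prev = (i, j)
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return blocks


def block_similarity(a: List[str], b: List[str], alpha: float = 1.5) -> float:
    """Score in [0, 1]: each block contributes len(block) ** alpha, so longer
    ordered runs are rewarded more than linearly (alpha > 1 is an assumption).
    The score is normalized by the longer text treated as a single block."""
    if not a or not b:
        return 0.0
    score = sum(len(block) ** alpha for block in lcs_blocks(a, b))
    return score / (max(len(a), len(b)) ** alpha)


if __name__ == "__main__":
    s1 = "the cat sat on the mat".split()
    s2 = "the cat lay on the mat".split()
    print(lcs_blocks(s1, s2))
    print(block_similarity(s1, s2))
```

With `alpha = 1`, the score reduces to the plain LCS length divided by the longer text length, so any gain from the block weighting comes entirely from the choice of `alpha`. A threshold on the score, tuned on labeled paraphrase pairs, would be one way to turn it into the binary decision that accuracy and F1 measure.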


