
Study on HowNet-Based Word Similarity Algorithm

Cited by: 34
Abstract: HowNet-based word (sentence) similarity computation usually takes the optimal matches between primitives (sememes) or words as its basic unit, and the overall similarity is then obtained by combining the similarity of each part through suitable weighting. This approach often duplicates information across matched pairs and produces an unreasonable structure. To address this problem, the paper computes the similarity of two sets of direct primitives from statistics on their common information (commonality) and their differing information (differences), and applies this measure to word (sentence) similarity computation. Comparative experiments show that the proposed method is more stable and effective.
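The abstract does not state the paper's exact formula. As a rough illustration of scoring two sets of direct primitives by what they share versus what distinguishes them, the sketch below uses Tversky's ratio model, a standard commonality/difference measure swapped in here as an assumption rather than the authors' method. The function name `set_similarity`, the weights `alpha` and `beta`, and the primitive labels in the example are all hypothetical.

```python
from typing import AbstractSet, Hashable


def set_similarity(a: AbstractSet[Hashable], b: AbstractSet[Hashable],
                   alpha: float = 0.5, beta: float = 0.5) -> float:
    """Score two feature sets by shared vs. distinctive members.

    Assumption: Tversky's ratio model, not the paper's own formula:
    common / (common + alpha * only_a + beta * only_b).
    With alpha == beta == 0.5 it equals the Dice coefficient.
    """
    if not a and not b:
        return 1.0  # treat two empty descriptions as identical
    common = len(a & b)   # shared information (commonality)
    only_a = len(a - b)   # information found only in a (difference)
    only_b = len(b - a)   # information found only in b (difference)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical direct-primitive sets; not taken from HowNet itself.
    doctor = {"human", "occupation", "medical", "cure"}
    nurse = {"human", "occupation", "medical", "care"}
    print(set_similarity(doctor, nurse))  # 0.75
```

In this sketch each shared primitive is counted once through the set intersection, so no member can be matched twice; that is the kind of duplication the abstract attributes to summing weighted best-match pairs.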
Source: Journal of Chinese Information Processing (indexed in CSCD and the Peking University Core Journals list), 2010, No. 6, pp. 31-36 (6 pages)
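Funding: National High-Tech R&D Program of China (863 Program), grant 2007AA01Z423; National Natural Science Foundation of China, grant 60703113; Sichuan Provincial Department of Science and Technology, grant 2008CD00053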
Keywords: HowNet; word similarity; sentence similarity; common information; different information

References: 8

Secondary references: 26


Works sharing references with this paper: 187

Works co-cited with this paper: 323

Citing works: 34

Second-level citing works: 230
