A measure of sentence similarity based on N-grams and Vector Space Model (基于N-gram和向量空间模型的语句相似度研究)

Cited by: 14
Abstract Measures of sentence similarity are widely applied in information retrieval, automated scoring in language testing, and machine translation evaluation. Previous studies have concentrated either on linguistic form or on meaning, and studies that examine both together are rare. This study applies the N-gram method from natural language processing, combined with the Vector Space Model, to measure sentence similarity in terms of both form and meaning. The results show that the similarity scores computed by the algorithm agree closely with the similarity judgments of human raters: the overall correlation coefficients with Chinese and non-Chinese raters reach .928 and .925, respectively, indicating that the proposed algorithm provides a reliable measure of sentence similarity.
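The abstract does not say how the N-gram and Vector Space Model scores are computed or combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes whitespace tokenization, a Dice coefficient over word bigrams as a stand-in for formal similarity, a cosine over raw term-frequency vectors as a stand-in for meaning similarity, and an equal-weight mix. The function names and the `weight` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(a, b, n=2):
    """Dice coefficient over n-gram multisets (assumed proxy for formal similarity)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection keeps the minimum counts
    return 2 * shared / total


def cosine(a, b):
    """Cosine between term-frequency vectors (a basic vector space model score)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def combined_similarity(s1, s2, n=2, weight=0.5):
    """Weighted mix of the two scores; `weight` is a hypothetical parameter."""
    a, b = s1.lower().split(), s2.lower().split()
    return weight * ngram_overlap(a, b, n) + (1 - weight) * cosine(a, b)
```

For example, `combined_similarity("the cat sat", "the dog sat")` shares no bigrams but two of three words, so the formal score is 0 while the vector score is 2/3. A system of the kind the abstract describes would presumably tune the N-gram order and the weighting against human ratings.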
Source Modern Foreign Languages (《现代外语》), CSSCI, PKU Core Journal, 2007, No. 4, pp. 405-413 (9 pages)
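Funding This work is part of the results of the National Social Science Fund project "A Translation Research and Translation Teaching Platform Based on a Large-Scale Bilingual Parallel Corpus" (Project No. 05BYY013), and was supported by the "China Foreign Language Education Fund" of the China Foreign Language Education Center, Beijing Foreign Studies University.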
