
Translation sentence-pair retrieval based on word information
Abstract: This paper proposes a word-based method for retrieving translation sentence pairs. Words in a source sentence and in its translation do not correspond one to one, so deciding whether two sentences are translations of each other does not require every word to form a translation pair. We introduce the concept of Word Information (WI) to measure the importance of a word in a sentence. WI is composed of word frequency, the word's distribution across documents, part of speech (POS), and word length. When judging whether two sentences are translations, only words with a high WI value are checked for translation pairs. We build a translation sentence-pair retrieval system on translation pairs of high-WI words. Experiments show that the system outperforms a simple retrieval method based on all words, and in experiments with added noise it is more robust than the related work it was compared against.
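Authors: 秦颖 (Qin Ying), 李颖超 (Li Yingchao)
Source: Foreign Language Teaching and Research (《外语教学与研究》), indexed in CSSCI and the Peking University Core Journals list, 2012, No. 2, pp. 270-278, 321 (9 pages)
Funding: Supported by the National Social Science Fund of China project "Research on Automatic Identification of Chinese-English Cross-Language Plagiarized Text" (10CYY024) and the National Social Science Fund of China major project "Construction and Processing of a Large-Scale English-Chinese Parallel Corpus" (10&ZD127).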
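
The abstract names the four ingredients of WI (word frequency, distribution across documents, POS, word length) but does not give the formula that combines them. The following Python sketch is therefore only an illustration of the general idea, not the authors' method: the scoring formula, the POS weights, the function names, and the toy bilingual lexicon are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical POS weights (an assumption, not from the paper):
# content words count more than function words.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "NUM": 0.9, "ADJ": 0.6}
DEFAULT_POS_WEIGHT = 0.1


def word_information(word, pos, term_freq, doc_freq, n_docs):
    """Toy WI score built from the four factors named in the abstract."""
    # Distribution: words found in few documents are more informative.
    rarity = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
    # Frequency: very frequent words are damped.
    damping = 1.0 / (1.0 + math.log(1 + term_freq[word]))
    # Length: longer words get a small bonus, capped at 10 characters.
    length_bonus = 1.0 + 0.1 * min(len(word), 10)
    return rarity * damping * POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT) * length_bonus


def top_wi_words(tagged_sentence, term_freq, doc_freq, n_docs, k=2):
    """Return the k words of a (word, POS) sentence with the highest WI."""
    scored = [
        (word_information(w, p, term_freq, doc_freq, n_docs), w)
        for w, p in tagged_sentence
    ]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]


def match_score(key_words, target_tokens, lexicon):
    """Fraction of high-WI source words with a translation in the target."""
    target = set(target_tokens)
    hits = sum(1 for w in key_words if lexicon.get(w, set()) & target)
    return hits / len(key_words) if key_words else 0.0


def retrieve(tagged_sentence, candidates, lexicon, term_freq, doc_freq, n_docs, k=2):
    """Pick the candidate sentence that best covers the high-WI words."""
    keys = top_wi_words(tagged_sentence, term_freq, doc_freq, n_docs, k)
    return max(candidates, key=lambda c: match_score(keys, c, lexicon))


# Toy data, invented for illustration only.
source = [("系统", "NOUN"), ("的", "PART"), ("性能", "NOUN"), ("很", "ADV"), ("好", "ADJ")]
term_freq = Counter({"系统": 5, "的": 1000, "性能": 3, "很": 200, "好": 150})
doc_freq = Counter({"系统": 4, "的": 100, "性能": 2, "很": 80, "好": 60})
n_docs = 100
lexicon = {"系统": {"system"}, "性能": {"performance"}, "好": {"good", "well"}}
candidates = [
    ["the", "weather", "is", "good"],
    ["the", "system", "performance", "is", "good"],
]

best = retrieve(source, candidates, lexicon, term_freq, doc_freq, n_docs)
print(" ".join(best))
```

With these toy counts, only the two highest-WI source words are checked against each candidate, so a shared low-information word such as "good" cannot by itself make the unrelated sentence look like a translation.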