
Automatic Construction of an English-Chinese Translation Lexicon from a Sentence-Aligned Spoken Language Corpus
Abstract: This paper describes an algorithm for automatically constructing an English-Chinese translation lexicon from a sentence-aligned parallel spoken language corpus. First, an electronic dictionary is used to filter the bilingual text, which yields the first part of the lexicon, the "filtered lexicon". Second, co-occurrence probabilities are counted and an association score is calculated for every word pair, producing the Chinese-English (English-Chinese) word co-occurrence association score tables. Then, for each word pair in the four tables, a confidence score of 1 is assigned if the pair's association score ranks among the top five for its source word. The confidence scores of each candidate pair are totalled and used as the grading criterion, which gives translation lexicons at 4 levels. The filtered lexicon combined with the 4th-level lexicon reaches a precision of 93.389% at a recall of 93.5%. This is an encouraging result, because it is obtained on a spoken language corpus that pairs an Indo-European language with a non-Indo-European one. Grading the lexicon effectively reduces the number of incorrect entries in the high-level lexicons, which makes the translation lexicon more practical and addresses the trade-off between precision and recall.
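The counting, ranking, and grading steps in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the abstract: the association score is taken to be the Dice coefficient over sentence-level co-occurrence (the paper's actual measure is not stated here), only two tables are built (English-to-Chinese and Chinese-to-English) rather than the four the paper mentions, the dictionary-filtering step is omitted, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product


def association_tables(sentence_pairs):
    """Build two score tables from sentence-aligned (en_tokens, zh_tokens) pairs.

    Assumption: the score is the Dice coefficient on sentence-level
    co-occurrence; the paper's own association measure may differ.
    """
    en_freq, zh_freq, pair_freq = Counter(), Counter(), Counter()
    for en_tokens, zh_tokens in sentence_pairs:
        en_set, zh_set = set(en_tokens), set(zh_tokens)
        en_freq.update(en_set)
        zh_freq.update(zh_set)
        # Every English word co-occurs with every Chinese word in the pair.
        pair_freq.update(product(en_set, zh_set))

    en_to_zh, zh_to_en = defaultdict(dict), defaultdict(dict)
    for (en, zh), n in pair_freq.items():
        score = 2 * n / (en_freq[en] + zh_freq[zh])
        en_to_zh[en][zh] = score
        zh_to_en[zh][en] = score
    return en_to_zh, zh_to_en


def top_k(table, k):
    """Yield (source, target) for the k best-scoring targets of each source word."""
    for src, targets in table.items():
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(targets.items(), key=lambda kv: (-kv[1], kv[0]))
        for tgt, _ in ranked[:k]:
            yield src, tgt


def graded_lexicon(sentence_pairs, k=5):
    """Return a Counter mapping (en, zh) to its total confidence score.

    Each table contributes 1 when the pair is in the top k for its source
    word, so with two tables the confidence ranges from 1 to 2.
    """
    en_to_zh, zh_to_en = association_tables(sentence_pairs)
    confidence = Counter()
    for en, zh in top_k(en_to_zh, k):
        confidence[(en, zh)] += 1
    for zh, en in top_k(zh_to_en, k):
        confidence[(en, zh)] += 1
    return confidence
```

Under these assumptions, a lexicon at a given level would be the set of pairs whose confidence meets a threshold, for example `{pair for pair, c in graded_lexicon(corpus).items() if c >= 2}`. Raising the threshold trades recall for precision, which is the balance the abstract refers to.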
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core list; 2003, No. 3, pp. 275-280 (6 pages)

