Automatic Construction of English-Chinese Translation Lexicon from Sentence Aligned Spoken Language Corpus

Original (Chinese) title: 基于双语对齐口语语料的翻译词典的自动生成 (cited by 2)

Abstract: This paper describes an algorithm for automatically constructing an English-Chinese translation lexicon from a sentence-aligned parallel corpus of spoken language. First, an electronic dictionary is used to filter the bilingual text, which yields the first part of the translation lexicon, the "filtered lexicon". Next, co-occurrence probabilities are estimated from counts and an association score is calculated for every word pair, producing the Table of Chinese-English (English-Chinese) Word Co-occurrence Association Scores. In each of the four resulting tables, the target words with the highest association scores for a source word (the top five) are taken as candidate word pairs, and each such candidate pair is given a confidence score of 1. The confidence scores accumulated by each candidate pair then serve as the criterion for grading the lexicon, which yields translation lexicons at four levels. The filtered lexicon combined with the level-4 lexicon reaches a precision of 93.389% at a recall of 93.5%. The authors consider this an encouraging result because the corpus pairs an Indo-European language with a non-Indo-European one. Grading the lexicon effectively reduces the number of incorrect entries in the higher-level lexicons, which makes the translation lexicon more practical and addresses the problem of balancing precision and recall.
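The pipeline summarized in the abstract (dictionary filtering, association scores, top-k candidate voting, grading by summed confidence) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract names neither its association measure nor how the four tables arise, so the sketch hypothetically uses the Dice coefficient and pointwise mutual information, each ranked in the English-to-Chinese and Chinese-to-English directions, to obtain four tables. The toy corpus, the seed dictionary, and all function names (`filter_with_dictionary`, `count_cooccurrence`, `top_k_votes`, `graded_lexicon`) are illustrative. `k` defaults to five, following the English abstract's "top five"; the demo uses `k=1` because the toy vocabulary is so small that `k=5` would select every candidate.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: tokenized, sentence-aligned (English, Chinese) pairs.
CORPUS = [
    (["i", "want", "tea"], ["我", "要", "茶"]),
    (["i", "want", "coffee"], ["我", "要", "咖啡"]),
    (["tea", "please"], ["请", "茶"]),
    (["coffee", "please"], ["请", "咖啡"]),
    (["want", "tea"], ["要", "茶"]),
    (["i", "want", "milk"], ["我", "要", "牛奶"]),
    (["tea"], ["要", "茶"]),
    (["want", "coffee"], ["咖啡"]),
]
# Hypothetical seed dictionary standing in for the paper's electronic dictionary.
SEED_DICT = {"i": ["我"], "please": ["请"]}


def filter_with_dictionary(corpus, seed_dict):
    """Keep dictionary pairs that co-occur in an aligned sentence pair and
    remove those tokens before any statistics are gathered (assumed form)."""
    filtered, residual = set(), []
    for en, zh in corpus:
        en_left, zh_left = list(en), list(zh)
        for e in en:
            for c in seed_dict.get(e, ()):
                if e in en_left and c in zh_left:
                    filtered.add((e, c))
                    en_left.remove(e)
                    zh_left.remove(c)
        residual.append((en_left, zh_left))
    return filtered, residual


def count_cooccurrence(residual):
    """Sentence-level frequencies f(e), f(c) and joint frequency f(e, c)."""
    f_en, f_zh, f_pair = Counter(), Counter(), Counter()
    for en, zh in residual:
        en_set, zh_set = set(en), set(zh)
        f_en.update(en_set)
        f_zh.update(zh_set)
        f_pair.update(product(en_set, zh_set))
    return f_en, f_zh, f_pair, len(residual)


def dice(fe, fc, fec, n):
    # Assumed measure 1: Dice coefficient 2 f(e,c) / (f(e) + f(c)).
    return 2 * fec / (fe + fc)


def pmi(fe, fc, fec, n):
    # Assumed measure 2: pointwise mutual information log(f(e,c) n / (f(e) f(c))).
    return math.log(fec * n / (fe * fc))


def top_k_votes(f_en, f_zh, f_pair, n, measure, k, en_to_zh):
    """One association table: each source word gives confidence 1 to its
    k highest-scoring target words."""
    by_source = defaultdict(list)
    for (e, c), fec in f_pair.items():
        score = measure(f_en[e], f_zh[c], fec, n)
        by_source[e if en_to_zh else c].append((score, (e, c)))
    votes = set()
    for candidates in by_source.values():
        # Highest score first; ties broken by the pair itself for determinism.
        candidates.sort(key=lambda sc: (-sc[0], sc[1]))
        votes.update(pair for _, pair in candidates[:k])
    return votes


def graded_lexicon(corpus, seed_dict, k=5):
    """Sum votes over four tables (2 measures x 2 directions, an assumption)
    and group candidate pairs by total confidence, i.e. levels 1..4."""
    filtered, residual = filter_with_dictionary(corpus, seed_dict)
    f_en, f_zh, f_pair, n = count_cooccurrence(residual)
    confidence = Counter()
    for measure in (dice, pmi):
        for en_to_zh in (True, False):
            confidence.update(top_k_votes(f_en, f_zh, f_pair, n, measure, k, en_to_zh))
    levels = defaultdict(set)
    for pair, score in confidence.items():
        levels[score].add(pair)
    return filtered, dict(levels)


if __name__ == "__main__":
    filtered, levels = graded_lexicon(CORPUS, SEED_DICT, k=1)
    print("filtered:", sorted(filtered))
    for level in sorted(levels, reverse=True):
        print(level, sorted(levels[level]))
```

On this toy data with `k=1`, level 4 contains tea/茶, coffee/咖啡 and milk/牛奶; the correct pair want/要 only reaches level 2, while the noisy pairs want/牛奶 and milk/要 (favored by PMI's bias toward rare words) sit at level 1. This illustrates why the higher levels are cleaner and why combining levels trades precision against recall.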

Authors: Chen Boxing (陈博兴), Du Limin (杜利民)

Affiliation: Speech Interaction Technology Research Center, Institute of Acoustics, Chinese Academy of Sciences

Source: Chinese Journal of Computers (计算机学报), 2003, Issue 3, pp. 275-280 (6 pages). Indexed in: EI, CSCD, Peking University Core Journals

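Keywords: bilingual aligned spoken language corpus; translation lexicon; automatic construction; association score; machine translation; English-Chinese translation; corpus; Database systems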

Classification (CLC): TP391.2 [Automation and Computer Technology: Computer Application Technology]; H315.9 [Languages and Scripts: English]
