Study on Word Alignment Technology of Chinese-Uygur Kazak Kirgiz Bilingual Corpus Processing and Programming System

Cited by: 2
Abstract: In natural language processing, applications built on bilingual parallel corpora are increasingly common, and constructing such corpora is of great value to machine translation, bilingual dictionary compilation, word sense disambiguation, and cross-language information retrieval. This paper therefore presents an efficient and practical processing system for Chinese-Uygur, Chinese-Kazak, and Chinese-Kirgiz bilingual corpora. The system integrates document alignment, sentence alignment, and word alignment, which makes it efficient, convenient, fast, and extensible.
Authors: AISHAN Molniyaz (艾山·毛力尼亚孜), TAN Xun (谭勋), TURGUN Ibrahim (吐尔根·依布拉音), AISHAN Wumaier (艾山·吾买尔). College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2011, No. 10, pp. 6895-6896, 6925 (3 pages)
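Funding: Electronic Information Industry Development Fund project on Uygur, Kazak, and Kirgiz language software development and industrialization (computer-aided translation software); Open Project of the Xinjiang Key Laboratory of Multilingual Information Technology; Xinjiang University Doctoral Research Start-up Fund; National Undergraduate Innovative Experiment Program (No. 101075523); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2011211807); Young Teachers' Research Cultivation Fund (XJEDU2010S07)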
Keywords: bilingual corpora; parallel corpus; word alignment