
Chinese-Uyghur Sentence Alignment Method Based on Anchor Sentence Pairs (基于锚点句对的汉维句子对齐方法)
Cited by: 5
Abstract: To improve the precision of Chinese-Uyghur sentence alignment, a segmented (step-by-step) sentence alignment method is proposed. Lexical and length information are combined to identify sentence pairs that can serve as anchors (anchor sentence pairs). These anchors are used as boundaries to divide the full text into segments, and all sentences within each segment are then aligned with a length-based method. Using words, numbers, punctuation marks, and length information improves the method's portability across domains, while segmentation avoids complex computation and thereby limits error propagation. Experimental results show that the method reaches a precision of 95.2% on Chinese-Uyghur multi-domain texts, which is 2.7% higher than the length-based sentence alignment method.
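The pipeline summarized in the abstract (find anchors, cut the text at them, align each segment by length) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `ratio`, `tol`, and `window` parameters, and the bead penalties are all hypothetical. The anchor test here relies only on digit strings and length; the dictionary lookup and punctuation cues mentioned in the abstract are omitted, and the length cost is a simple normalized difference rather than a probabilistic score.

```python
import re

NUM = re.compile(r"\d+(?:\.\d+)?")
# Allowed alignment beads as (source count, target count, penalty); values are illustrative.
BEADS = [(1, 1, 0.0), (1, 0, 4.0), (0, 1, 4.0), (2, 1, 1.5), (1, 2, 1.5)]


def is_anchor(src, tgt, ratio=1.0, tol=0.5):
    """Assumed anchor test: identical digit strings in order, plus compatible length."""
    nums_s, nums_t = NUM.findall(src), NUM.findall(tgt)
    if not nums_s or nums_s != nums_t:
        return False
    expected = len(src) * ratio  # ratio = assumed target/source length ratio
    return abs(len(tgt) - expected) <= tol * max(expected, 1)


def find_anchors(src, tgt, window=3, ratio=1.0):
    """Scan source sentences and pick monotonic anchor pairs near the diagonal."""
    anchors, last_j = [], -1
    for i, s in enumerate(src):
        guess = round(i * len(tgt) / max(len(src), 1))
        lo = max(last_j + 1, guess - window)
        hi = min(len(tgt), guess + window + 1)
        for j in range(lo, hi):
            if is_anchor(s, tgt[j], ratio):
                anchors.append((i, j))
                last_j = j
                break
    return anchors


def align_by_length(src, tgt, ratio=1.0):
    """Dynamic programming over beads, scoring each bead by length mismatch."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, pen in BEADS:
                if i + di > n or j + dj > m:
                    continue
                ls = sum(len(x) for x in src[i:i + di]) * ratio
                lt = sum(len(x) for x in tgt[j:j + dj])
                c = cost[i][j] + abs(ls - lt) / max(ls + lt, 1) + pen
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # walk the back-pointers from the end
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


def align_with_anchors(src, tgt, ratio=1.0):
    """Split at anchors, then run the length-based aligner inside each segment."""
    result, pi, pj = [], 0, 0
    # A sentinel past the last sentence closes the final segment.
    for ai, aj in find_anchors(src, tgt, ratio=ratio) + [(len(src), len(tgt))]:
        for s_idx, t_idx in align_by_length(src[pi:ai], tgt[pj:aj], ratio):
            result.append(([pi + k for k in s_idx], [pj + k for k in t_idx]))
        if ai < len(src):
            result.append(([ai], [aj]))  # the anchor pair itself is a 1-1 bead
        pi, pj = ai + 1, aj + 1
    return result


# Toy placeholder strings (not real Chinese or Uyghur text).
src = ["aaaaaaaaaa", "in 2015 x", "bbbbb", "ccccc", "dddddddd"]
tgt = ["AAAAAAAAAA", "IN 2015 X", "BBBBBCCCCC", "DDDDDDDD"]
alignment = align_with_anchors(src, tgt)
```

Because each segment is aligned independently, a wrong bead inside one segment cannot shift the alignment of sentences beyond the next anchor, which is the error-containment property the abstract attributes to segmentation. For real Chinese-Uyghur text, `ratio` would need to be estimated from data, since the two scripts differ considerably in characters per sentence.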
Source: Computer Engineering (《计算机工程》), 2015, No. 4, pp. 166-170 (5 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
Funding: Natural Science Foundation of Xinjiang Uyghur Autonomous Region (Grant No. 2012211B08)
Keywords: parallel corpora; sentence alignment; anchor; length-based method; lexical-based method

