Multi-Strategy Chinese-Uyghur Sentence Alignment (cited by: 8)

Chinese-Uyhur Sentence Alignment Based on Hybrid Strategy
Abstract: This paper proposes an error-suppressing multi-strategy algorithm for aligning sentences in Chinese-Uyghur parallel corpora. Because length-based alignment cannot prevent errors from propagating, a new suppression strategy is introduced: lexical co-occurrence information in the bilingual corpus is used to automatically extract Chinese-Uyghur word correspondences, which are combined with sentence-length features to identify 1:1 sentence pairs as anchor points, confining error propagation between anchors. Between anchor points, sentences are aligned with a hybrid method based on punctuation and length. Experimental results show that the anchor points found by the algorithm have high precision and effectively suppress the spread of alignment errors. The hybrid alignment algorithm avoids the high time complexity of word-based alignment algorithms and outperforms traditional alignment algorithms: alignment accuracy rises from 95.0% to 97.6% and recall from 96.8% to 98.2%. The alignment-correctness evaluation method used can also effectively detect noisy alignments in the automatic output.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2010, No. 4, pp. 215-218, 292 (5 pages)
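Funding: Supported by the National Natural Science Foundation of China (60663006, 60963017) and the Scientific Research Program of Higher Education Institutions of the Xinjiang Uyghur Autonomous Region (XJEDU2009I05)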
Keywords: bilingual corpora, error suppression, sentence alignment, hybrid strategy, Chinese-Uyghur sentences
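
The two-stage strategy described in the abstract (find high-confidence 1:1 anchors first, then align only the short spans between them with length and punctuation cues) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the lexicon is supplied by hand rather than extracted from co-occurrence statistics, and the function names, thresholds, bead penalties, and cost formula are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Chinese, Latin, and Arabic-script punctuation (illustrative, not exhaustive).
PUNCT = set(",.;:!?，。；：！？、\u060c\u061b\u061f")

# Allowed bead shapes (source count, target count) with hypothetical penalties.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.2, (1, 2): 0.2}


def lexical_hits(src: str, tgt: str, lexicon: Dict[str, Set[str]]) -> int:
    """Count lexicon entries whose source term and some translation both occur."""
    return sum(1 for term, translations in lexicon.items()
               if term in src and any(t in tgt for t in translations))


def length_ok(src: str, tgt: str, ratio: float, tol: float) -> bool:
    """True if the target length is within tol of ratio * source length."""
    expected = ratio * len(src)
    return abs(len(tgt) - expected) <= tol * max(expected, 1.0)


def find_anchors(src, tgt, lexicon, ratio=1.0, tol=0.5, min_hits=2, window=2):
    """Greedily pick monotonic 1:1 anchors near the diagonal."""
    anchors, next_j = [], 0
    n, m = len(src), len(tgt)
    for i, s in enumerate(src):
        center = i * m // n  # expected target position for source sentence i
        for j in range(max(next_j, center - window), min(m, center + window + 1)):
            if lexical_hits(s, tgt[j], lexicon) >= min_hits and length_ok(s, tgt[j], ratio, tol):
                anchors.append((i, j))
                next_j = j + 1  # later anchors must lie strictly after this one
                break
    return anchors


def bead_cost(s: str, t: str, ratio: float) -> float:
    """Length mismatch plus punctuation-count mismatch for one candidate bead."""
    length_term = abs(len(t) - ratio * len(s)) / (len(s) + len(t) + 1)
    punct_term = abs(sum(c in PUNCT for c in s) - sum(c in PUNCT for c in t))
    return length_term + 0.5 * punct_term


def align_segment(src: List[str], tgt: List[str], ratio: float = 1.0) -> List[Bead]:
    """Dynamic-programming alignment of one span between two anchors."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + penalty + bead_cost("".join(src[i:ni]), "".join(tgt[j:nj]), ratio)
                if cost < best[ni][nj]:
                    best[ni][nj], back[ni][nj] = cost, (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # walk back-pointers from the end of the span
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def align(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]], ratio: float = 1.0) -> List[Bead]:
    """Anchor first, then align each span between consecutive anchors."""
    result: List[Bead] = []
    prev_i = prev_j = 0
    # A sentinel "anchor" past the end flushes the final span.
    for ai, aj in find_anchors(src, tgt, lexicon, ratio) + [(len(src), len(tgt))]:
        for s_idx, t_idx in align_segment(src[prev_i:ai], tgt[prev_j:aj], ratio):
            result.append(([prev_i + k for k in s_idx], [prev_j + k for k in t_idx]))
        if ai < len(src):  # real anchor, not the sentinel
            result.append(([ai], [aj]))
        prev_i, prev_j = ai + 1, aj + 1
    return result


# Toy data: ASCII placeholders stand in for Chinese and Uyghur sentences.
toy_src = ["aa bb cat dog.", "xx", "yy", "zz sun moon end."]
toy_tgt = ["AA BB CAT DOG.", "XXYY", "ZZ SUN MOON END."]
toy_lexicon = {"cat": {"CAT"}, "dog": {"DOG"}, "sun": {"SUN"}, "moon": {"MOON"}}
print(align(toy_src, toy_tgt, toy_lexicon))
# → [([0], [0]), ([1, 2], [1]), ([3], [2])]
```

In this toy run the first and last sentence pairs become anchors, so the dynamic program only has to decide how the two short middle source sentences map onto one target sentence. An alignment error in one span cannot cross an anchor, which is the error-containment idea the abstract describes.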