Automatic extraction and alignment of multiword expressions from English-Chinese comparable corpus (Chinese title: 英中可比语料库中多词表达自动提取与对齐)

Cited by: 12
Abstract: Multiword expressions (MWEs) are important for practical applications such as machine translation (MT), cross-language information retrieval, data mining, and other natural language processing tasks. This paper proposes a method that combines semantic templates with a statistical tool to automatically extract native English MWEs from a three-tuple comparable corpus. Thesaurus-based and distributional methods are used to compute the semantic similarity between words, which extends MWE coverage. GIZA++ is then run on sentence-aligned text to produce word alignments, from which Chinese MWE candidates are obtained. For each native English MWE, all Chinese MWE candidates are collected and ranked by a statistically estimated translation probability (co-occurrence affinity); only the top-ranked candidate is accepted as the Chinese translation of that English MWE. Experimental results show that the proposed method effectively improves the accuracy of both MWE extraction and MWE alignment.
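The abstract does not say which statistic the "statistical tool" uses or what the semantic templates contain. As an illustration only, the sketch below assumes pointwise mutual information (PMI) over adjacent word pairs, a common association measure for ranking MWE candidates; in the described method such a score would be combined with template matching, which is not shown. The function name `pmi_bigrams` and the `min_count` cutoff are hypothetical.

```python
from collections import Counter
from math import log2


def pmi_bigrams(sentences, min_count=2):
    """
    Score adjacent word pairs by pointwise mutual information:
        PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    P(x, y) is estimated from bigram counts, P(x) and P(y) from unigram counts.
    Pairs seen fewer than `min_count` times are skipped, because PMI
    tends to overrate very rare pairs. Returns (pair, score), best first.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

A pair such as ("red", "tape") whose words rarely appear apart scores higher than a pair built on a frequent function word like "the", which is the behaviour an MWE candidate filter wants.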
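The abstract states that distributional methods compute word similarity to widen MWE coverage, but gives no formula. A minimal sketch, assuming each word is represented by counts of the words within a fixed window around it and that two words are compared by cosine similarity; `context_vectors`, `cosine`, `similar_words`, `window` and `threshold` are hypothetical names and settings, and the thesaurus-based half of the method is not shown.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v.get(w, 0) for w, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def similar_words(word, vectors, threshold=0.5):
    """Return (other_word, score) pairs with cosine >= threshold, best first."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted((pair for pair in scored if pair[1] >= threshold), key=lambda pair: -pair[1])
```

On a toy corpus containing "take a decision", "make a decision", "take a walk" and "make a walk", *make* comes out as the closest neighbour of *take*. Under this assumption, such neighbours could be substituted into an extracted MWE to propose further candidates.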
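The final alignment step ranks every Chinese candidate for an English MWE and keeps only the top one. A minimal sketch, assuming the candidates have already been read off the GIZA++ word alignments as (English MWE, Chinese candidate) occurrence pairs, and assuming the translation probability is the relative frequency P(zh | en) = count(en, zh) / count(en). The paper's exact co-occurrence affinity measure is not given in the abstract, and `best_translations` is a hypothetical name.

```python
from collections import Counter, defaultdict


def best_translations(aligned_pairs):
    """
    aligned_pairs: iterable of (english_mwe, chinese_candidate) tuples, one per
    occurrence found in the word-aligned sentence pairs (assumed input format).
    Returns {english_mwe: (best_chinese_candidate, probability)}.
    """
    pair_counts = defaultdict(Counter)
    for en, zh in aligned_pairs:
        pair_counts[en][zh] += 1

    best = {}
    for en, candidates in pair_counts.items():
        total = sum(candidates.values())
        # Relative-frequency estimate of P(zh | en); keep only the top-ranked candidate.
        zh, count = candidates.most_common(1)[0]
        best[en] = (zh, count / total)
    return best
```

Relative frequency is the simplest choice; an association score such as the Dice coefficient would additionally penalize Chinese strings that co-occur with many different English MWEs.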
Source: Computer Engineering and Applications (《计算机工程与应用》), 2010, No. 31, pp. 130-134, 187 (6 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (No. 60872118)
Keywords: three-tuple comparable corpus; multiword expressions (MWE); semantic template