
Alignment of the Polish-English Parallel Text for a Statistical Machine Translation

Abstract: Text alignment is crucial to the accuracy of machine translation (MT) systems, some natural language processing (NLP) tools, and any other text processing task that requires bilingual data. This research proposes a language-independent sentence alignment approach based on experiments from Polish (a language that is not position-sensitive) to English. The approach was developed on the TED (Translanguage English Database) talks corpus but can be used for any text domain or language pair. It implements various heuristics for sentence recognition, some of which use synonyms and semantic text structure analysis as additional information. The approach is designed to minimize data loss. The solution is compared to other sentence alignment implementations, and an improvement in MT system score on text processed with the described tool is shown.
Affiliation: Multimedia Department
Source: Computer Technology and Application, 2013, No. 11, pp. 575-583 (9 pages)
Keywords: text alignment; NLP tools; machine learning; text corpora processing; machine translation; Polish; English; natural language processing; statistical; parallel; sentence alignment
