
Chinese-Tibetan Phrase Extraction

Times cited: 5
Abstract: This paper describes a method for extracting phrase pairs from a domain-specific Chinese-Tibetan parallel corpus of laws, regulations, and official documents. Widely used phrase extraction methods depend heavily on word alignment results or on additional resources such as part-of-speech tagging or syntactic analysis. Given the current scarcity of Tibetan language resources, this paper proposes a two-phase Chinese-Tibetan phrase pair extraction method. The first phase extracts Chinese phrases (multi-word chunks) using Nagao's algorithm and the substring reduction algorithm; this phase is not the focus of the paper. The second phase obtains candidate Tibetan translations for the Chinese phrases to be translated, using a proposed Tibetan word sequence intersection algorithm (TIA). TIA works well on both 1-1 and 1-n translations, whether the Tibetan phrase is continuous or discontinuous.
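The two phases described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' TIA: the function names, the `(chinese_tokens, tibetan_tokens)` corpus format, and the toy tokens are hypothetical (the Tibetan side uses placeholder tokens `A`, `B`, etc., not real Tibetan words). N-gram statistics are counted by brute force rather than with Nagao's suffix-sorting method, substring reduction runs in quadratic rather than linear time, and the intersection step keeps only the longest contiguous run shared by each pair of Tibetan sentences, so unlike TIA it does not recover discontinuous phrases.

```python
from collections import Counter
from itertools import combinations


def contains_run(short, long):
    """True if `short` occurs as a contiguous run of tokens inside `long`."""
    short, long = tuple(short), tuple(long)
    m = len(short)
    return any(long[i:i + m] == short for i in range(len(long) - m + 1))


# Phase 1 (sketch): candidate Chinese multi-word chunks.
def ngram_counts(sentences, max_n):
    """Count every contiguous n-gram of 2..max_n tokens. Brute force; Nagao's
    algorithm obtains the same statistics from a sorted suffix table."""
    counts = Counter()
    for sent in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts


def substring_reduction(counts, min_freq):
    """Keep n-grams with frequency >= min_freq, then drop any n-gram that a
    longer kept n-gram contains with the same frequency (quadratic version)."""
    kept = {g: c for g, c in counts.items() if c >= min_freq}
    return {
        g: c for g, c in kept.items()
        if not any(len(h) > len(g) and hc == c and contains_run(g, h)
                   for h, hc in kept.items())
    }


# Phase 2 (sketch): Tibetan candidates by intersecting word sequences.
def longest_common_run(a, b):
    """Longest contiguous token sequence shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return tuple(a[best_end - best_len:best_end])


def candidate_translations(zh_phrase, corpus):
    """Intersect every pair of Tibetan sides whose Chinese side contains the
    phrase, and rank the shared runs by how many pairs produced them."""
    bo_sides = [bo for zh, bo in corpus if contains_run(zh_phrase, zh)]
    votes = Counter()
    for a, b in combinations(bo_sides, 2):
        run = longest_common_run(a, b)
        if run:
            votes[run] += 1
    return votes.most_common()


if __name__ == "__main__":
    chunks = substring_reduction(
        ngram_counts([["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"]], 3), 2)
    print(chunks)  # {('a', 'b'): 3, ('a', 'b', 'c'): 2}

    # Toy parallel corpus; Tibetan tokens are placeholders.
    corpus = [
        (["依照", "本", "法律", "规定"], ["x1", "A", "B", "x2"]),
        (["本", "法律", "自", "公布"], ["A", "B", "y1", "y2"]),
        (["违反", "本", "法律"], ["z1", "z2", "A", "B"]),
        (["其他", "条款"], ["A", "q"]),
    ]
    print(candidate_translations(("本", "法律"), corpus))  # [(('A', 'B'), 3)]
```

In the toy run, ("b", "c") is dropped because the longer chunk ("a", "b", "c") has the same frequency, and the run ("A", "B") wins because every pair of matching Tibetan sentences shares it. Voting across sentence pairs is what lets a recurring translation outrank words that two sentences happen to share only once.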
Source: Journal of Chinese Information Processing (《中文信息学报》; CSCD, Peking University Core Journal), 2011, No. 2, pp. 105-110, 121 (7 pages)
Funding: Supported by the Chinese Academy of Sciences "Western Action Plan High-Tech Project" (KGCX2-YW-512)
Keywords: Chinese-Tibetan phrase extraction; Tibetan information processing; Chinese information processing
References (11)

  • [1] Daniel Marcu, William Wong. A Phrase-based, Joint Probability Model for Statistical Machine Translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Philadelphia, PA, USA, July 2002: 133-139.
  • [2] Dekai Wu. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora[J]. Computational Linguistics, 1997, 23(3): 377-404.
  • [3] Ying Zhang, Stephan Vogel, Alex Waibel. Integrated phrase segmentation and alignment algorithm for statistical machine translation[C]//Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering. Beijing, 2003: 567-573.
  • [4] Ying Zhang, Stephan Vogel. Competitive Grouping in Integrated Phrase Segmentation and Alignment Model[C]//Proceedings of the ACL Workshop on Building and Using Parallel Texts. Ann Arbor, 2005: 159-162.
  • [5] H. Kaji, Y. Kida, Y. Morimoto. Learning Translation Templates from Bilingual Texts[C]//Proceedings of the 14th International Conference on Computational Linguistics. Nantes, France, 1992: 672-678.
  • [6] Franz Josef Och, Hermann Ney. The alignment template approach to statistical machine translation[J]. Computational Linguistics, 2004, 30(4): 417-449.
  • [7] David Chiang. A Hierarchical Phrase-Based Model for Statistical Machine Translation[C]//Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Ann Arbor, 2005: 263-270.
  • [8] He Yanqing, Zhou Yu, Zong Chengqing, Wang Xia. A phrase translation pair extraction method based on a "relaxed scale"[J]. Journal of Chinese Information Processing, 2007, 21(5): 91-95. (Times cited: 6)
  • [9] Wang Chen, Song Guolong, Wu Honglin, Zhang Li, Liu Shaoming. Phrase translation acquisition based on sequence intersection[J]. Journal of Chinese Information Processing, 2009, 23(1): 38-43. (Times cited: 3)
  • [10] Xueqiang Lv, Le Zhang, Junfeng Hu. Statistical Substring Reduction in Linear Time[C]//Proceedings of IJCNLP-2004, 2004: 320-327.
