Noun Phrase Alignment in Chinese-English Bilingual Corpora (汉英双语语料库中名词短语的自动对应)

Cited by: 7
Abstract: This paper proposes a method that, starting from a sentence-aligned Chinese-English bilingual corpus, automatically delimits and aligns Chinese and English noun phrases. Its main feature is that it does not require strict recognition of Chinese noun phrases, and that it handles high-frequency and low-frequency phrases separately. High-frequency English noun phrases are aligned to their Chinese counterparts with an iterative re-evaluation algorithm that uses the co-occurrence of English phrases and Chinese words in the bilingual corpus. Low-frequency phrases are aligned using the correspondences between source words and their translations in an English-Chinese dictionary, measured with the Dice coefficient, combined with a set of hand-written syntactic rules. The method takes the alignment information into account as a whole and achieves a high coverage rate.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2003, No. 5, pp. 6-12 (7 pages)
Funding: National 973 Program (G199803050IA-06, G199803050IA-04)
Keywords: artificial intelligence; machine translation; noun phrase recognition; phrase alignment; iterative re-evaluation; similarity
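
The abstract names two ingredients without giving formulas: an iterative re-evaluation over phrase/word co-occurrence for high-frequency phrases, and a dictionary-based Dice coefficient for low-frequency phrases. The Python sketch below illustrates one plausible reading of each and is not the authors' implementation. The function names, the `dictionary` mapping, and the toy data are hypothetical, and the re-estimation loop is a generic EM-style update (in the spirit of IBM Model 1) used here as a stand-in for the paper's algorithm.

```python
from collections import defaultdict


def dice(count_a, count_b, count_ab):
    """Dice coefficient 2*|A and B| / (|A| + |B|); 0.0 if both counts are 0."""
    total = count_a + count_b
    return 2.0 * count_ab / total if total else 0.0


def dictionary_dice(en_words, zh_words, dictionary):
    """Dice overlap between an English phrase and a Chinese candidate span.

    `dictionary` is a hypothetical mapping: English word -> set of Chinese
    words. An English word counts as matched when at least one of its
    listed translations occurs in the Chinese candidate.
    """
    zh_set = set(zh_words)
    matched = sum(1 for w in en_words if dictionary.get(w, set()) & zh_set)
    return dice(len(en_words), len(zh_words), matched)


def reestimate(pairs, iterations=10):
    """EM-style re-estimation of t(zh_word | en_phrase) from co-occurrence.

    `pairs` is a list of (en_phrases, zh_words), one per aligned sentence
    pair. This is an assumed stand-in, not the paper's exact update rule.
    """
    t = defaultdict(lambda: 1.0)  # uniform starting scores
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for en_phrases, zh_words in pairs:
            for zh in zh_words:
                # Share each Chinese word among the co-occurring phrases
                # in proportion to the current scores.
                norm = sum(t[(en, zh)] for en in en_phrases)
                for en in en_phrases:
                    frac = t[(en, zh)] / norm
                    counts[(en, zh)] += frac
                    totals[en] += frac
        # Renormalise so the scores for each English phrase sum to 1.
        t = defaultdict(
            float, {k: c / totals[k[0]] for k, c in counts.items()}
        )
    return t


# Toy data, invented for illustration only.
toy_pairs = [
    (["red house", "garden"], ["红", "房子", "花园"]),
    (["red house"], ["红", "房子"]),
    (["garden"], ["花园"]),
]
scores = reestimate(toy_pairs)
toy_dictionary = {"machine": {"机器"}, "translation": {"翻译", "译文"}}
full_match = dictionary_dice(
    ["machine", "translation"], ["机器", "翻译"], toy_dictionary
)
```

On the toy data the re-estimated scores favour "花园" for "garden" over the words that only co-occur with it in the first sentence pair, which is the kind of disambiguation that repeated re-evaluation over a whole corpus is meant to provide.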