Abstract
This paper proposes a method that, starting from a sentence-aligned Chinese-English bilingual corpus, automatically delimits and aligns Chinese-English noun phrases. Its main characteristic is that high-frequency and low-frequency noun phrases are handled separately, without requiring accurate recognition of Chinese noun phrases. High-frequency English noun phrases are aligned with their Chinese counterparts by an iterative re-evaluation algorithm that uses the co-occurrence (association) information between English phrases and Chinese words in the bilingual corpus. Low-frequency noun phrases are aligned using a Dice coefficient computed from the correspondences between source words and their translations in an English-Chinese dictionary, combined with a set of manually written syntactic rules. The method takes the alignment information into account globally and achieves high coverage.
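The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It assumes tokenized, sentence-aligned input, a toy English-Chinese dictionary mapping each English word to a set of Chinese translations, and the common Dice variant that counts matched words on both sides; all function and type names are hypothetical. The paper's exact formulas, its syntactic rules, and its iterative re-evaluation step are not given in the abstract and are not reproduced here.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy types (assumptions, not from the paper): a dictionary maps an
# English word to the set of Chinese words it may translate to; a corpus is a
# list of sentence pairs, each reduced to (English phrases, Chinese words).
Dictionary = Dict[str, Set[str]]
Corpus = List[Tuple[Set[str], Set[str]]]


def cooccurrence_dice(corpus: Corpus, en_phrase: str, zh_word: str) -> float:
    """Corpus-level association: 2 * f(e, c) / (f(e) + f(c)) over sentence pairs."""
    f_e = sum(1 for en, _ in corpus if en_phrase in en)
    f_c = sum(1 for _, zh in corpus if zh_word in zh)
    f_ec = sum(1 for en, zh in corpus if en_phrase in en and zh_word in zh)
    return 2 * f_ec / (f_e + f_c) if f_e + f_c else 0.0


def dictionary_dice(en_words: Sequence[str], zh_words: Sequence[str],
                    dictionary: Dictionary) -> float:
    """Dictionary-based Dice between an English phrase and a Chinese span.

    Counts English words with at least one listed translation inside the span,
    plus Chinese words that are a listed translation of some English word,
    and divides by the total number of words on both sides.
    """
    total = len(en_words) + len(zh_words)
    if total == 0:
        return 0.0
    span = set(zh_words)
    matched_en = sum(1 for e in en_words if dictionary.get(e, set()) & span)
    translations = set().union(*(dictionary.get(e, set()) for e in en_words))
    matched_zh = sum(1 for c in zh_words if c in translations)
    return (matched_en + matched_zh) / total


def best_span(en_words: Sequence[str], zh_sentence: Sequence[str],
              dictionary: Dictionary, max_len: int = 6) -> Tuple[int, int, float]:
    """Return (start, end, score) of the contiguous Chinese span with the highest
    dictionary Dice score; end is exclusive, ties keep the earliest span."""
    best = (0, 0, 0.0)
    for i in range(len(zh_sentence)):
        for j in range(i + 1, min(len(zh_sentence), i + max_len) + 1):
            score = dictionary_dice(en_words, zh_sentence[i:j], dictionary)
            if score > best[2]:
                best = (i, j, score)
    return best


if __name__ == "__main__":
    toy_dict = {"machine": {"机器"}, "translation": {"翻译"}, "system": {"系统"}}
    print(best_span(["machine", "translation", "system"],
                    ["这", "是", "一个", "机器", "翻译", "系统"], toy_dict))
    # → (3, 6, 1.0)
```

Scoring contiguous spans rather than pre-chunked Chinese phrases loosely mirrors the abstract's point that Chinese noun phrases need not be recognized accurately beforehand; a real system would also filter candidates with syntactic rules.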
Source
《中文信息学报》
CSCD
Peking University Core Journal (PKU Core)
2003, Issue 5, pp. 6-12 (7 pages)
Journal of Chinese Information Processing
Funding
National Basic Research Program of China (973 Program), grants G199803050IA-06 and G199803050IA-04
Keywords
artificial intelligence
machine translation
noun phrase recognition
phrase alignment
iterative re-evaluation
similarity