Abstract
This paper discusses the characteristics of part-of-speech collocation and the problem of describing it formally, and presents CRAM, a method for automatically acquiring part-of-speech collocation rules in a machine translation system. The method exploits the correlation between the parts of speech of words and applies machine learning to build a classification decision tree in binary-tree form. With it, collocation rules for resolving part-of-speech order ambiguity can be acquired from a corpus annotated with part-of-speech and semantic tags. Its application together with the Chinese-English machine translation system CETRAN showed the method to be effective.
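The abstract does not give the details of CRAM, so the following is only a generic illustration of the idea it names: learning a binary decision tree from tagged examples and reading each root-to-leaf path as a rule. Everything in it is an assumption made for the sketch, including the toy samples, the tag names, the `prev`/`next` context features, and the use of information gain to choose splits; none of it is taken from the paper.

```python
import math
from collections import Counter

# Toy training data (invented for illustration): the tags of the neighbouring
# words on each side, paired with the correct tag of an ambiguous word.
SAMPLES = [
    ({"prev": "DET", "next": "VERB"}, "NOUN"),
    ({"prev": "DET", "next": "NOUN"}, "NOUN"),
    ({"prev": "ADJ", "next": "VERB"}, "NOUN"),
    ({"prev": "PRON", "next": "DET"}, "VERB"),
    ({"prev": "NOUN", "next": "DET"}, "VERB"),
    ({"prev": "PRON", "next": "NOUN"}, "VERB"),
]

def entropy(samples):
    """Shannon entropy of the tag distribution in a sample set."""
    counts = Counter(label for _, label in samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_question(samples):
    """Pick the yes/no question (feature == value) with the highest information gain."""
    base, best = entropy(samples), None
    questions = sorted({(f, v) for feats, _ in samples for f, v in feats.items()})
    for feat, val in questions:
        yes = [s for s in samples if s[0][feat] == val]
        no = [s for s in samples if s[0][feat] != val]
        if not yes or not no:
            continue  # the question does not split this node
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(samples)
        if best is None or gain > best[0]:
            best = (gain, feat, val, yes, no)
    return best

def build_tree(samples):
    """Return a leaf (a tag string) or a node (feature, value, yes_subtree, no_subtree)."""
    labels = Counter(label for _, label in samples)
    split = best_question(samples) if len(labels) > 1 else None
    if split is None or split[0] <= 0:
        return labels.most_common(1)[0][0]  # leaf: majority tag
    _, feat, val, yes, no = split
    return (feat, val, build_tree(yes), build_tree(no))

def classify(tree, feats):
    """Follow yes/no branches until a leaf tag is reached."""
    while isinstance(tree, tuple):
        feat, val, yes, no = tree
        tree = yes if feats[feat] == val else no
    return tree

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into a (conditions, tag) rule."""
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    feat, val, yes, no = tree
    return (extract_rules(yes, conditions + ((feat, "==", val),))
            + extract_rules(no, conditions + ((feat, "!=", val),)))

tree = build_tree(SAMPLES)
rules = extract_rules(tree)
for conditions, tag in rules:
    print(" and ".join(f"{f} {op} {v}" for f, op, v in conditions), "->", tag)
```

Because every internal node asks a single yes/no question, the tree is binary by construction, and each printed line is a conjunction of context tests ending in one tag, which is one plausible reading of what a collocation rule could look like in such a scheme.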
Source
《东北大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
1999, No. 2, pp. 140-143 (4 pages)
Journal of Northeastern University(Natural Science)
Funding
National Natural Science Foundation of China