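Abstract
This paper proposes an automatic inference method for a Chinese probabilistic context-free grammar (PCFG). It implements an unsupervised EM iterative training algorithm on top of a bracket-matching parsing mechanism. Two measures ensure that training converges quickly to a rule probability distribution consistent with linguistic facts: the training corpus is pre-processed with automatic phrase boundary prediction, and a suitable initial rule set is constructed by integrating different knowledge sources. Preliminary experimental results show that the current algorithm is satisfactory in both training efficiency and the reliability of the training results.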
This paper proposes a new inference approach for Chinese probabilistic context-free grammar (PCFG), which implements the EM algorithm on top of a bracket-matching parsing scheme. The algorithm has two characteristics: 1) the training texts are pre-processed with automatic constituent boundary prediction tools, which impose stronger syntactic restrictions on the training texts at lower computational cost; 2) an initial rule set is developed by integrating different knowledge resources, including a set of basic syntactic rules generated by an automatic grammar construction tool and a set of special rules summarized by linguists or extracted from treebanks, which provides a better initialization for the learning process. As a result, a linguistically motivated, broad-coverage Chinese PCFG rule set can be generated easily with this algorithm. Current experimental results indicate good learning efficiency for the algorithm and high reliability of the generated rule set.
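The EM re-estimation idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each training sentence already comes with a small, explicitly enumerated list of candidate parses (standing in for the parses that survive the predicted constituent boundaries), and it re-estimates rule probabilities from those lists rather than with a chart-based inside-outside computation. The function name `em_reestimate`, the rule encoding, and the toy data are hypothetical.

```python
from collections import defaultdict
from math import prod


def em_reestimate(candidate_parses, init_probs, iterations=10):
    """Re-estimate PCFG rule probabilities with EM over enumerated parses.

    candidate_parses: one entry per sentence; each entry is a list of
        candidate parses, and each parse is a list of rules (lhs, rhs).
    init_probs: dict mapping every rule to its initial probability
        (a stand-in for the initial rule set; values are assumptions).
    """
    probs = dict(init_probs)
    for _ in range(iterations):
        # E-step: expected rule counts, weighting each candidate parse
        # by its probability relative to the sentence's other candidates.
        expected = defaultdict(float)
        for parses in candidate_parses:
            weights = [prod(probs[rule] for rule in parse) for parse in parses]
            total = sum(weights)
            if total == 0.0:
                continue  # no candidate is derivable under current probs
            for parse, weight in zip(parses, weights):
                for rule in parse:
                    expected[rule] += weight / total

        # M-step: normalize expected counts over rules sharing a left-hand side.
        lhs_totals = defaultdict(float)
        for (lhs, _rhs), count in expected.items():
            lhs_totals[lhs] += count
        probs = {
            rule: expected[rule] / lhs_totals[rule[0]]
            if lhs_totals[rule[0]] > 0.0
            else old  # left-hand side never used: keep the old value
            for rule, old in probs.items()
        }
    return probs


# Toy data (hypothetical): two competing NP rules with a uniform start.
R1 = ("NP", ("A", "N"))
R2 = ("NP", ("N", "N"))
toy_parses = [
    [[R1]],        # sentence 1: a single candidate parse
    [[R1], [R2]],  # sentence 2: two competing candidate parses
]
toy_init = {R1: 0.5, R2: 0.5}
toy_result = em_reestimate(toy_parses, toy_init, iterations=5)
```

In this toy setting the unambiguous first sentence pulls probability mass toward `R1`, which in turn reweights the ambiguous second sentence on the next iteration. The sketch also suggests why the initial rule set matters: EM only redistributes probability among the rules and parses it starts with.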
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
1998, Issue 5, pp. 385-392 (8 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China, Key Program
China Postdoctoral Science Foundation
Keywords
grammar inference
PCFG
corpus linguistics
language information processing
Probabilistic context-free grammar, expectation-maximization algorithm, grammar inference