Abstract
This paper applies shallow parsing theory to divide Chinese sentence parsing into three procedures: tagging (TAG), chunking (CHUNK), and building and checking (BUILD and CHECK). Existing probabilistic evaluation models use few feature types and cannot fully exploit the contextual information that is useful for parsing. To address these problems, we propose a probabilistic evaluation model based on maximum entropy that estimates the probability of each action in the parsing procedures. In this model, any information useful for parsing can serve as a feature. We define a feature set and a training set with a static template structure, and give the corresponding feature selection strategy and a parameter estimation algorithm based on Generalized Iterative Scaling (GIS). A breadth-first search (BFS) algorithm efficiently finds the candidate parse tree with the highest probability, which is taken as the final parsing result. Experimental results show that the model achieves high parsing efficiency and accuracy.
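As an illustration of the kind of model the abstract describes, the following is a minimal sketch of a conditional maximum entropy model trained with GIS. It is not the authors' implementation: the action labels, the two feature templates, the toy training events, and all function names (`train_gis`, `action_probs`, `features`) are assumptions made for the example. Features are binary, features never seen with an observed action are dropped, and a slack feature is added so that the feature values of every (context, action) pair sum to the same constant, as GIS requires. The BFS over candidate parse trees is not shown.

```python
import math
from collections import defaultdict

SLACK = "__slack__"  # GIS correction feature; makes every feature sum equal C


def _distribution(weights, C, row):
    """Return p(action | context); row maps each action to its active features."""
    scores = {
        a: sum(weights[f] for f in fs) + weights[SLACK] * (C - len(fs))
        for a, fs in row.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(expd.values())
    return {a: v / z for a, v in expd.items()}


def train_gis(events, actions, feature_fn, iterations=50):
    """events: list of (context, observed_action) pairs."""
    # Keep only features that fire together with an observed action.
    observed = set()
    for context, action in events:
        observed |= set(feature_fn(context, action))
    rows = [
        {a: set(feature_fn(c, a)) & observed for a in actions} for c, _ in events
    ]
    # C exceeds the largest feature count, so the slack value is always >= 1.
    C = max(len(fs) for row in rows for fs in row.values()) + 1

    empirical = defaultdict(float)  # feature counts under the training data
    for (_, action), row in zip(events, rows):
        for f in row[action]:
            empirical[f] += 1.0
        empirical[SLACK] += C - len(row[action])

    weights = {f: 0.0 for f in empirical}
    for _ in range(iterations):
        expected = defaultdict(float)  # feature counts under the current model
        for row in rows:
            probs = _distribution(weights, C, row)
            for a, fs in row.items():
                for f in fs:
                    expected[f] += probs[a]
                expected[SLACK] += probs[a] * (C - len(fs))
        # GIS update: move each weight by (1/C) * log(empirical / expected).
        for f in weights:
            weights[f] += math.log(empirical[f] / expected[f]) / C
    return weights, C, observed


def action_probs(model, context, actions, feature_fn):
    """Probability of each candidate action in the given context."""
    weights, C, observed = model
    row = {a: set(feature_fn(context, a)) & observed for a in actions}
    return _distribution(weights, C, row)


# Hypothetical chunking actions and feature templates (not from the paper).
ACTIONS = ["Start-NP", "Join-NP", "Other"]


def features(context, action):
    return {f"pos={context['pos']}|{action}", f"prev={context['prev']}|{action}"}


events = [
    ({"pos": "n", "prev": "START"}, "Start-NP"),
    ({"pos": "n", "prev": "Start-NP"}, "Join-NP"),
    ({"pos": "v", "prev": "Join-NP"}, "Other"),
    ({"pos": "n", "prev": "Other"}, "Start-NP"),
    ({"pos": "v", "prev": "Start-NP"}, "Other"),
]
model = train_gis(events, ACTIONS, features)
probs = action_probs(model, {"pos": "v", "prev": "Join-NP"}, ACTIONS, features)
```

Here `probs` maps each candidate action to its estimated probability in the given context. A parser of the kind described would multiply such action probabilities along a derivation to score each candidate parse tree.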
Source
Acta Electronica Sinica (《电子学报》)
EI
CAS
CSCD
PKU Core Journal
2003, No. 11, pp. 1608-1612 (5 pages)
Funding
National Natural Science Foundation of China (No. 60174028)
Keywords
Natural language processing
Maximum entropy model
Chunking
Syntactic parsing
Breadth-first search
Algorithms
Entropy
Mathematical models
Parameter estimation
Probability
Trees (mathematics)