Abstract
This paper presents a history-based, hierarchical (multi-layer) Chinese parser. Parameters are learned with a maximum entropy model. Within each layer, the chunks that are easy to recognize are identified first; more complex chunks are then recognized iteratively, using the richer contextual information this provides, until the root node is identified. The paper gives the relevant algorithms, and evaluation on the Penn Chinese Treebank test set, Section 271-300 (based on gold-standard segmentation), shows that the parser achieves state-of-the-art performance, with an F-measure of 83.76% (<= 40 words) and 80.02% (<= 100 words).
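The layered, easy-first control loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the `scorer` callback is a hypothetical stand-in for the trained maximum entropy model, and the names `parse_layers`, `threshold`, and `max_span`, the toy rules, and the example sentence are all invented here.

```python
from typing import Callable, List, Optional, Tuple

# A leaf is (pos_tag, word); an internal node is (label, list_of_children).
Node = Tuple[str, object]
# Hypothetical stand-in for the maximum entropy model: given the current
# layer and a span [i, j), return (label, probability), or None for no chunk.
Scorer = Callable[[List[Node], int, int], Optional[Tuple[str, float]]]


def parse_layers(leaves: List[Node], scorer: Scorer,
                 threshold: float = 0.9, max_span: int = 3) -> Node:
    """Build a tree layer by layer, committing to confident chunks first."""
    layer = list(leaves)
    while len(layer) > 1:
        # Score every span of 2..max_span adjacent nodes in this layer.
        cands = []
        for i in range(len(layer)):
            for j in range(i + 2, min(i + max_span, len(layer)) + 1):
                result = scorer(layer, i, j)
                if result is not None:
                    label, prob = result
                    cands.append((prob, i, j, label))
        if not cands:
            # Assumed fallback: attach whatever is left under a dummy node.
            return ("X", layer)
        cands.sort(key=lambda c: -c[0])

        # Easy first: accept all non-overlapping chunks above the threshold;
        # if none is that confident, accept only the single best one.
        chosen, used = {}, [False] * len(layer)
        for prob, i, j, label in cands:
            if chosen and prob < threshold:
                break
            if any(used[i:j]):
                continue
            chosen[i] = (j, label)
            used[i:j] = [True] * (j - i)

        # Rebuild the layer with accepted chunks collapsed into single nodes,
        # so the next pass sees them as context for harder decisions.
        new_layer, k = [], 0
        while k < len(layer):
            if k in chosen:
                j, label = chosen[k]
                new_layer.append((label, layer[k:j]))
                k = j
            else:
                new_layer.append(layer[k])
                k += 1
        layer = new_layer
    return layer[0]


def toy_scorer(layer, i, j):
    """Toy rule table in place of a trained model (illustrative only)."""
    rules = {("VV", "NN"): ("VP", 0.95), ("NR", "VP"): ("IP", 0.70)}
    return rules.get(tuple(node[0] for node in layer[i:j]))


leaves = [("NR", "张三"), ("VV", "喜欢"), ("NN", "苹果")]
tree = parse_layers(leaves, toy_scorer)
```

In this toy run the confident `VP` chunk is committed in the first pass, and the lower-confidence `IP` decision is made in the next pass, once the `VP` node is available as context. Every accepted chunk covers at least two nodes, so each pass shortens the layer and the loop terminates.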
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2009, No. 6, pp. 45-47, 51 (4 pages)
Funding
National Natural Science Foundation of China (60673041)
National High Technology Research and Development Program of China (2006AA01Z147)
Keywords
Chinese parsing
History information
Hierarchical parsing
Maximum entropy models