Abstract
This paper reports recent progress in syntactic parsing based on the Penn Chinese Treebank (CTB). Building on the well-known head-driven model, it presents the first parsing experiments on the newly available CTB 5.0. Compared with previous work on CTB, the experiments achieve better results and substantially narrow the performance gap between Chinese and English parsing. The parser was evaluated on the public standard test set with the PARSEVAL metric: on sentences with gold-standard word segmentation and part-of-speech tagging, it reaches a precision of 85.89% and a recall of 85.61%. The paper describes how the model was implemented and analyzes the roles of two key components that significantly affect parsing performance, the decision table and base noun phrases (BNP). The work offers a useful reference for developing practical Chinese parsing systems.
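The precision and recall quoted in the abstract are PARSEVAL-style bracket scores. As a rough illustration of how such scores are computed, here is a minimal sketch. It assumes each constituent is given as a hypothetical `(label, start, end)` tuple and that matching is exact on all three fields. It is not the authors' evaluation code, and it ignores conventions that standard scoring tools apply, such as punctuation handling.

```python
from collections import Counter


def parseval(gold, predicted):
    """Labeled bracket precision, recall and F1.

    `gold` and `predicted` are lists of (label, start, end) tuples,
    an assumed representation chosen for this sketch.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket counts as matched only as many
    # times as it occurs in both the gold and the predicted parse.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy four-word sentence with made-up brackets (not data from the paper).
gold = [("IP", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4)]
predicted = [("IP", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("VP", 2, 4), ("NP", 3, 4)]

p, r, f = parseval(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case three of the five predicted brackets match three of the four gold brackets, so precision is 0.60 and recall is 0.75. Scores reported over a test set are normally computed from bracket counts pooled across all sentences rather than averaged per sentence.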
Source
Chinese High Technology Letters (《高技术通讯》)
CAS
CSCD
Peking University Core Journals (北大核心)
2007, No. 1, pp. 15-20 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60302021, 60375019) and the 863 Program (2004AA117010-08).
Keywords
head-driven model, Penn Chinese treebank, parsing, syntactic pattern recognition