Abstract
The performance of supervised statistical dependency parsers depends heavily on manually annotated data, which is expensive and limited. To make full use of existing annotated treebanks without designing an additional parser, this paper proposes a hybrid dependency processing pipeline. The pipeline uses a parser based on the maximum spanning tree (MST) algorithm and linear-chain conditional random fields (CRF) as its base framework, and trains it on a mixture of different treebanks. It then takes the dependency skeletons produced by the baseline parser for each treebank, extracts cross information from them with a set of hybrid feature templates, and builds a composite dependency parser on top of the base framework. Experimental results show that the method effectively improves the parsing precision of a single-treebank parser without requiring a newly designed parser.
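The abstract names the MST algorithm as the decoding core of the base framework but gives no implementation details. The following is a minimal sketch of the standard Chu-Liu/Edmonds maximum spanning tree decoder used in first-order (arc-factored) dependency parsing, assuming arc scores are already available as a matrix `scores[h][d]` (head `h` to dependent `d`, with node 0 as an artificial ROOT). The function names and score layout are illustrative assumptions, not the authors' code; the CRF component and the multi-treebank cross features are not modeled here.

```python
from typing import List, Optional

NEG_INF = float("-inf")


def find_cycle(heads: List[int]) -> Optional[List[int]]:
    """Return the nodes of one cycle in the head assignment, or None."""
    state = [0] * len(heads)  # 0 = unvisited, 1 = on current path, 2 = done
    for start in range(1, len(heads)):
        if state[start]:
            continue
        path, node = [], start
        while node != -1 and state[node] == 0:
            state[node] = 1
            path.append(node)
            node = heads[node]  # ROOT has head -1, which ends the walk
        if node != -1 and state[node] == 1:
            return path[path.index(node):]  # walked back onto the current path
        for v in path:
            state[v] = 2
    return None


def chu_liu_edmonds(scores: List[List[float]]) -> List[int]:
    """Highest-scoring dependency tree; returns heads[d] for each node (ROOT gets -1)."""
    n = len(scores)
    heads = [-1] * n
    for d in range(1, n):  # greedily pick the best incoming arc for every word
        heads[d] = max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
    cycle = find_cycle(heads)
    if cycle is None:
        return heads

    # Contract the cycle into a single new node c and rescore arcs around it.
    in_cycle = set(cycle)
    cycle_score = sum(scores[heads[v]][v] for v in cycle)
    others = [v for v in range(n) if v not in in_cycle]  # ROOT stays at index 0
    index = {v: i for i, v in enumerate(others)}
    c = len(others)
    new_scores = [[NEG_INF] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for h in others:
        for d in others:
            if h != d:
                new_scores[index[h]][index[d]] = scores[h][d]
        # Arc h -> cycle: break the cycle where replacing the old head gains most.
        best_v = max(cycle, key=lambda v: scores[h][v] - scores[heads[v]][v])
        new_scores[index[h]][c] = cycle_score + scores[h][best_v] - scores[heads[best_v]][best_v]
        enter[h] = best_v
    for d in others[1:]:
        # Arc cycle -> d: use the best head inside the cycle.
        best_u = max(cycle, key=lambda u: scores[u][d])
        new_scores[c][index[d]] = scores[best_u][d]
        leave[d] = best_u

    sub_heads = chu_liu_edmonds(new_scores)

    # Expand the contracted node back into the original nodes.
    result = [-1] * n
    for d in others[1:]:
        h_new = sub_heads[index[d]]
        result[d] = leave[d] if h_new == c else others[h_new]
    for v in cycle:
        result[v] = heads[v]
    h_outer = others[sub_heads[c]]
    result[enter[h_outer]] = h_outer
    return result


if __name__ == "__main__":
    X = NEG_INF  # no arcs into ROOT and no self-loops
    toy = [
        [X, 5, 1, 1],   # ROOT -> words 1..3
        [X, X, 10, 2],  # word 1 -> others
        [X, 11, X, 8],  # word 2 -> others
        [X, 0, 0, X],   # word 3 -> others
    ]
    print(chu_liu_edmonds(toy))
```

In this toy matrix the greedy step makes words 1 and 2 each other's heads, so the cycle is contracted and resolved; the script prints `[-1, 0, 1, 2]` (ROOT -> 1 -> 2 -> 3, total score 23), which beats the alternative ROOT -> 2 tree (score 20).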
Source
Journal of University of Electronic Science and Technology of China (《电子科技大学学报》)
2016, No. 1, pp. 102-106, 150 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)