Parsing the Penn Chinese Treebank (CTB) with a Head-Driven Model

Cited by: 3

Abstract: This paper reports recent progress on parsing the Penn Chinese Treebank (CTB), one of the key technologies in Chinese information processing. The well-known head-driven model was applied to the newly available CTB 5.0, and parsing experiments on this release were carried out for the first time. Compared with previous work on the CTB, the experiments produced better results and greatly narrowed the performance gap between Chinese and English parsing. The parser was evaluated on the standard test set with the PARSEVAL metric: on sentences with gold-standard word segmentation and POS tagging, it reached a precision of 85.89% and a recall of 85.61%. The paper describes how the model was implemented and analyzes the contribution of two key components, the decision table and base noun phrases (BNP), to parser performance. The work is a useful reference for developing practical Chinese parsers.
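The precision and recall figures in the abstract are PARSEVAL-style scores over labeled brackets. The following is a minimal sketch of how such scores can be computed, not the authors' evaluation tool: the nested-tuple tree encoding, the function names `brackets` and `parseval`, and the toy sentence are all assumptions made for illustration. Part-of-speech (preterminal) nodes are skipped, which is a common convention.

```python
from collections import Counter


def brackets(tree, start=0):
    """Collect labeled spans (label, start, end) for phrasal nodes.

    A tree is a nested tuple (label, child, ...); a leaf word is a str.
    Hypothetical encoding chosen for this sketch. Returns (Counter, end).
    """
    label, children = tree[0], tree[1:]
    spans = Counter()
    pos = start
    # A node whose children are all words is a POS tag, not a phrase.
    is_preterminal = all(isinstance(c, str) for c in children)
    for child in children:
        if isinstance(child, str):
            pos += 1  # one word consumed
        else:
            child_spans, pos = brackets(child, pos)
            spans += child_spans
    if not is_preterminal:
        spans[(label, start, pos)] += 1
    return spans, pos


def parseval(gold, predicted):
    """Return labeled-bracket (precision, recall, F1) for one sentence."""
    gold_spans, _ = brackets(gold)
    pred_spans, _ = brackets(predicted)
    # Multiset intersection counts each matching bracket at most once.
    matched = sum((gold_spans & pred_spans).values())
    n_pred = sum(pred_spans.values())
    n_gold = sum(gold_spans.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Toy example (invented, not taken from the CTB).
gold = ("IP",
        ("NP", ("NR", "张三")),
        ("VP", ("VV", "喜欢"), ("NP", ("NN", "语言学"))))
# The predicted tree is flatter: it misses the VP bracket.
pred = ("IP",
        ("NP", ("NR", "张三")),
        ("VV", "喜欢"),
        ("NP", ("NN", "语言学")))

p, r, f = parseval(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.4f}")
```

In this toy case the predicted tree has three brackets, all of them correct, while the gold tree has four, so precision is 1.00 and recall is 0.75. Applying the same harmonic-mean formula to the reported 85.89% and 85.61% gives roughly 85.75; that is a derived figure, not one stated in the abstract.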

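Authors: Cao Hailong, Zhao Tiejun, Li Sheng

Affiliation: Ministry of Education-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology

Source: Chinese High Technology Letters (《高技术通讯》), indexed in CAS, CSCD, and the PKU Core Journals list; 2007, No. 1, pp. 15-20 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (60302021, 60375019) and the 863 Program (2004AA117010-08).

Keywords: head-driven model, Penn Chinese Treebank, parsing, syntactic pattern recognition

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]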


