
Annotation Guideline of Chinese Dependency Treebank from Multi-domain and Multi-source Texts (适应多领域多来源文本的汉语依存句法数据标注规范) Cited by: 4
Abstract Over the past decade, dependency parsing has attracted much attention in the research community because its representation is simple and flexible and its analysis is efficient. To support research on Chinese dependency parsing, researchers in China have annotated several Chinese dependency treebanks. However, there is still no public, complete, and systematic annotation guideline for Chinese dependency treebanks, and existing treebank annotation work gives little consideration to the special linguistic phenomena found in web texts. Drawing on existing annotation work and on problems encountered during actual annotation, this paper proposes a new annotation guideline for Chinese dependency treebanks that is adapted to multi-domain and multi-source texts. The guideline aims to accurately depict the syntactic structures of various linguistic phenomena while ensuring annotation consistency. Based on the proposed guideline, we have annotated about 30,000 Chinese sentences with their dependency structures.
Authors GUO Lijuan (郭丽娟); LI Zhenghua (李正华); PENG Xue (彭雪); ZHANG Min (张民) (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 10, pp. 28-35, 52 (9 pages in total)
Funding National Natural Science Foundation of China (61502325, 61432013, 61525205)
Keywords dependency syntax; annotation guideline
