
Analysis of the Hierarchical Chinese Functional Chunk Bank (分层次的汉语功能块描述库构建分析). Cited by: 8
Abstract: A study of how existing functional chunk parsers perform on functional chunks of different lengths and structures shows that long, structurally complex chunks are the main difficulty in automatic functional chunk parsing. We therefore design a new hierarchical functional chunk scheme and automatically generate a new functional chunk bank from the Tsinghua Chinese Tree Bank (TCT). By analyzing the length distribution and internal structure distribution of the new chunk bank, as well as its relationship to the single-level functional chunk bank, we confirm that chunks under the new hierarchical scheme are structurally simple, short, and evenly distributed. These properties should help considerably in improving the performance of functional chunk parsers.
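Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 3, pp. 24-31, 43 (9 pages)
Funding: National Natural Science Foundation of China (60573185); National High Technology Research and Development Program of China (2007AA01Z173)
Keywords: computer application; Chinese information processing; partial parsing; functional chunk; hierarchical description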

References (9)

  • 1Steven Abney. Parsing By Chunks [A]. Robert C. Berwick, Steven P. Abney, and Carol Tenny. Principle-Based Parsing[C]. Dordrecht: Kluwer Academic Publishers, 1991. 257-278.
  • 2Eva Ejerhed. Finite state segmentation of discourse into clauses [J]. Natural Language Engineering, 1996 (2) : 355-364.
  • 3Sandra Kubler and Erhard W. Hinrichs. From chunks to function-argument structure: A similarity-based approach[A]. In:Proceedings of ACL/EACL 2001[C]. Toulouse, France: 2001. 338-345.
  • 4Grzegorz Chrupala and Josef van Genabith. Using Machine-Learning to Assign Function Labels to Parser Output for Spanish[A]. In:Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions[C]. Sydney: 2006. 136-143.
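  • 5Zhou Qiang, Ren Haibo, Zhan Weidong. Building a Large-Scale Chinese Chunk Bank[A]. In: Huang Changning, Zhang Pu (eds.). Natural Language Understanding and Machine Translation[C]. Beijing: Tsinghua University Press, 2001. 102-107.
  • 6Zhou Qiang. Annotation Scheme for the Chinese Treebank[J]. Journal of Chinese Information Processing, 2004, 18(4): 1-8. Cited by: 90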
  • 7Yingze Zhao and Qiang Zhou. A SVM-based Model for Chinese Functional Chunk Parsing[A]. In: Proceedings of Fifth SIGHAN Workshop on Chinese Language COLING/ACL 2006 Workshop[C]. Sydney, Australia: 2006. 94-101.
  • 8Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields[A]. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology [C]. Edmonton, Canada: 2003. 134-141.
  • 9A. McCallum. Efficiently Inducing Features of Conditional Random Fields[A]. In: Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence[C]. 2003, 403-410.

