Base Chunk Scheme for the Chinese Language (汉语基本块描述体系)

Cited by: 25
Abstract: Chunk parsing is an important technique in natural language processing research, and it rests on a sound and effective chunk description scheme. Building on the results and experience of earlier work, this paper proposes a topology-based base chunk scheme for the Chinese language. By introducing lexical cohesion relationships to determine three basic topological structures, the scheme yields good criteria for judging the internal cohesion of a base chunk and builds a bridge between syntactic form and semantic content. The scheme greatly simplifies the automatic extraction of a base-chunk-annotated corpus and a corresponding lexical cohesion knowledge base from TCT, an existing large-scale syntactically annotated Chinese treebank. This lays a foundation for further research on the interactive evolution of automatic Chinese base chunk parsing and lexical cohesion knowledge acquisition.
Author: Zhou Qiang (周强)
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal (北大核心), 2007, No. 3, pp. 21-27 (7 pages)
Funding: National Natural Science Foundation of China (grants 60573185, 60520130299)
Keywords: computer application; Chinese information processing; base chunk; partial parsing; corpus annotation; lexical knowledge acquisition