
Construction of an HNC Semantic Tagging Model (HNC语义标注模型的构建), cited by: 3

Novel HNC Conceptual Tagging Model for Corpus
Abstract: This paper presents a human-machine collaborative model for semantically tagging Chinese corpora, based on the Hierarchical Network of Concepts (HNC) theory. It first analyzes what HNC semantic tagging covers and, on that basis, defines the tagging workflow. Because the tagging is highly complex, machine tagging is used at the main stages of the workflow to assist manual tagging. Specifically, a maximum entropy model is used for semantic chunk segmentation, reaching a precision of 83.78% and a recall of 91.17%; an instance-based model is used for sentence category identification, reaching a precision of 51.64%. An HNC semantically tagged corpus has been built with this model and currently contains 400,000 characters.
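The precision and recall figures quoted for semantic chunk segmentation can be illustrated with a minimal Python sketch. The abstract does not say how the authors scored their output, so this assumes the common chunk-level definition: a predicted chunk counts as correct only if both of its boundaries match a gold chunk. The function names `chunk_spans` and `precision_recall` and the toy cut positions are hypothetical and are not taken from the paper.

```python
def chunk_spans(cut_positions, length):
    """Turn cut positions into (start, end) spans covering [0, length)."""
    cuts = [0] + sorted(cut_positions) + [length]
    return {(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)}


def precision_recall(gold_spans, predicted_spans):
    """Chunk-level precision and recall (assumed scoring scheme)."""
    correct = len(gold_spans & predicted_spans)
    precision = correct / len(predicted_spans) if predicted_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


# Toy 10-character sentence: the gold annotation cuts at 3 and 7,
# while the hypothetical machine output adds an extra cut at 5.
gold = chunk_spans([3, 7], 10)          # spans (0,3), (3,7), (7,10)
predicted = chunk_spans([3, 5, 7], 10)  # spans (0,3), (3,5), (5,7), (7,10)

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted chunks match the gold annotation, so precision is 2/4 and recall is 2/3.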
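Authors: 谢法奎 (Xie Fakui), 张全 (Zhang Quan)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 5, pp. 238-240, 268 (4 pages)
Funding: National 973 Program project "Research on an Interactive Engine for Natural Language Understanding" (2004CB318104); Director's Merit-Based Fund of the Institute of Acoustics, Chinese Academy of Sciences (GS13SJJ04)
Keywords: Hierarchical Network of Concepts (HNC), corpus, maximum entropy model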
