
Chinese Text Chunking Based on CRF
Abstract: This paper proposes a method for Chinese text chunking based on the conditional random field (CRF) model. The method casts chunking as assigning a chunk tag to every word: a CRF model is trained on a tagged corpus and then used to predict the chunk tag of each word in the test corpus. On the Peking University Chinese Treebank, the method achieves F1 = 85.5%, higher than both the hidden Markov model and the maximum entropy Markov model. The experimental results show that CRFs are effective for Chinese chunk identification and avoid both strict independence assumptions and the label bias problem.
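The reduction described in the abstract (chunking as per-word tagging, scored by F1) can be illustrated with a minimal sketch. The abstract does not specify the tag scheme or the chunk inventory, so the B-/I-/O encoding, the chunk types `NP` and `VP`, the sample sentence, and all function names below are assumptions made for illustration only. The CRF training step itself is not shown.

```python
from typing import List, Set, Tuple

# (chunk type, words); the type "O" marks words outside any chunk.
# The B-/I-/O scheme and the NP/VP labels are assumptions, not from the paper.
Chunk = Tuple[str, List[str]]


def chunks_to_tags(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten a chunked sentence into (word, tag) pairs."""
    tagged = []
    for label, words in chunks:
        for i, word in enumerate(words):
            if label == "O":
                tag = "O"
            else:
                tag = ("B-" if i == 0 else "I-") + label
            tagged.append((word, tag))
    return tagged


def tags_to_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Recover (start, end, type) spans from a B-/I-/O tag sequence."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Chunk-level F1: a predicted chunk counts only if span and type both match."""
    g, p = tags_to_spans(gold), tags_to_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "我们 / 分析 / 中文 文本 / 。"
sentence = [("NP", ["我们"]), ("VP", ["分析"]), ("NP", ["中文", "文本"]), ("O", ["。"])]
gold_tags = [tag for _, tag in chunks_to_tags(sentence)]
# A prediction that wrongly splits the last noun phrase into two chunks.
pred_tags = ["B-NP", "B-VP", "B-NP", "B-NP", "O"]
score = chunk_f1(gold_tags, pred_tags)
```

In this sketch a tagger such as a CRF would produce `pred_tags`. Scoring is done on whole chunks rather than on individual tags, so a single wrong boundary costs both precision and recall.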
Source: Journal of Jilin University: Science Edition (indexed in CAS, CSCD, and the Peking University Core Journals list), 2007, No. 3, pp. 416-420 (5 pages)
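Funding: National Natural Science Foundation of China, Young Scientists Fund (Grant No. 60603031); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (Grant No. 20060183044); Jilin Province Science and Technology Development Program (Grant No. 20050527).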
Keywords: chunking; conditional random fields; feature function