
Cascade Identification of Chinese Chunks (级联中文组块识别)

Cited by: 2
Abstract: Most statistical approaches to Chinese chunking follow the CoNLL-2000 English chunking task: chunks are represented with the BIO tagging scheme, and chunk identification is cast as a multi-class classification problem of labeling a word sequence. To reduce classification complexity, a decomposed approach is adopted: chunk boundaries are identified first, and chunk types are then determined. The key problem in Chinese chunking is in fact boundary identification. Cascaded chunk identifiers were built on conditional random fields (CRF), and the experimental dataset was extracted from the Penn Chinese Treebank 5.1 (CTB5.1). For feature selection, methods commonly used in Chinese word segmentation were borrowed for the chunking task. Under 5-fold cross-validation, the F1-measure of chunk boundary identification is 95.05%, the accuracy of chunk type identification is 99.43%, and the overall chunking F1-measure is 93.58%. Compared with related work, the method improves system performance and sharply shortens the training time of the learners.
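Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2008, Issue 1, pp. 14-17 (4 pages)
Funding: Language Department project on building corpus tools for minority languages (MZ115-022)
Keywords: Chinese chunking; boundary identification; type identification; conditional random fields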
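The decomposition described in the abstract can be illustrated with a minimal Python sketch. It only shows how full BIO chunk tags split into a boundary layer and a type layer and how the two layers recombine; it is not the authors' implementation. The function names, the B/I/O boundary tag set, and the toy sentence are assumptions made for illustration, and gold labels stand in for the two CRF models that the paper trains for each stage.

```python
from typing import List, Tuple


def to_boundary_tags(tags: List[str]) -> List[str]:
    """Strip chunk types from full BIO tags: 'B-NP' -> 'B', 'I-NP' -> 'I', 'O' -> 'O'."""
    return [tag.split("-", 1)[0] for tag in tags]


def chunk_spans(boundaries: List[str]) -> List[Tuple[int, int]]:
    """Turn boundary tags into (start, end) spans, with end exclusive."""
    spans = []
    start = None
    for i, b in enumerate(boundaries):
        # A new 'B' or an 'O' closes any chunk that is currently open.
        if b in ("B", "O") and start is not None:
            spans.append((start, i))
            start = None
        if b == "B":
            start = i
        elif b == "I" and start is None:
            start = i  # tolerate an 'I' that has no preceding 'B'
    if start is not None:
        spans.append((start, len(boundaries)))
    return spans


def merge_layers(length: int, spans: List[Tuple[int, int]], types: List[str]) -> List[str]:
    """Recombine stage-1 spans and stage-2 chunk types into full BIO tags."""
    tags = ["O"] * length
    for (start, end), chunk_type in zip(spans, types):
        tags[start] = "B-" + chunk_type
        for i in range(start + 1, end):
            tags[i] = "I-" + chunk_type
    return tags


# Hypothetical toy sentence with gold tags (not taken from CTB5.1).
words = ["我们", "喜欢", "自然", "语言", "处理", "。"]
gold = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "O"]

# Stage 1 (boundary identification): a boundary tagger would predict these tags.
boundaries = to_boundary_tags(gold)
spans = chunk_spans(boundaries)

# Stage 2 (type identification): a type classifier would label each span.
types = [gold[start].split("-", 1)[1] for start, _ in spans]

# The cascade output is the recombination of both layers.
rebuilt = merge_layers(len(words), spans, types)
print(list(zip(words, rebuilt)))
```

In this layout the first stage only has to choose among three labels per word, and the second stage makes one decision per chunk rather than per word, which is one way to read the abstract's claim of reduced classification complexity.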

References (8)

  • 1 Abney S P. Parsing by chunks [C] // Steven P, Abney, Carol Tenny. Principle-Based Parsing. MA: [s.n.], 1991: 257-278.
  • 2 Erik F, Tjong Kim Sang, Sabine Buchholz. Introduction to the CoNLL-2000 shared task: chunking [C] // CoNLL-2000 and LLL-2000. Lisbon: [s.n.], 2000: 127-132.
  • 3 Sha Fei, Fernando C N, Pereira. Shallow parsing with conditional random fields [C] // Edmonton Alberta. Human Language Technology-NAACL. CA: [s.n.], 2003: 213-220.
  • 4 Sun Guanglu, Huang Changning, Wang Xiaolong, et al. Chinese chunking based on maximum entropy Markov models [J]. Computational Linguistics and Chinese Language Processing, 2006, 11(2): 115-136.
  • 5 Chen Wenliang, Zhang Yujie, Isahara Hitoshi. An empirical study of Chinese chunking [C] // Coling-ACL 2006 (Poster Session). Sydney: [s.n.], 2006: 97-104.
  • 6 Tan Yongmei, Yao Tianshun, Chen Qing, et al. Applying conditional random fields to Chinese shallow parsing [C] // Proceedings of CICLing 2005. Mexico City: Springer, 2005: 67-176.
  • 7 Li Heng, Zhu Jingbo, Yao Tianshun. SVM-based Chinese chunk analysis [J]. Journal of Chinese Information Processing (中文信息学报), 2004, 18(2): 1-7. Cited by: 50.
  • 8 Lafferty John, McCallum Andrew, Fernando Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data [C] // International Conference on Machine Learning (ICML01). San Francisco: Morgan Kaufmann, 2001: 282-289.
