
A Comparison Study of Sequence Labeling Methods for Chinese Word Segmentation, POS Tagging Models (original title: 基于序列标注的中文分词、词性标注模型比较分析). Cited by: 12
Abstract: This paper compares three models for Chinese word segmentation and POS tagging, considering both accuracy and speed. The first is a pipeline model based on sequence labeling, the second is a joint model based on character classification, and the third combines the two within a stacked learning framework. Experiments on four data sets (People's Daily, CoNLL09, CTB5.0 and CTB7.0) show that the joint model is the fastest, the stacked learning model is the most accurate, and the plain pipeline model strikes a balance between speed and accuracy. Finally, the stacked learning model, the most accurate of the three, is compared with state-of-the-art systems on CTB5.0 and CTB7.0, and it achieves the best result on both data sets.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 4, pp. 30-36 (7 pages)
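Funding: National Natural Science Foundation of China, Key Program (61133012); National 863 Program, Major Project (2011AA01A207); National 863 Program, Advanced Technology Research Project (2012AA011102)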
Keywords: Chinese word segmentation; POS tagging; stacked learning
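
The character-based joint model mentioned in the abstract treats segmentation and POS tagging as one per-character classification task. The record does not give the paper's actual label set, so the sketch below assumes a common formulation: each character receives a position tag (B, M, E or S) joined to the POS tag of its word. The function names and the `B-NN` label format are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Word = Tuple[str, str]       # (word, POS tag)
CharLabel = Tuple[str, str]  # (character, joint label such as "B-NN")


def words_to_char_labels(words: List[Word]) -> List[CharLabel]:
    """Encode (word, POS) pairs as one joint label per character.

    Assumed scheme: S = single-character word, B/M/E = begin/middle/end.
    """
    labeled: List[CharLabel] = []
    for word, pos in words:
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, position in zip(word, positions):
            labeled.append((ch, f"{position}-{pos}"))
    return labeled


def char_labels_to_words(labeled: List[CharLabel]) -> List[Word]:
    """Decode per-character joint labels back into (word, POS) pairs."""
    words: List[Word] = []
    buffer = ""
    last_pos = ""
    for ch, label in labeled:
        position, pos = label.split("-", 1)
        buffer += ch
        last_pos = pos
        if position in ("E", "S"):
            # A word ends here; its POS is taken from the closing label.
            words.append((buffer, pos))
            buffer = ""
    if buffer:
        # Input ended mid-word: keep the fragment with the last POS seen.
        words.append((buffer, last_pos))
    return words


sample = [("中文", "NN"), ("很", "AD"), ("有趣", "VA")]
labels = words_to_char_labels(sample)
print(labels)
print(char_labels_to_words(labels))
```

Under this encoding a single classifier pass over the characters yields both word boundaries and POS tags, whereas a pipeline model runs a segmenter first and a tagger second.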

References (11)

  • 1 Zhang Meishan, Deng Zhilong, Che Wanxiang, Liu Ting. Domain-adaptive Chinese word segmentation combining statistics and dictionaries[J]. Journal of Chinese Information Processing, 2012, 26(2): 8-12. Cited by: 44
  • 2 Nianwen Xue. Chinese word segmentation as character tagging[J]. International Journal of Computational Linguistics and Chinese Language Processing, 2003, 8(1): 29-48.
  • 3 Tseng H, Chang P, Andrew G, et al. A conditional random field word segmenter for SIGHAN bakeoff 2005[C]//Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing. 2005: 168-171.
  • 4 Yue Zhang, Stephen Clark. Chinese segmentation with a word-based perceptron algorithm[C]//Proceedings of the 45th ACL. 2007: 840-847.
  • 5 Collins M. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms[C]//Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10. 2002: 1-8.
  • 6 Ng Hwee Tou, Jin Kiat Low. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?[C]//Proceedings of EMNLP 2004. 2004: 277-284.
  • 7 Yue Zhang, Stephen Clark. Joint Word Segmentation and POS Tagging Using a Single Perceptron[C]//Proceedings of ACL-08: HLT. 2008: 888-896.
  • 8 Crammer K, Singer Y. Ultraconservative online algorithms for multiclass problems[J]. The Journal of Machine Learning Research, 2003: 951-991.
  • 9 Cohen W W. Stacked sequential learning[C]//Proceedings of International Joint Conference on Artificial Intelligence. 2005: 671-676.
  • 10 Wang Y, Kazama J, Tsuruoka Y, et al. Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data[C]//Proceedings of 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 2011: 309-317.

