
Word Structure Analysis Based on Cascaded CRFs (cited by: 7)
Abstract: Traditional Chinese word segmentation identifies only the boundary of each word and ignores the fact that, in Chinese, the boundary between words and phrases is often unclear. In theory, linguists disagree about where word boundaries lie, so the segmentation standards of different corpora cannot be unified; in practice, no single standard fully meets the needs of specific applications. This paper presents a method for automatic word structure analysis based on cascaded CRF models, which recovers both the boundaries and the internal structure of words with high precision. Compared with traditional segmentation, word structure analysis better fits the fact that lexical and syntactic boundaries in Chinese are fuzzy, and it addresses both the inconsistency among corpus standards and the differing requirements of applications.
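The abstract's claim that word structure analysis reconciles inconsistent segmentation standards can be illustrated with a small sketch. This is not the authors' cascaded-CRF method, whose details the abstract does not give; it only shows why storing a word's internal structure, rather than one flat boundary, lets a single annotation serve several standards: the same tree can be flattened at different depths. The `Node` class, the `segment` function, and the bracketing ((自然 科学) 基金) are hypothetical illustrations; in the paper, the structure would be predicted by the CRF models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical word-structure tree node: a leaf holds text, an inner node holds children."""
    text: str = ""
    children: list[Node] = field(default_factory=list)

    def surface(self) -> str:
        # A leaf's surface form is its own text; an inner node concatenates its children.
        if not self.children:
            return self.text
        return "".join(child.surface() for child in self.children)


def segment(node: Node, depth: int) -> list[str]:
    """Flatten a word-structure tree into segments, expanding at most `depth` levels."""
    if depth == 0 or not node.children:
        return [node.surface()]
    segments: list[str] = []
    for child in node.children:
        segments.extend(segment(child, depth - 1))
    return segments


def leaf(text: str) -> Node:
    return Node(text=text)


# Hypothetical internal structure for 自然科学基金: ((自然 科学) 基金)
word = Node(children=[Node(children=[leaf("自然"), leaf("科学")]), leaf("基金")])

print(segment(word, 0))  # ['自然科学基金']       coarse-grained standard
print(segment(word, 1))  # ['自然科学', '基金']   intermediate granularity
print(segment(word, 2))  # ['自然', '科学', '基金'] fine-grained standard
```

Depth 0 plays the role of a coarse-grained corpus standard and full expansion the finest one; an application simply chooses the depth it needs instead of retraining a segmenter for each standard.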
Authors: Fang Yan (方艳), Zhou Guodong (周国栋)
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal; 2015, No. 4, pp. 1-7, 24 (8 pages in total)
Funding: Natural Science Foundation Youth Project (61202162); Ministry of Education Doctoral Program Fund, New Faculty category (20123201120011)
Keywords: Chinese word segmentation; internal structure; annotation standard; cascaded CRFs