
基于最长次长匹配分词的一体化中文词法分析 (cited by 3)

Chinese integrative lexical analysis based on maximum matching and second-maximum matching segmentation
Abstract: Most existing lexical analysis systems process text as a "pipeline", which has several shortcomings. To address them, this paper proposes an integrative, synchronous lexical analysis mechanism. Starting from maximum matching and second-maximum matching (MMSM) segmentation, part-of-speech (POS) information and candidate unknown-word nodes are added to the segmentation directed graph built by the MMSM model, and the hidden Markov model (HMM) is extended so that word segmentation, ambiguity resolution, unknown word recognition, and POS tagging are all completed synchronously within that graph. The mechanism integrates word segmentation with POS tagging, unknown word recognition with segmentation, and the handling of unknown words whose POS is uncertain. Because all lexical analysis steps are carried out truly synchronously, the system makes full use of contextual lexical information to improve overall accuracy, remains efficient, and avoids conflicts between steps. In open tests, the system achieved an overall F-score of 98.03%.
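The core idea in the abstract, choosing a segmentation and its POS tags in a single search over a segmentation directed graph instead of one stage after another, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that MMSM means keeping, at each character position, only the longest and second-longest dictionary matches as candidate edges; it adds a single-character candidate unknown-word node with a hypothetical `UNK` tag when nothing matches; and it runs a first-order (bigram) HMM Viterbi whose states are (graph position, POS tag of the last word). The lexicon, tag set, probabilities, `MAX_WORD_LEN`, `FLOOR`, `UNK_EMIT`, and the functions `mmsm_edges` and `joint_viterbi` are all hypothetical toy choices.

```python
import math
from typing import Dict, List, Optional, Tuple

LogProbs = Dict[str, float]            # POS tag -> log P(word | tag)
Edge = Tuple[int, int, str, LogProbs]  # (start, end, word, emission log-probs)

# Toy, hypothetical parameters for illustration only; NOT taken from the paper.
LEXICON: Dict[str, LogProbs] = {
    "研究": {"v": math.log(0.3), "n": math.log(0.2)},
    "研究生": {"n": math.log(0.1)},
    "生命": {"n": math.log(0.2)},
    "生": {"v": math.log(0.05)},
    "命": {"n": math.log(0.01)},
    "起源": {"n": math.log(0.1)},
}
TRANS: Dict[Tuple[str, str], float] = {  # log P(tag | previous tag)
    ("<S>", "v"): math.log(0.4), ("<S>", "n"): math.log(0.6),
    ("v", "v"): math.log(0.3), ("v", "n"): math.log(0.7),
    ("n", "v"): math.log(0.3), ("n", "n"): math.log(0.5),
    ("v", "</S>"): math.log(0.1), ("n", "</S>"): math.log(0.2),
}
FLOOR = math.log(1e-8)              # hypothetical score for unseen transitions
UNK_EMIT = {"UNK": math.log(1e-4)}  # hypothetical emission for unknown-word nodes
MAX_WORD_LEN = 4                    # hypothetical maximum dictionary word length


def mmsm_edges(text: str, extra: Optional[List[Edge]] = None) -> List[Edge]:
    """Build the segmentation graph's edges.

    Assumed reading of MMSM: at each position keep only the longest and
    second-longest dictionary matches. If nothing matches, add a
    single-character candidate unknown-word node. `extra` lets a separate
    (hypothetical) recognizer add its own candidate unknown words.
    """
    edges: List[Edge] = []
    for i in range(len(text)):
        matches = [text[i:j]
                   for j in range(i + 1, min(i + MAX_WORD_LEN, len(text)) + 1)
                   if text[i:j] in LEXICON]
        if matches:
            for word in sorted(matches, key=len, reverse=True)[:2]:
                edges.append((i, i + len(word), word, LEXICON[word]))
        else:
            edges.append((i, i + 1, text[i], UNK_EMIT))
    return edges + (extra or [])


def joint_viterbi(text: str, edges: List[Edge]) -> List[Tuple[str, str]]:
    """Pick the segmentation and the POS tags in one search.

    A state is (graph position, tag of the word ending there); the score of
    a path is the sum of bigram tag transitions and word emissions.
    """
    n = len(text)
    by_start: Dict[int, List[Edge]] = {}
    for edge in edges:
        by_start.setdefault(edge[0], []).append(edge)
    # best[(pos, tag)] = (log score, backpointer (prev_state, word, tag))
    best: Dict[Tuple[int, str], Tuple[float, Optional[tuple]]] = {
        (0, "<S>"): (0.0, None)}
    for i in range(n):  # every edge points forward, so states at i are final here
        states = [(tag, s) for (pos, tag), (s, _) in best.items() if pos == i]
        for prev_tag, score in states:
            for _, end, word, emits in by_start.get(i, []):
                for tag, emit in emits.items():
                    cand = score + TRANS.get((prev_tag, tag), FLOOR) + emit
                    if (end, tag) not in best or cand > best[(end, tag)][0]:
                        best[(end, tag)] = (cand, ((i, prev_tag), word, tag))
    finals = [(s + TRANS.get((tag, "</S>"), FLOOR), (pos, tag))
              for (pos, tag), (s, _) in best.items() if pos == n]
    if not finals:
        return []
    _, state = max(finals)
    path: List[Tuple[str, str]] = []
    while best[state][1] is not None:
        prev_state, word, tag = best[state][1]
        path.append((word, tag))
        state = prev_state
    path.reverse()
    return path


if __name__ == "__main__":
    sentence = "研究生命起源"
    print(joint_viterbi(sentence, mmsm_edges(sentence)))
```

On the classic ambiguous string 研究生命起源 ("study the origin of life"), the graph contains both 研究生/命 and 研究/生命 as competing paths. With these toy parameters the joint search returns 研究/v 生命/n 起源/n: the v→n tag transition after 研究 helps the correct segmentation win, an interaction that a pipeline which fixes the segmentation before tagging cannot exploit.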
Authors: SUN Xiao (孙晓), HUANG Degen (黄德根)
Source: Journal of Dalian University of Technology (《大连理工大学学报》), 2010, No. 6, pp. 1028-1034 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Fundamental Research Funds for the Central Universities (DUT10RW202)
Keywords: Chinese lexical analysis; integrative model; maximum matching and second-maximum matching; unknown word; segmentation directed graph