Apply Normalized Accessor Variety in Chinese Word Segmentation (归一化的邻接变化数方法在中文分词中的应用)

Cited by: 5
Abstract: This paper proposes a Chinese word segmentation (CWS) method that combines unsupervised and supervised learning by incorporating Accessor Variety (AV) into a segmentation system based on Conditional Random Fields (CRFs). To address the weakness of AV when training data is limited, a normalized variant is proposed that reduces the fluctuation of AV values computed during the unsupervised segmentation phase. Experiments on the Bakeoff-4 Chinese word segmentation data show that normalized AV improves performance on both the closed and the open tracks.
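For readers unfamiliar with the AV statistic mentioned in the abstract, the following is a minimal sketch of the plain (unnormalized) accessor variety as defined by Feng et al. (reference 4): the AV of a string is the smaller of its number of distinct left neighbours and its number of distinct right neighbours, with each sentence boundary counted as a distinct neighbour. The function name `accessor_variety`, the `max_len` parameter, and the toy corpus are illustrative assumptions, not taken from the paper. The normalization step is not specified in the abstract, so it is not sketched here.

```python
from collections import defaultdict


def accessor_variety(sentences, max_len=4):
    """Return {substring: AV} for all substrings of up to max_len characters.

    Hypothetical helper for illustration; AV(s) = min(L_av(s), R_av(s)).
    """
    left = defaultdict(set)   # substring -> distinct left neighbours
    right = defaultdict(set)  # substring -> distinct right neighbours
    for idx, sent in enumerate(sentences):
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                # A sentence boundary counts as a distinct neighbour per sentence.
                left[s].add(sent[i - 1] if i > 0 else ("<S>", idx))
                right[s].add(sent[j] if j < n else ("<E>", idx))
    return {s: min(len(left[s]), len(right[s])) for s in left}


# Toy corpus (an assumption for illustration only).
corpus = ["我们喜欢学习", "他们喜欢运动", "喜欢读书"]
av = accessor_variety(corpus)
print(av["喜欢"])  # 2: left neighbours {们, sentence start}, right neighbours {学, 运, 读}
print(av["们喜"])  # 1: the only right neighbour is 欢
```

A higher AV suggests that a string occurs in more varied contexts and is therefore more likely to be a word. With a corpus this small the counts are coarse, which illustrates the kind of limited-data instability that the abstract says normalization is meant to reduce.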
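Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 1, pp. 15-19 (5 pages)
Funding: Programme of Introducing Talents of Discipline to Universities (B08004); National Key Technology R&D Program (2007BAHo5B02-04)
Keywords: computer application; Chinese information processing; unsupervised segmentation; conditional random fields (CRFs); normalized accessor variety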

References (6)

  • 1. J. Lafferty, A. McCallum, and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data [C]//Proceedings of the 18th ICML, San Francisco, CA, 2001: 282-289.
  • 2. Zellig Sabbetai Harris. Morpheme boundaries within words [C]//Papers in Structural and Transformational Linguistics, 1970: 68-77.
  • 3. Hai Zhao and Chunyu Kit. Unsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition [C]//The Sixth SIGHAN Workshop on Chinese Language Processing (SIGHAN-6), Hyderabad, India, 2008: 106-111.
  • 4. Haodi Feng, Kang Chen, Chunyu Kit, and Xiaotie Deng. Unsupervised segmentation of Chinese corpus using accessor variety [C]//K. Y. Su, J. Tsujii, J. H. Lee, and O. Y. Kwong, editors, Natural Language Processing IJCNLP 2004, volume 3248 of Lecture Notes in Computer Science, Springer Berlin/Heidelberg. Sanya, Hainan Island, China, 2005: 694-703.
  • 5. Xinnian Mao, Yuan Dong, Saike He, Sencheng Bao, and Haila Wang. Chinese Word Segmentation and Named Entity Recognition Based on Conditional Random Fields [C]//The Sixth SIGHAN Workshop on Chinese Language Processing (SIGHAN-6), Hyderabad, India, 2008.
  • 6. R. H. Byrd, J. Nocedal, and R. B. Schnabel. Representations of quasi-Newton matrices and their use in limited memory methods [J]. Mathematical Programming, 1994, (63): 129-156.
