Chinese Word Segmentation Using Minimal Cost Path Algorithm Based on Dynamic Programming (基于动态规划的最小代价路径汉语自动分词)
Cited by: 5
Abstract: A directed graph of candidate segmentation paths for Chinese text is built with the maximum and second-maximum matching method, which turns automatic word segmentation into the problem of selecting the correct path through that graph. Node costs correspond to word frequencies, and edge costs correspond to the adjacency frequencies of the two words that an edge connects. A modified Dijkstra minimum-cost-path algorithm finds the path with the lowest total cost, and that path is taken as the segmentation result. Segmentation ambiguities are resolved gradually through step-by-step filtering. A mechanism driven by the characteristic words of unknown words preprocesses unknown words, which reduces the segmentation errors they would otherwise cause. Experimental results show that the method effectively improves the precision and recall of Chinese word segmentation.
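Authors: SUN Xiao (孙晓), HUANG Degen (黄德根)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 3, pp. 516-519 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60373095).
Keywords: Chinese word segmentation; maximum and second-maximum matching; minimum cost path; segmentation ambiguity resolution; characteristic words of unknown words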
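
The graph formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give their cost functions or their modification to Dijkstra's algorithm. The sketch assumes that "maximum and second-maximum matching" means keeping the two longest dictionary matches at each position, that costs are negative log frequencies with simple smoothing, and that the toy counts in `WORD_FREQ` and `PAIR_FREQ` are invented for illustration. The names `node_cost`, `edge_cost`, `candidates` and `segment` are hypothetical.

```python
import heapq
import math

# Toy counts, invented purely for illustration (not from the paper).
WORD_FREQ = {"研究": 50, "研究生": 20, "生命": 30, "命": 5, "生": 8, "的": 100, "起源": 15}
PAIR_FREQ = {("研究", "生命"): 10, ("生命", "的"): 12, ("的", "起源"): 9}
TOTAL = sum(WORD_FREQ.values())


def node_cost(word):
    # Assumed cost: frequent words are cheap; unseen words get a pseudo-count of 0.5.
    return -math.log(WORD_FREQ.get(word, 0.5) / TOTAL)


def edge_cost(prev, word):
    # Assumed cost: frequently adjacent pairs are cheap; add-one smoothing keeps it finite.
    pair = PAIR_FREQ.get((prev, word), 0) + 1
    return -math.log(pair / (WORD_FREQ.get(prev, 0) + len(WORD_FREQ)))


def candidates(text, start, max_len=4):
    # Dictionary matches starting at `start`, longest first; keep the two longest.
    # Falls back to the single character when nothing matches.
    found = [text[start:end]
             for end in range(min(len(text), start + max_len), start, -1)
             if text[start:end] in WORD_FREQ]
    return found[:2] or [text[start]]


def segment(text):
    # Dijkstra over states (position, last word); the last word is part of the
    # state because the edge cost depends on it. All costs are non-negative.
    heap = [(0.0, 0, None, ())]
    settled = set()
    while heap:
        cost, pos, prev, path = heapq.heappop(heap)
        if pos == len(text):
            return list(path), cost
        if (pos, prev) in settled:
            continue
        settled.add((pos, prev))
        for word in candidates(text, pos):
            step = node_cost(word)
            if prev is not None:
                step += edge_cost(prev, word)
            heapq.heappush(heap, (cost + step, pos + len(word), word, path + (word,)))
    return [], math.inf


words, cost = segment("研究生命的起源")
print(words)  # ['研究', '生命', '的', '起源']
```

With these toy counts, the greedy longest match 研究生 loses to 研究 / 生命 because the second path has cheaper node and edge costs overall. This is the kind of overlap ambiguity that a minimum-cost path over the whole sentence is meant to resolve.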