
Chinese Word Segmentation Based on Dictionary and Statistics (字典与统计相结合的中文分词方法)

Cited by: 41
Abstract  This paper proposes a Chinese word segmentation method that combines a dictionary with statistics. The method first segments the text with a dictionary-based approach, then uses statistical methods to resolve the ambiguities and unregistered (out-of-vocabulary) words left over from the first step. In the dictionary-based stage, an improved dictionary storage structure speeds up dictionary matching. In the statistical stage, combining statistics with rules improves the accuracy of segmenting crossing (intersection-type) ambiguities and, under certain conditions, handles unregistered words that occur with high frequency in the context. Experimental results show that DSfenci, the segmentation system implementing this algorithm, achieves a segmentation recall (分全率, rendered as "integrity" in the original English abstract) of 99.52% and a precision of 98.52%.
Source  Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core (北大核心), 2006, No. 9, pp. 1766-1771 (6 pages)
Funding  Supported by the National Natural Science Foundation of China (60373099).
Keywords  Chinese word segmentation; dictionary-based word segmentation; statistics-based word segmentation; crossing ambiguity in word segmentation
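
The two-stage idea in the abstract (dictionary matching first, statistics for the leftover ambiguities) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's actual design: the abstract does not say what the improved dictionary structure is, so a plain character trie stands in for it; crossing ambiguities are flagged with the common heuristic of comparing forward and backward maximum matching; and the resolver is a toy unigram-frequency scorer rather than the paper's combination of statistics and rules. The names `Trie`, `forward_max_match`, `backward_max_match`, and `pick_by_frequency` are hypothetical.

```python
import math

END = ""  # end-of-word key; cannot collide with a real one-character key


class Trie:
    """Character trie holding the dictionary words (stand-in structure)."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[END] = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                best = i - start + 1
        return best


def forward_max_match(text, trie):
    """Greedy left-to-right segmentation; unknown characters become 1-char tokens."""
    tokens, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


def backward_max_match(text, words, max_len):
    """Greedy right-to-left segmentation over a plain set of words."""
    tokens, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            if n == 1 or text[j - n:j] in words:
                tokens.append(text[j - n:j])
                j -= n
                break
    return tokens[::-1]


def pick_by_frequency(candidates, freq):
    """Toy resolver: pick the candidate with the highest add-one-smoothed
    unigram log-probability. `freq` maps word -> count (made-up numbers)."""
    denom = sum(freq.values()) + len(freq) + 1

    def score(tokens):
        return sum(math.log((freq.get(t, 0) + 1) / denom) for t in tokens)

    return max(candidates, key=score)


if __name__ == "__main__":
    words = {"研究", "研究生", "生命", "命", "的", "起源"}
    text = "研究生命的起源"
    fmm = forward_max_match(text, Trie(words))
    bmm = backward_max_match(text, words, max_len=3)
    if fmm != bmm:  # disagreement flags a possible crossing ambiguity
        freq = {"研究": 50, "研究生": 10, "生命": 40, "命": 2, "的": 500, "起源": 8}
        print(pick_by_frequency([fmm, bmm], freq))
```

Comparing the two matching directions catches only those ambiguities on which they disagree, and a unigram scorer ignores context, so a real system would need a stronger detector and model at both steps.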