Chinese Word Segmentation Algorithm based on Word Association Characteristics
Cited by: 6
Abstract: Automatic word segmentation is a prerequisite for Chinese information processing. A review of commonly used Chinese word segmentation methods shows that methods based on word-frequency statistics are limited by the corpus: some genuine words receive low confidence scores and are discarded, while some strings that are not words receive high confidence scores and are misjudged as words. Building on the frequency-based approach, this paper therefore proposes a Chinese word segmentation method based on word association characteristics. The algorithm first counts the frequency of text fragments in a Chinese document that may form words, then computes the degree of freedom and the degree of cohesion of each fragment, and finally applies filtering methods for ternary and quaternary words. Experiments show that the algorithm can improve segmentation accuracy.
Authors: LI Kang-kang; LONG Hua (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China)
Source: Communications Technology (《通信技术》), 2018, No. 10, pp. 2343-2349 (7 pages)
Keywords: information processing; Chinese word segmentation; degree of freedom; filtering method
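
The abstract names three quantities (fragment frequency, degree of freedom, degree of cohesion) but does not give their formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: degree of freedom is taken to be the smaller of the left and right neighbour entropies, and degree of cohesion is taken to be the fragment probability divided by the largest product of the probabilities of its two parts over all binary splits. These are definitions commonly used in unsupervised word discovery; the function names and threshold values are hypothetical, and the paper's ternary and quaternary word filtering step is not reproduced because the abstract does not describe it.

```python
import math
from collections import Counter, defaultdict


def candidate_stats(text, max_len=4):
    """Count every substring of length 1..max_len and its left/right neighbours."""
    freq = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    n = len(text)
    for i in range(n):
        for k in range(1, max_len + 1):
            if i + k > n:
                break
            seg = text[i:i + k]
            freq[seg] += 1
            if i > 0:
                left[seg][text[i - 1]] += 1
            if i + k < n:
                right[seg][text[i + k]] += 1
    return freq, left, right


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of neighbour characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def freedom(seg, left, right):
    """Assumed definition: min of the left and right neighbour entropies."""
    return min(entropy(left.get(seg, Counter())),
               entropy(right.get(seg, Counter())))


def cohesion(seg, freq, total):
    """Assumed definition: P(seg) / max over binary splits of P(a) * P(b).

    seg must occur in the counted text and have at least two characters.
    """
    if len(seg) < 2:
        raise ValueError("cohesion needs a fragment of at least two characters")
    p_seg = freq[seg] / total
    best_split = max((freq[seg[:i]] / total) * (freq[seg[i:]] / total)
                     for i in range(1, len(seg)))
    return p_seg / best_split


def extract_words(text, max_len=4, min_freq=2, min_freedom=0.5, min_cohesion=2.0):
    """Keep fragments that pass all three (hypothetical) thresholds."""
    freq, left, right = candidate_stats(text, max_len)
    total = len(text)
    words = []
    for seg, count in freq.items():
        if len(seg) < 2 or count < min_freq:
            continue
        if freedom(seg, left, right) < min_freedom:
            continue
        if cohesion(seg, freq, total) < min_cohesion:
            continue
        words.append(seg)
    return sorted(words)
```

Calling `extract_words(document_text)` on a string of Chinese characters returns the candidate fragments that pass all three thresholds. The sketch assumes punctuation and whitespace have already been removed, and the thresholds would need tuning on a real corpus.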