Abstract
A new statistical model for Chinese, CNET, is proposed, and on that basis a dictionary-free automatic word segmentation algorithm is presented. The algorithm first learns from the Chinese corpus to be processed and builds the CNET, then uses the statistical information it has learned to segment that same raw corpus. Preliminary experiments show that the segmentation accuracy exceeds 70%, at about 77%.
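The abstract does not describe the internal structure of CNET, so the following is only a minimal sketch of the general learn-then-segment idea, assuming a much simpler stand-in: pointwise mutual information between adjacent characters, estimated from the same raw text that is then segmented. The function names `train`, `pmi`, and `segment` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train(corpus):
    """Learn character and adjacent-pair counts from the raw corpus.

    Assumption: this plain count table stands in for CNET, whose real
    structure is not given in the abstract.
    """
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    return unigrams, bigrams


def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information (in bits) of the adjacent pair (a, b)."""
    joint = bigrams[(a, b)]
    if joint == 0:
        # The pair was never seen side by side, so treat it as unrelated.
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_joint = joint / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log2(p_joint / (p_a * p_b))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Split text wherever adjacent characters are weakly associated.

    Two neighbouring characters stay in the same word only when their
    PMI is above the (hypothetical) threshold.
    """
    if not text:
        return []
    words = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if pmi(prev, cur, unigrams, bigrams) > threshold:
            words[-1] += cur   # strong association: extend the current word
        else:
            words.append(cur)  # weak association: start a new word
    return words


if __name__ == "__main__":
    raw = "汉语自动分词是汉语信息处理的基础,汉语自动分词的方法很多。"
    uni, bi = train(raw)            # step 1: learn from the corpus itself
    print(segment(raw, uni, bi))    # step 2: segment that same corpus
```

No dictionary is consulted at any point; all evidence comes from co-occurrence counts in the input. A realistic threshold would have to be tuned on a much larger corpus than this toy string.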
Source
《计算机应用与软件》
CSCD
Peking University Core Journals (北大核心)
2007, No. 10, pp. 219-221 (3 pages)
Computer Applications and Software
Keywords
Automatic Chinese word segmentation
Mutual information
CNET