Abstract
To improve the performance of the preprocessing stage in Chinese spam filtering and to speed up dictionary lookup during segmentation, this paper constructs an index dictionary based on the idea of hash functions and designs an index-based Chinese phrase segmentation method aimed at Chinese spam. Experiments show that the method is more efficient and more accurate than traditional mechanical segmentation, that it improves the performance of the mail preprocessing stage, and that it can be applied widely in the field of Chinese phrase segmentation.
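The abstract does not spell out how the index dictionary is laid out, so the following is only a minimal sketch of one common reading: dictionary entries are bucketed by their first character (the hash key), and forward maximum matching then searches only the bucket for the current character. The names `build_index` and `segment`, the first-character key, and the sample word list are all assumptions made for illustration, not the paper's actual design.

```python
def build_index(words):
    """Bucket dictionary words by first character (assumed hash key).

    Returns a dict mapping a character to (set of words, longest word length).
    """
    buckets = {}
    for word in words:
        if word:
            buckets.setdefault(word[0], set()).add(word)
    return {ch: (ws, max(len(w) for w in ws)) for ch, ws in buckets.items()}


def segment(text, index):
    """Forward maximum matching restricted to the current character's bucket."""
    tokens = []
    i = 0
    while i < len(text):
        bucket, max_len = index.get(text[i], (set(), 1))
        # Try the longest candidate first; fall back to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in bucket:
                tokens.append(piece)
                i += n
                break
    return tokens


# Hypothetical sample dictionary, for illustration only.
sample_words = ["垃圾", "垃圾邮件", "邮件", "中文", "分词"]
sample_index = build_index(sample_words)
print(segment("中文垃圾邮件分词", sample_index))
```

Under these assumptions, each lookup touches only the words that share the current first character instead of scanning the whole dictionary, which is the kind of speed-up over plain mechanical matching that the abstract describes.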
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2007, No. 9, pp. 2334-2336 (3 pages)
Journal of Computer Applications
Keywords
anti-spam
Chinese phrase segmentation
hash algorithm