Abstract
This paper describes the architecture of a spam recognition system based on text classification, and introduces the key technologies the system involves, including Chinese information processing, text feature selection, and the naive Bayes classifier. Finally, the paper presents the processing results for a sample of spam messages. The results indicate that the method is effective at identifying spam.
Source
《微电子学与计算机》
CSCD
Peking University Core Journal
2004, No. 6, pp. 145-146, 193 (3 pages in total)
Microelectronics & Computer
Funding
National Natural Science Foundation of China (Grant No. 59937150)
National 863 Program of China (Grant No. 2001AA413910)
Keywords
Spam
Text classification
Chinese word segmentation
Naive Bayes