Abstract
This paper studies feature selection for spam filtering. A word co-occurrence model is introduced to capture the semantic relations between words, and it is combined with the traditional information gain feature selection method to represent emails. A neural network is then used to classify the emails, yielding the spam filter. Experiments show that the proposed feature selection method, which combines word co-occurrence pairs with information gain, improves the precision of spam filtering.
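As a rough illustration of the feature selection idea summarized in the abstract, the following minimal sketch scores both single words and word co-occurrence pairs by information gain. It is not the paper's implementation. The abstract does not define co-occurrence, so the sketch assumes two words co-occur when they appear in the same email. The function names (`entropy`, `features`, `information_gain`) and the toy data are hypothetical, and the neural network classification step is not shown.

```python
import math
from itertools import combinations


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` spam and `neg` non-spam emails."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def features(tokens):
    """Single words plus unordered word pairs found in one email.

    Assumption: a pair "co-occurs" if both words appear in the same email.
    """
    words = set(tokens)
    pairs = {tuple(sorted(pair)) for pair in combinations(words, 2)}
    return words | pairs


def information_gain(docs, labels):
    """Return {feature: information gain} for binary presence features.

    `docs` is a list of token lists; `labels` holds 1 (spam) or 0 (non-spam).
    """
    n = len(docs)
    if n == 0:
        return {}
    n_spam = sum(labels)
    n_ham = n - n_spam
    base = entropy(n_spam, n_ham)
    doc_feats = [features(d) for d in docs]
    vocab = set().union(*doc_feats)
    gains = {}
    for f in vocab:
        with_spam = sum(1 for fs, y in zip(doc_feats, labels) if f in fs and y == 1)
        with_all = sum(1 for fs in doc_feats if f in fs)
        with_ham = with_all - with_spam
        # Class entropy after splitting the emails on presence/absence of f.
        conditional = (with_all / n) * entropy(with_spam, with_ham) + (
            (n - with_all) / n
        ) * entropy(n_spam - with_spam, n_ham - with_ham)
        gains[f] = base - conditional
    return gains


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    docs = [
        ["free", "money"],
        ["free", "offer"],
        ["meeting", "notes"],
        ["project", "notes"],
    ]
    labels = [1, 1, 0, 0]
    gains = information_gain(docs, labels)
    # Keep the highest-scoring words and pairs as the feature set.
    top = sorted(gains, key=gains.get, reverse=True)[:5]
    print(top)
```

The top-ranked words and pairs would then serve as the input features for whatever classifier is used. How many features to keep, and how pairs are weighted against single words, are choices the abstract does not specify.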
Source
Journal of Hefei University of Technology (Natural Science) (《合肥工业大学学报(自然科学版)》)
CAS
CSCD
PKU Core (北大核心)
2009, Issue 12, pp. 1863-1866 (4 pages)
Keywords
spam filtering
information gain
word co-occurrence model
neural network
cross covering algorithm (交叉覆盖算法)