Abstract
A novel spam filtering method (NSFM) is proposed that removes redundant features from the high-dimensional text feature space and selects the features that contribute to classification accuracy, thereby improving the classification accuracy of spam filtering. A fuzzy adaptive particle swarm optimization algorithm (IFAPSO) is also proposed, which uses fuzzy control to dynamically adjust the inertia weight, the learning factors, and the particle number ratio of the swarm. NSFM consists of three stages: core feature selection, feature selection, and spam filtering. In the first stage, information gain is used to compute the information value of each feature, a core feature set is constructed, and a number of core feature subsets are generated. In the second stage, IFAPSO is initialized from the core feature subsets, and a fuzzy controller adaptively adjusts the swarm to complete feature selection. In the third stage, a support vector machine performs classification on the optimal feature subset, completing spam filtering. Comparative experiments on the PU1, Ling-Spam, and SpamAssassin data sets show that the method is highly adaptive, selects better feature subsets, effectively improves classification accuracy and spam filtering performance, and has high practical value.
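The first stage described above, scoring each feature by information gain and keeping the highest-scoring ones as core features, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say how features are discretized or how the core feature subsets are generated, so the binary term-presence encoding, the `top_k` cutoff, and all function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_column, labels):
    """IG = H(Y) - sum over v of P(X=v) * H(Y | X=v), for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, top_k):
    """Return the indices of the top_k features by information gain, plus all gains.

    Ties keep their original column order because sorted() is stable.
    """
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: -gains[j])
    return order[:top_k], gains


# Toy data (hypothetical): each row marks presence (1) or absence (0) of three terms.
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = ["spam", "spam", "ham", "ham"]
core, gains = rank_features(rows, labels, top_k=2)
```

In this toy set the first term separates the two classes perfectly, so it receives the highest gain, while the other two terms carry no information about the label. In the method described above, the resulting core features would then seed the particle swarm rather than be used directly for classification.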
Source
《吉林大学学报(工学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2011, Issue 3, pp. 716-720 (5 pages)
Journal of Jilin University: Engineering and Technology Edition
Funding
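National Natural Science Foundation of China (60971089)
National Electronics Development Fund project (Caijian [2009] No. 537)
Jilin Provincial Department of Science and Technology project (20090502)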
Keywords
artificial intelligence
feature selection
particle swarm optimization
fuzzy control
spam filtering
support vector machines