
Research on Spam Filtering Technology Based on IMI-WNB Algorithm
Cited by: 3
Abstract: When Mutual Information (MI) and the Naive Bayes (NB) algorithm are applied to spam filtering, they suffer from feature redundancy and from an independence assumption that does not hold. To address these problems, this paper proposes an Improved Mutual Information-Weighted Naive Bayes (IMI-WNB) algorithm. To address the low efficiency of MI, it introduces a word frequency factor and an inter-class difference factor into MI, yielding an improved MI feature selection algorithm that achieves more efficient feature dimensionality reduction. To address the independence assumption of NB classification, it uses the Improved Mutual Information (IMI) value to weight features during NB classification, which removes part of the adverse effect that the NB conditional independence assumption has on mail classification. Experimental results show that, compared with the traditional NB algorithm, the proposed algorithm improves the precision, recall and stability of spam filtering.
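The two stages described above (MI-based feature selection, then MI-weighted NB classification) can be sketched roughly as follows. This is not the authors' IMI-WNB: the abstract does not give the formulas for the word frequency factor or the inter-class difference factor, so the sketch uses the standard expected MI between term presence and class as a stand-in for the IMI score. It keeps the top-k terms by MI, then multiplies each term's log-likelihood by its MI weight inside a multinomial NB with Laplace smoothing. The function names, the `WeightedNB` class, the toy corpus, and `alpha=1.0` are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration only (not the paper's data set).
DOCS = [
    ["win", "money", "now"],
    ["win", "prize", "click"],
    ["cheap", "money", "click"],
    ["meeting", "tomorrow", "agenda"],
    ["project", "meeting", "notes"],
    ["lunch", "tomorrow", "team"],
]
LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def mutual_information(docs, labels, term, positive="spam"):
    """Standard expected MI (bits) between presence of `term` and the class.

    Assumption: stand-in for the paper's IMI score, whose word frequency and
    inter-class difference factors are not specified in the abstract.
    """
    n = len(docs)
    n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == positive)
    n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != positive)
    n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == positive)
    n00 = n - n11 - n10 - n01
    present, absent = n11 + n10, n01 + n00  # row totals (term present / absent)
    pos, neg = n11 + n01, n10 + n00  # column totals (spam / ham)
    mi = 0.0
    for cell, row, col in ((n11, present, pos), (n10, present, neg),
                           (n01, absent, pos), (n00, absent, neg)):
        if cell > 0:  # empty cells contribute 0 (limit of x * log x)
            mi += cell / n * math.log2(n * cell / (row * col))
    return mi


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI; return {term: MI weight}."""
    vocab = sorted({t for d in docs for t in d})
    scores = {t: mutual_information(docs, labels, t) for t in vocab}
    top = sorted(vocab, key=lambda t: -scores[t])[:k]
    return {t: scores[t] for t in top}


class WeightedNB:
    """Multinomial NB where each selected term's log-likelihood is scaled by a weight."""

    def __init__(self, weights, alpha=1.0):
        self.weights = weights  # term -> weight (here: plain MI, hypothetical)
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        vocab = list(self.weights)
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only selected features, as after dimensionality reduction.
            counts = Counter(t for d in class_docs for t in d if t in self.weights)
            total = sum(counts.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {
                t: math.log((counts[t] + self.alpha) / total) for t in vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            # Each occurrence of a selected term adds w_t * log P(t | c).
            return self.log_prior[c] + sum(
                self.weights[t] * self.log_lik[c][t] for t in doc if t in self.weights
            )
        return max(self.classes, key=score)


if __name__ == "__main__":
    weights = select_features(DOCS, LABELS, k=5)
    model = WeightedNB(weights).fit(DOCS, LABELS)
    print(model.predict(["win", "money", "today"]))  # spam
    print(model.predict(["meeting", "tomorrow"]))  # ham
```

Setting every weight to 1 recovers ordinary multinomial NB, and a weight near 0 effectively removes a term from the decision; swapping `mutual_information` for an IMI formula would be the place to plug in the paper's factors once their definitions are known.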
Authors: LIU Jie, WANG Zheng, WANG Hui (School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China)
Source: Computer Engineering (《计算机工程》; indexed in CAS, CSCD and the Peking University Core Journals list), 2020, No. 12, pp. 299-304, 312 (7 pages in total)
Funding: National Natural Science Foundation of China (61300216).
Keywords: Mutual Information (MI); spam filtering; Weighted Naive Bayes (WNB) algorithm; feature selection; word frequency