Abstract
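The characteristics of spam itself make lazy-learning text categorization algorithms better suited to the spam filtering problem. However, lazy text categorization algorithms, typified by k-NN, suffer from several drawbacks, low runtime efficiency among them, that make them inconvenient in practice. To address this, building on the vector cosine similarity formula, this paper proposes a new "embedded feature selection spam filtering model" and a lazy-learning spam filtering algorithm based on that model. Compared with several classical algorithms, the new algorithm significantly reduces computational cost while avoiding the information loss that such a reduction would otherwise cause. It therefore improves noticeably in both performance and efficiency and has high practical value.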
Although lazy learning approaches are more suitable for spam filtering than eager learning text categorization approaches, they are generally less efficient. Moreover, they usually need a feature selection process to reduce the dimensionality of the feature space, and this process causes information loss that degrades the overall performance of the approach. This paper therefore proposes a new spam filtering model based on an embedded feature selection mode, which greatly reduces the dimensionality of the feature space without information loss. The approach based on this model thus improves both efficiency and performance.
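For orientation, the k-NN baseline with vector cosine similarity that the abstract refers to can be sketched as follows. This is a minimal illustration of the conventional lazy-learning baseline only, not the paper's embedded feature selection model, which the abstract does not specify. The function names, the whitespace tokenizer, and the toy training data are all assumptions made for this sketch.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a message into a sparse term-frequency vector (assumed whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(message, training_set, k=3):
    """Label a message by majority vote among its k most similar training messages.

    Lazy learning: no model is built in advance, so every call compares the
    query against every stored training message.
    """
    query = vectorize(message)
    scored = sorted(
        ((cosine_similarity(query, vectorize(doc)), label) for doc, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
training = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
label = knn_classify("buy cheap pills", training, k=1)
```

Because each query is compared against the whole training set over the full vocabulary, the cost grows with both the number of stored messages and the dimensionality of the feature space, which is the efficiency problem the abstract describes.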
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2009, No. 8, pp. 1616-1620 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National High Technology Research and Development Program of China ("863" Program), grant 2006AA01Z455.