Abstract
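To address the spam filtering problem, and taking into account the characteristics of Chinese spam and the efficiency requirements of a filtering system, the spam filtering algorithm BioMatrix was designed by applying the principle of TEIRESIAS, a pattern extraction algorithm from bioinformatics. A filtering system for Chinese and English mail was implemented on the basis of this algorithm. The system obtains its spam training set from quantity-control filtering and classifies mail by extracting feature patterns from that set. It identifies about 94.2% of spam, with a false-filtering rate of about 0.04%. Experimental comparison with the Bayes filtering algorithm shows that applying biological sequence pattern extraction to mail filtering has good research and practical value.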
A spam filtering algorithm, BioMatrix, was designed to solve the spam filtering problem, in view of the characteristics of Chinese spam and the efficiency demands of anti-spam systems. Based on pattern discovery techniques for biological sequences, BioMatrix adopts the principle of the TEIRESIAS algorithm from bioinformatics. An anti-spam system based on BioMatrix was implemented to filter Chinese and English spam. This system obtained its training data set through spam quantity control, and then classified mails u...
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2005, Issue S1, pp. 1734-1737 (4 pages)
Journal of Tsinghua University (Science and Technology)
Funding
National "973" Program sub-project "Theoretical Research on Next-Generation Internet Security Monitoring and Security Ecology" (2003CB314800)