
Application of the LSA and MD5 Algorithms in a Spam Filtering System (LSA和MD5算法在垃圾邮件过滤系统的应用研究). Cited by: 3

Research of Spam Filtering System Based on Latent Semantic Analysis and MD5
Abstract: Spam has drawn widespread concern, yet current mail filtering methods suffer from a lack of semantic information and handle mass-mailed (bulk) spam inefficiently. To address both problems, this paper proposes a spam filtering model based on Latent Semantic Analysis (LSA) and Message-Digest algorithm 5 (MD5). LSA is used to mark the latent feature terms in spam, which introduces semantic analysis into the filtering technique. Building on the LSA results, MD5 is used to generate an "e-mail fingerprint" for bulk spam, which addresses the inefficiency of filtering such mail. A spam filtering system is designed around this model and evaluated on a self-selected dataset. In a comparison with a Naive Bayes filter, the proposed method performs better at spam filtering; the experimental results meet expectations and support the feasibility and advantages of the method.
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2007, No. 6, pp. 1223-1227 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Technology R&D Program of the 11th Five-Year Plan (2006BAF01A21)
Keywords: e-mail fingerprint; feature extraction; latent semantic analysis; MD5 algorithm (Message-Digest algorithm 5); sliding window; spam filtering
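The abstract describes generating an MD5 "e-mail fingerprint" from LSA-selected feature terms, and the keywords mention a sliding window, but the record gives no implementation details. The following is only a minimal sketch of how such a fingerprint might be built and compared. The function names, the window size of 3, the Jaccard overlap measure, and the 0.5 threshold are all assumptions made for illustration, and the feature-term lists stand in for the output of the LSA step, which is not reproduced here.

```python
import hashlib


def window_fingerprints(terms, window=3):
    """Hash every sliding window of feature terms with MD5.

    `terms` is assumed to be the ordered list of feature terms that an
    LSA step selected for one e-mail (hypothetical input format).
    Returns a set of 32-character hexadecimal MD5 digests.
    """
    normalized = [t.strip().lower() for t in terms if t.strip()]
    if not normalized:
        return set()
    if len(normalized) < window:
        windows = [normalized]  # too short: hash the whole list once
    else:
        windows = [normalized[i:i + window]
                   for i in range(len(normalized) - window + 1)]
    return {hashlib.md5(" ".join(w).encode("utf-8")).hexdigest()
            for w in windows}


def overlap(fp_a, fp_b):
    """Jaccard similarity of two fingerprint sets (an assumed measure)."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def is_bulk_copy(terms, known_fingerprints, threshold=0.5):
    """Flag a mail whose fingerprints overlap a known bulk spam enough."""
    return overlap(window_fingerprints(terms), known_fingerprints) >= threshold


# Illustrative data only: a known bulk spam and a lightly edited variant.
known = window_fingerprints(["free", "prize", "click", "claim", "now"])
variant = ["Free", "prize", "click", "claim", "today"]
ordinary = ["hello", "team", "meeting", "at", "noon"]

print(overlap(known, window_fingerprints(variant)))  # 0.5 (2 of 4 windows shared)
print(is_bulk_copy(variant, known))                  # True
print(is_bulk_copy(ordinary, known))                 # False
```

Hashing overlapping windows rather than the whole message means a small edit changes only the digests of the windows it touches, so lightly modified copies of a bulk mailing can still match. Whether the paper's system works this way is not stated in this record.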

