Research on a Collaborative Filtering Mechanism for E-mail Networks (cited by: 4)

Spam Collaborative Filtering in Enron E-mail Network

Abstract: A social network analysis of the Enron e-mail corpus shows that the real e-mail network is scale-free and, to a limited degree, a small world. On this basis, a collaborative spam filtering mechanism is designed around the interaction strength between users. By adjusting the parameter λ, a user can decide whether to filter spam mainly with their own filter, mainly with the help of other users, or with a trade-off between the two. Even when a user's personal reading habits have not been sufficiently trained, the algorithm can still filter well through network collaboration based on interaction strength. Because the Enron corpus is unlabeled, the training set W and the test set T are assumed to be independent and identically distributed (i.i.d.), and an improved EM (expectation maximization) algorithm is used to minimize the risk function on W ∪ T, which gives a good labeling of the unknown samples. Experiments on real data show that, compared with single-machine filtering and ensemble filtering, collaborative filtering improves average filtering accuracy and is simple to implement.
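The role of λ described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual combination rule, so the convex blend below is an assumption made only for illustration, as are the function name collaborative_spam_score, the score range [0, 1], and the representation of interaction strength as non-negative weights.

```python
from typing import Dict


def collaborative_spam_score(
    own_score: float,
    neighbor_scores: Dict[str, float],
    interaction_strength: Dict[str, float],
    lam: float,
) -> float:
    """Blend a user's own spam score with neighbors' scores.

    Assumed form (not taken from the paper):
        score = lam * own + (1 - lam) * sum_j(w_j * s_j) / sum_j(w_j)
    where w_j is the interaction strength with neighbor j and s_j is
    neighbor j's spam score for the same message.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    total_weight = sum(
        interaction_strength.get(name, 0.0) for name in neighbor_scores
    )
    if total_weight <= 0.0:
        # No usable neighbors: fall back to the user's own filter.
        return own_score
    neighbor_avg = sum(
        interaction_strength.get(name, 0.0) * score
        for name, score in neighbor_scores.items()
    ) / total_weight
    return lam * own_score + (1.0 - lam) * neighbor_avg


if __name__ == "__main__":
    own = 0.5                                # user's own (weakly trained) filter
    neighbors = {"alice": 1.0, "bob": 0.0}   # hypothetical neighbor scores
    strength = {"alice": 3.0, "bob": 1.0}    # hypothetical interaction strengths
    for lam in (0.0, 0.5, 1.0):
        print(lam, collaborative_spam_score(own, neighbors, strength, lam))
```

In this sketch, lam = 1 reduces to the user's own filter and lam = 0 relies entirely on neighbors weighted by interaction strength, which mirrors the trade-off the abstract describes for λ.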

Authors: Yang Zhen (杨震), Lai Yingxu (赖英旭), Duan Lijuan (段立娟), Li Yujian (李玉鑑), Xu Xin (许昕)

Affiliation: College of Computer Science, Beijing University of Technology

Source: Acta Automatica Sinica (自动化学报), 2012, No. 3, pp. 399-411 (13 pages). Indexed in EI, CSCD, and the Peking University core journal list.

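Funding: National Natural Science Foundation of China (61001178, 60905017, 61175115); National Soft Science Research Program (2010GXQ5D317); Beijing Natural Science Foundation (4102012, 4112009, 4102013, 4123093); General Program of the Science and Technology Development Plan of the Beijing Municipal Education Commission (KM201210005024); Key Program of the Science and Technology Development Plan of the Beijing Municipal Education Commission (KZ201210005007); "Young and Middle-aged Backbone Talent Training Plan" under the Beijing municipal universities talent development program (PHR201108016); Beijing University of Technology high-level talent training project; Beijing University of Technology Youth Fund.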

Keywords: text classification, spam filtering, e-mail network, collaborative filtering

Classification code: TP393.098 [Automation and Computer Technology: Computer Application Technology]

