
Study on Click Fraud Detection in Online Advertising with Imbalanced Data Processing Methods (cited by: 4)
Abstract: Detecting click fraud, in which clicks are generated to collect advertising fees illegitimately, is an important application of machine learning in online advertising. The support vector machine (SVM) is a strong supervised learning algorithm for binary classification and regression when the classes are roughly balanced, but its performance on click fraud detection is severely limited by the extreme class imbalance of the data. Working from the real-world fraudulent-publisher detection dataset of the FDMA2012 competition, this paper investigates and compares three imbalanced-data preprocessing methods in detail: random under-sampling (RUS), the synthetic minority over-sampling technique (SMOTE), and SMOTE combined with edited nearest neighbor (SMOTE+ENN). The best-performing mixed sampling method, SMOTE+ENN, is applied to the raw data before training an SVM classifier. Experimental results show that the combined method effectively identifies publishers engaging in click fraud, reaching an accuracy of about 95% on minority-class samples, which basically meets the requirements of click fraud detection in online advertising.
Authors: LI Xin (李鑫), GUO Han (郭汉), ZHANG Xin (张欣), HU Fang-qiang (胡方强), SHUAI Ren-jun (帅仁俊) (College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2018, Issue B06, pp. 371-374 (4 pages)
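Funding: Supported by the National Natural Science Foundation of China (61672279) and the Key Research and Development Program of Jiangsu Province (BE2015697)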
Keywords: click fraud; support vector machine (SVM); imbalanced; mixed sampling
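
The three resampling strategies named in the abstract (RUS, SMOTE, SMOTE+ENN) can be illustrated with a minimal, standard-library-only sketch on toy 2-D data. Everything here is an assumption made for illustration: the function names, the parameters `k` and `seed`, the binary-label setting, and the toy data. It is not the authors' implementation, it does not reproduce the FDMA2012 features or the SVM training step, and in practice a dedicated library such as imbalanced-learn would more likely be used.

```python
import math
import random
from collections import Counter


def nearest(idx, X, k):
    """Indices of the (up to) k points in X closest to X[idx], excluding idx itself."""
    order = sorted((j for j in range(len(X)) if j != idx),
                   key=lambda j: math.dist(X[idx], X[j]))
    return order[:k]


def random_undersample(X, y, seed=0):
    """RUS: randomly drop samples until every class has as many as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        members = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(members, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE (binary labels assumed): add synthetic minority points until classes balance.

    Each synthetic point lies on the segment between a minority sample and one of
    its k nearest minority neighbours. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    X_min = [x for x, lab in zip(X, y) if lab == minority]
    n_new = max(len(X) - 2 * len(X_min), 0)  # majority count minus minority count
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        j = rng.choice(nearest(i, X_min, k))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X_min[i], X_min[j])))
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


def enn(X, y, k=3):
    """ENN: drop every sample whose label disagrees with the majority of its k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, seed=0):
    """Hybrid sampling: oversample with SMOTE, then clean both classes with ENN."""
    X_res, y_res = smote(X, y, minority, seed=seed)
    return enn(X_res, y_res)


if __name__ == "__main__":
    # Hypothetical toy data: 40 "legitimate" points (label 0), 4 "fraud" points (label 1).
    rng = random.Random(42)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    X += [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(4)]
    y = [0] * 40 + [1] * 4
    print("original :", Counter(y))
    print("RUS      :", Counter(random_undersample(X, y)[1]))
    print("SMOTE    :", Counter(smote(X, y, minority=1)[1]))
    print("SMOTE+ENN:", Counter(smote_enn(X, y, minority=1)[1]))
```

The resampled `(X, y)` pair would then be passed to a classifier such as an SVM; resampling is applied to the training split only, so that evaluation still reflects the original class distribution.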
