Abstract
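Detecting click fraud that aims to siphon off advertising fees in online advertising is an important application of machine learning. The support vector machine (SVM) is an excellent machine learning algorithm for binary classification and regression problems, but when it is applied to fraudulent-click detection in online advertising, its performance is severely limited by the extreme imbalance of the dataset. Starting from the real dataset of the FDMA2012 competition on fraudulent publisher detection, this paper studies and compares three methods for handling imbalanced data in detail, selects the best of them, a hybrid sampling method, to process the raw data, and then feeds the processed data to an SVM classifier. Experimental results show that the proposed method effectively identifies illegitimate publishers that commit click fraud, with an accuracy of about 95%, which meets the requirements of click fraud detection in online advertising.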
Click fraud detection in online advertising is one of the most important applications of machine learning. The support vector machine (SVM) is a prominent supervised machine learning algorithm for classification problems on datasets with roughly balanced class distributions. However, when applied to click fraud detection, the success of SVM is greatly limited by the extremely imbalanced class distribution of the FDMA2012 competition dataset. In this paper, three data preprocessing methods, random under-sampling (RUS), the synthetic minority over-sampling technique (SMOTE), and SMOTE combined with edited nearest neighbor (SMOTE+ENN), are investigated in detail, each followed by an SVM classifier. Results show that the method combining SMOTE+ENN with SVM achieves an accuracy of about 95% on minority samples, which basically meets the requirements of an online advertising click fraud detection system.
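The SMOTE+ENN resampling step named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not give the features, neighbor counts, or sampling ratios, so the function names (`smote`, `enn`, `smote_enn`, `make_toy_data`), the value `k=3`, and the two-dimensional toy data are all assumptions made for illustration. The SVM training step is omitted; in practice a library implementation such as imbalanced-learn's `SMOTEENN` followed by scikit-learn's `SVC` would be a more likely choice.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, pool, k):
    """Indices of the k points in pool that are closest to point."""
    order = sorted(range(len(pool)), key=lambda i: sq_dist(point, pool[i]))
    return order[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors.
    Assumes at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]  # exclude the base itself
        neighbor = others[rng.choice(nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def enn(X, y, k=3):
    """Edited nearest neighbors: drop every point whose label disagrees
    with the majority label of its k nearest neighbors."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        votes = Counter(rest_y[j] for j in nearest(xi, rest_X, k))
        if votes.most_common(1)[0][0] == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class to parity with SMOTE, then clean
    both classes with ENN."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_major = len(y) - len(minority)
    new_pts = smote(minority, n_major - len(minority), k, rng)
    return enn(X + new_pts, y + [minority_label] * len(new_pts), k)


def make_toy_data(n_major=60, n_minor=6, seed=0):
    """Invented 2-D data: a large class near (0, 0), a small one near (4, 4)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor


X, y = make_toy_data()
X_res, y_res = smote_enn(X, y)
print(Counter(y), "->", Counter(y_res))
```

The resampled `X_res` and `y_res` would then serve as the training set for the classifier. Running ENN after SMOTE removes synthetic and original points that fall in regions dominated by the other class, which is the usual motivation for a hybrid method over plain oversampling.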
Authors
李鑫
郭汉
张欣
胡方强
帅仁俊
LI Xin, GUO Han, ZHANG Xin, HU Fang-qiang, SHUAI Ren-jun (College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China)
Source
《计算机科学》
CSCD
Peking University Core Journal
2018, Issue B06, pp. 371-374 (4 pages)
Computer Science
Funding
National Natural Science Foundation of China (61672279)
Jiangsu Provincial Key Research and Development Program (BE2015697)