Abstract
Attackers can poison a machine learning model during training by modifying the original training data to generate poisoned data. To address this problem, this paper proposes a poisoned data detection method based on data complexity. Starting from a normal data set, the method applies a gradient ascent strategy to self-poison sample instances within that data set, then exploits the effect of the resulting poisoned data on the data complexity of the normal data set to train a detection model that can identify poisoned data. In the selected application scenarios, the method achieves higher detection accuracy than existing methods. Experimental results show that poisoned data can effectively reduce the predictive ability of a machine learning model, and that the data-complexity-based detection method can effectively detect poisoned data and reduce its adverse effect on the model's predictive ability.
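The core idea in the abstract (poisoning shifts the data complexity of a data set) can be illustrated with a rough sketch. The abstract does not say which model, gradient ascent formulation, or complexity measures the authors use, so everything below is an assumption made for illustration: a small logistic regression, normalized gradient ascent on the loss with respect to the input while the model is held fixed, and the Fisher discriminant ratio as a single complexity measure. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=200):
    # Plain batch gradient descent on the logistic loss (assumed victim model).
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def poison_point(x, label, w, b, step=0.5, iters=10):
    # Simplified gradient ascent on the loss w.r.t. the input, model fixed:
    # for the logistic loss, dL/dx = (p - label) * w.
    x = list(x)
    for _ in range(iters):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        g = [(p - label) * wj for wj in w]
        norm = math.sqrt(sum(gj * gj for gj in g))
        if norm == 0.0:
            break
        x = [xj + step * gj / norm for xj, gj in zip(x, g)]
    return x


def fisher_ratio(X, y):
    # Maximum Fisher discriminant ratio over features.
    # Higher values mean the two classes overlap less (lower complexity).
    best = 0.0
    for j in range(len(X[0])):
        a = [xi[j] for xi, yi in zip(X, y) if yi == 0]
        c = [xi[j] for xi, yi in zip(X, y) if yi == 1]
        ma, mc = sum(a) / len(a), sum(c) / len(c)
        va = sum((v - ma) ** 2 for v in a) / len(a)
        vc = sum((v - mc) ** 2 for v in c) / len(c)
        best = max(best, (ma - mc) ** 2 / (va + vc))
    return best


# Synthetic two-class data: two Gaussian blobs (illustrative only).
random.seed(0)
X = [[random.gauss(-2, 1), random.gauss(-2, 1)] for _ in range(100)] + \
    [[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

w, b = train_logreg(X, y)

# Self-poison the first 20 samples; labels are left unchanged.
X_poisoned = [poison_point(xi, yi, w, b) if i < 20 else xi
              for i, (xi, yi) in enumerate(zip(X, y))]

clean_f1 = fisher_ratio(X, y)
poisoned_f1 = fisher_ratio(X_poisoned, y)
print("Fisher ratio, clean:   ", clean_f1)
print("Fisher ratio, poisoned:", poisoned_f1)
```

In this toy setting the poisoned copy has a lower Fisher ratio than the clean one, because the perturbed points drift into the other class's region. A detector along the lines the abstract describes would presumably be trained on such complexity features computed for clean and self-poisoned data sets; that step is not shown here.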
Authors
亢飞
李建彬
Kang Fei; Li Jianbin (School of Information Science & Engineering, Central South University, Changsha 410083, China; Information Security & Big Data Research Institute, Central South University, Changsha 410083, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2020, Issue 7, pp. 2140-2143 (4 pages)
Application Research of Computers
Keywords
machine learning
poisoning attack
gradient ascent
data complexity