Abstract
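As the commercial value of recommender systems grows, detecting fake users (spammers) has become key to safeguarding their information security. Existing methods ignore the cost-sensitive nature of the fake-user detection problem, so this paper proposes a detection algorithm based on double under-sampling and cost-sensitive learning. First, the data set is under-sampled twice to balance the sample set. Next, a dynamic membership cost function is designed to capture precisely how misclassification costs differ between individual samples. Finally, a cost-sensitive support vector machine is built to obtain the detection function. Experimental results show that the method lowers the overall misclassification cost while raising the recognition rate for fake users, effectively addressing the cost-sensitive problem in fake-user detection for recommender systems.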
Spammer detection is key to ensuring information security in recommender systems. Existing methods ignore the adverse effect that the cost-sensitive nature of the problem has on detection accuracy. A detection method based on double under-sampling and a cost-sensitive support vector machine is proposed. First, double under-sampling is used to balance the data set: the first pass eliminates noise samples while preserving useful boundary samples, and the second pass compresses redundant information based on the distribution characteristics and importance of the majority-class samples. Then, a dynamic function based on class confidence is introduced into the cost-sensitive optimization, and a cost-sensitive support vector machine with different misclassification costs is established. Finally, the model is trained on the balanced sample set. Experimental results show that the proposed method effectively solves the cost-sensitive problem and improves detection accuracy.
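The pipeline described in the abstract (two under-sampling passes, per-sample costs, a cost-sensitive SVM objective) can be illustrated with a minimal sketch. The abstract does not give the actual rules, so everything concrete here is an assumption: an edited-nearest-neighbour style rule stands in for the first pass, a nearest-to-minority ranking for the second, and a linear centroid-distance membership for the cost function (a static stand-in, not the paper's dynamic function). All function names, the class costs 1.0 and 5.0, and the toy data are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def remove_noise(majority, minority, k=3):
    """Pass 1 (assumed ENN-style rule): drop a majority sample only if all of
    its k nearest neighbours are minority samples; boundary samples survive."""
    pool = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for i, p in enumerate(majority):
        others = [q for j, q in enumerate(pool) if j != i]
        nearest = sorted(others, key=lambda q: dist(p, q[0]))[:k]
        if not all(label == 1 for _, label in nearest):
            kept.append(p)
    return kept

def compress(majority, minority, n_keep):
    """Pass 2 (assumed importance rule): keep the n_keep majority samples
    closest to the minority class, treating the rest as redundant."""
    ranked = sorted(majority, key=lambda p: min(dist(p, m) for m in minority))
    return ranked[:n_keep]

def sample_costs(points, class_cost, floor=0.2):
    """Per-sample cost = class cost * membership, where membership decays
    linearly with distance from the class centroid (assumed form)."""
    centroid = [sum(c) / len(points) for c in zip(*points)]
    d = [dist(p, centroid) for p in points]
    d_max = max(d) or 1.0
    return [class_cost * (floor + (1 - floor) * (1 - di / d_max)) for di in d]

def objective(w, b, X, y, costs):
    """Cost-sensitive soft-margin objective: 0.5*||w||^2 + sum_i c_i * hinge_i."""
    hinge = [max(0.0, 1 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
             for xi, yi in zip(X, y)]
    return 0.5 * sum(wj * wj for wj in w) + sum(c * h for c, h in zip(costs, hinge))

# Hypothetical toy data: 20 genuine users on a grid plus one mislabelled
# sample sitting inside the spammer cluster.
genuine = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(4)] + [(4.0, 4.0)]
spammers = [(3.8, 4.0), (4.2, 4.0), (4.0, 3.8), (4.0, 4.2), (3.7, 3.7)]

cleaned = remove_noise(genuine, spammers)
balanced = compress(cleaned, spammers, n_keep=2 * len(spammers))
X = balanced + spammers
y = [-1] * len(balanced) + [1] * len(spammers)
# Assumption: missing a spammer is costlier than flagging a genuine user.
costs = sample_costs(balanced, class_cost=1.0) + sample_costs(spammers, class_cost=5.0)
```

The sketch stops at the objective and leaves the solver out. If scikit-learn is available, one option is to pass `costs` as the `sample_weight` argument of `sklearn.svm.SVC.fit`, which rescales the penalty per sample.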
Authors
吕成戍
Lü Chengshu (School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian 116025)
Source
《系统科学与数学》
CSCD
Peking University Core Journals (北大核心)
2021, No. 12, pp. 3548-3558 (11 pages)
Journal of Systems Science and Mathematical Sciences
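Funding
National Natural Science Foundation of China (71602021, 71801032, 72172025)
2019 Scientific Research Project of the Education Department of Liaoning Province (LN2019Q31)
Dongbei University of Finance and Economics 2021 University-Level Research Project (DUFE202142)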
Keywords
Under-sampling
Cost-sensitive learning
Spammer
Information security