Abstract
Traditional outlier detection algorithms struggle to learn the distribution pattern of outliers in high-dimensional datasets with extreme class imbalance, which leads to low detection rates. To address this problem, this paper proposes GAN-VAE, an algorithm that combines a generative adversarial network (GAN) with a variational auto-encoder (VAE). The algorithm first trains the VAE on outliers to learn their distribution pattern. It then trains the VAE jointly with the GAN to generate more potential outliers while learning the classification boundary between inliers and outliers. Finally, the test data are fed into the trained GAN-VAE, an outlier score is computed for each object from the difference in relative density between inliers and outliers, and objects with high outlier scores are identified as outliers. In comparative experiments against six state-of-the-art outlier detection algorithms on four real-world datasets, GAN-VAE improves AUC, accuracy and F1 score by 5.64%, 5.99% and 13.30% on average, respectively, which demonstrates that the algorithm is effective and feasible.
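The abstract reports results as AUC, accuracy and F1 score over per-object outlier scores. As a rough illustration of how these three metrics are commonly computed, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names `auc` and `accuracy_and_f1`, the toy scores and labels, and the threshold of 0.75 are all assumptions made for this example.

```python
def auc(scores, labels):
    """AUC as the probability that a random outlier (label 1) receives a
    higher score than a random inlier (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(scores, labels, threshold):
    """Flag objects whose score is >= threshold as outliers, then compute
    accuracy and F1 with outliers as the positive class."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Toy data (hypothetical): higher score means "more likely an outlier".
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7]
labels = [1, 0, 0, 0, 0, 1]

print(auc(scores, labels))                    # 0.875
print(accuracy_and_f1(scores, labels, 0.75))  # accuracy 4/6, F1 0.5
```

AUC depends only on how the scores rank outliers against inliers, whereas accuracy and F1 depend on the chosen threshold. On heavily imbalanced data, accuracy can stay high even when few outliers are found, so F1 is usually the more telling of the two.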
Authors
Jin Lina (金利娜)
Yu Jiong (于炯)
Du Xusheng (杜旭升)
Wang Song (王松)
College of Information Science & Engineering (School of Cyber Science & Engineering), Xinjiang University, Urumqi 830008, China; School of Software, Xinjiang University, Urumqi 830008, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2022, Issue 3, pp. 774-779 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61862060, 61462079, 61562086).
Keywords
data mining
outlier detection
generative adversarial network
variational auto-encoder