Abstract
Unsupervised anomaly detection techniques typically rely on the probability density distribution of the data: objects with low probability density are considered anomalous. However, modeling the density distribution of high-dimensional data is known to be hard, which makes detecting anomalies in high-dimensional data challenging. State-of-the-art methods address this with a two-step framework: they first apply a dimension reduction technique to the data and then detect anomalies in the resulting low-dimensional space. Unfortunately, the low-dimensional space does not necessarily preserve the density distribution of the original high-dimensional data, which undermines the effectiveness of anomaly detection. In this work, we propose a novel high-dimensional anomaly detection method called AEDE (AutoEncoding kernel Density Estimation model). The key idea is to unify the representation learning capacity of a deep autoencoder with the density estimation power of kernel density estimation (KDE), so that the model learns a probability density distribution of the high-dimensional data that effectively separates out the anomalies. By using a probability density-aware strategy in the training process of the autoencoder, AEDE consolidates the merits of the two worlds, namely the deep autoencoder and KDE. Extensive experiments on four public benchmark datasets show that AEDE significantly outperforms state-of-the-art methods in detecting anomalies, achieving up to a 30% improvement in F1 score.
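The premise that low-density points are anomalies can be made concrete with a small sketch. It assumes the data have already been mapped into a low-dimensional latent space (for example by an autoencoder's encoder, which is not shown) and scores each point with a fixed-bandwidth Gaussian KDE. The function names, the leave-one-out choice, the bandwidth and the contamination fraction are illustrative assumptions, not AEDE's actual procedure. In particular, the sketch does not implement AEDE's density-aware training: with a separately trained encoder it corresponds to the scoring stage of the two-step framework described above. An F1 helper is included because F1 is the metric the abstract reports.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kde_density(x: Vector, data: List[Vector], bandwidth: float,
                         exclude_index: int = -1) -> float:
    """Isotropic Gaussian KDE evaluated at x using the points in `data`.

    If exclude_index >= 0, that point is skipped (leave-one-out), so a
    point does not inflate its own density estimate.
    """
    d = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    total = 0.0
    count = 0
    for i, p in enumerate(data):
        if i == exclude_index:
            continue
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        count += 1
    return total / (count * norm)


def flag_low_density(latent: List[Vector], bandwidth: float,
                     contamination: float) -> List[bool]:
    """Flag the `contamination` fraction of points with the lowest density.

    `latent` is assumed (hypothetically) to be encoder output; here it is
    just a list of low-dimensional vectors.
    """
    densities = [gaussian_kde_density(z, latent, bandwidth, exclude_index=i)
                 for i, z in enumerate(latent)]
    n_flag = max(1, round(contamination * len(latent)))
    # Rank by density so exactly n_flag points are flagged, even with ties.
    order = sorted(range(len(latent)), key=lambda i: densities[i])
    flagged = set(order[:n_flag])
    return [i in flagged for i in range(len(latent))]


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    """F1 = harmonic mean of precision and recall for the anomaly class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D "latent codes": a tight cluster plus one distant point.
    latent = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1),
              (0.05, 0.0), (0.0, 0.08), (5.0, 5.0)]
    labels = [False] * 5 + [True]
    pred = flag_low_density(latent, bandwidth=0.5, contamination=0.2)
    print(pred)                    # [False, False, False, False, False, True]
    print(f1_score(pred, labels))  # 1.0
```

The leave-one-out estimate matters in practice: without it, every training point receives a kernel centered on itself, which pushes the densities of isolated points upward and blurs the separation between normal points and anomalies. How well this works depends on whether the latent space preserves the original density structure, which is exactly the weakness of the two-step framework that the abstract says AEDE targets.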
Source
Computer Science and Application (《计算机科学与应用》), 2021, No. 3, pp. 682-689 (8 pages)