Abstract
Effective outlier detection on large, high-dimensional data is important in practical applications. Outlier detection identifies data points that deviate from the general data distribution, and its core is density estimation. Models such as the deep autoencoder Gaussian mixture model, which first reduces dimensionality and then estimates density, have made substantial progress. However, that model introduces noise into the low-dimensional latent space, and optimizing its density estimation module is subject to constraints, such as the need to keep the covariance matrix positive definite. To address these limitations, this paper proposes the deep autoencoder normalizing flow (DANF) for unsupervised outlier detection. The model uses a deep autoencoder to generate a low-dimensional latent representation and a reconstruction error for each input sample, feeds both into a normalizing flow (NF), and finally maps them to a Gaussian distribution. Experimental results on several public benchmark datasets show that DANF significantly outperforms state-of-the-art outlier detection methods, improving the F1-score by up to 26.43%.
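The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the flow architecture, so everything below is an assumption made for illustration: a single RealNVP-style affine coupling step stands in for the NF, the conditioner is a toy linear map rather than a trained network, and the names `affine_coupling`, `standard_normal_logpdf`, and `anomaly_score` are hypothetical. The sketch only shows how a latent code and a reconstruction error could be joined, mapped toward a standard Gaussian, and scored by negative log-likelihood using the change-of-variables formula.

```python
import math


def affine_coupling(v, scale_w, shift_w):
    """One RealNVP-style coupling step (an assumed flow, not the paper's).

    The first half of v passes through unchanged and parameterizes an
    elementwise affine map of the second half. Returns the transformed
    vector and the log-determinant of the Jacobian.
    """
    d = len(v) // 2
    a, b = v[:d], v[d:]
    # Toy linear conditioner; a real model would use a neural network here.
    s = [math.tanh(sum(w * x for w, x in zip(row, a))) for row in scale_w]
    t = [sum(w * x for w, x in zip(row, a)) for row in shift_w]
    out_b = [bi * math.exp(si) + ti for bi, si, ti in zip(b, s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of scales.
    return a + out_b, sum(s)


def standard_normal_logpdf(u):
    """Log-density of u under a standard multivariate Gaussian."""
    return -0.5 * sum(x * x for x in u) - 0.5 * len(u) * math.log(2 * math.pi)


def anomaly_score(latent, recon_error, scale_w, shift_w):
    """Negative log-likelihood of [latent, recon_error] under the flow."""
    v = list(latent) + [recon_error]
    u, log_det = affine_coupling(v, scale_w, shift_w)
    # Change of variables: log p(v) = log N(f(v); 0, I) + log|det df/dv|.
    return -(standard_normal_logpdf(u) + log_det)


# Hypothetical inputs: a 2-D latent code plus one reconstruction error.
weights = [[0.3], [-0.2]]  # one row per transformed coordinate
normal = anomaly_score([0.1, -0.2], 0.1, weights, weights)
outlier = anomaly_score([0.1, -0.2], 3.0, weights, weights)
print(normal, outlier)
```

In this sketch a higher score means lower estimated density, so a sample would be flagged as an outlier when its score exceeds a chosen threshold. Unlike a Gaussian mixture, the flow needs no covariance matrix, so no positive-definiteness constraint has to be maintained during optimization.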
Authors
ZHONG Hai-Xin (钟海鑫)
WANG Hui (王晖)
GUO Gong-De (郭躬德)
College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China; School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast BT9 5BN, UK
Source
Computer Systems & Applications (《计算机系统应用》)
2024, Issue 3, pp. 34-42 (9 pages)
Funding
National Natural Science Foundation of China (61976053, 62171131).
Keywords
outlier detection
unsupervised learning
normalizing flow (NF)
invertible transform
density estimation