K-nearest neighbor (KNN) is one of the most fundamental methods for unsupervised outlier detection because of its various advantages, e.g., ease of use and relatively high accuracy. Currently, most data analytic tasks need to deal with high-dimensional data, and KNN-based methods often fail due to "the curse of dimensionality". AutoEncoder-based methods have recently been introduced to use reconstruction errors for outlier detection on high-dimensional data, but the direct use of AutoEncoder typically does not preserve the data proximity relationships well for outlier detection. In this study, we propose to combine KNN with AutoEncoder for outlier detection. First, we propose the Nearest Neighbor AutoEncoder (NNAE), which preserves the original data proximity in a much lower dimension that is more suitable for performing KNN. Second, we propose the K-nearest reconstruction neighbors (KNRNs), which incorporate the reconstruction errors of NNAE with the K-distances of KNN to detect outliers. Third, we develop a method to automatically choose better parameters for optimizing the structure of NNAE. Finally, using five real-world datasets, we experimentally show that our proposed approach NNAE+KNRN is much better than existing methods, i.e., KNN, Isolation Forest, a traditional AutoEncoder using reconstruction errors (AutoEncoder-RE), and Robust AutoEncoder.
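The abstract does not give the exact formula by which KNRNs combine reconstruction errors with K-distances, so the following is only a minimal illustrative sketch of the general idea: score each point by a normalized, weighted mix of its K-distance (distance to its K-th nearest neighbor) and its per-point reconstruction error. The function names, the weighted-sum combination, and the `alpha` parameter are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def k_distance(X, k):
    """K-distance of each point: distance to its k-th nearest neighbor."""
    # Pairwise Euclidean distance matrix (n x n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # After sorting each row, index 0 is the point itself (distance 0),
    # so the k-th nearest neighbor sits at index k.
    return np.sort(d, axis=1)[:, k]

def knrn_score(X, X_recon, k, alpha=0.5):
    """Hypothetical outlier score mixing K-distance with reconstruction
    error (a plain weighted sum here; the paper's actual KNRN
    combination may differ)."""
    recon_err = np.linalg.norm(X - X_recon, axis=1)
    kd = k_distance(X, k)

    def minmax(v):
        # Scale to [0, 1] so the two terms are comparable.
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

    return alpha * minmax(kd) + (1 - alpha) * minmax(recon_err)

# Toy usage: four clustered points plus one far-away, poorly
# reconstructed point, which should receive the highest score.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
X_recon = X.copy()
X_recon[4] += 3.0  # pretend the outlier reconstructs badly
scores = knrn_score(X, X_recon, k=2)
```

In practice `X` would be the NNAE embedding (or its input) and `X_recon` the NNAE reconstruction; here both are hard-coded toy arrays.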
Funding: supported in part by the National Natural Science Foundation of China under Grant Nos. 61925203 and U22B2021.