50 articles found
CLOF Based Outlier Detection Algorithm of Temperature Data for Ethylene Cracking Furnace
1
Authors: Yidan Xin, Shaolin Hu, Wenzhuo Chen, He Song. Journal of Harbin Institute of Technology (New Series), CAS, 2023, No. 4, pp. 50-57 (8 pages)
The flue temperature is one of the important indicators characterizing the combustion state of an ethylene cracking furnace, and outliers in the temperature data can lead to false alarms. Conventional outlier detection algorithms such as the Isolation Forest algorithm and the 3-sigma principle cannot detect these outliers accurately. To improve detection accuracy and reduce computational complexity, an outlier detection algorithm for flue temperature data based on the Clipping Local Outlier Factor (CLOF) algorithm is proposed. The algorithm preprocesses the normalized data using a cluster pruning algorithm and then performs accurate and efficient outlier detection within the resulting outlier candidate set. Using the flue temperature data of an ethylene cracking furnace in a petrochemical plant, the main parameters of the CLOF algorithm are selected according to the experimental results, and the detection performance of the Isolation Forest algorithm, the 3-sigma principle, the conventional LOF algorithm and the CLOF algorithm is compared and analyzed. The results show that an appropriate clipping coefficient in the CLOF algorithm can significantly improve detection efficiency and accuracy. Compared with the Isolation Forest algorithm and the 3-sigma principle, the CLOF algorithm increases detection accuracy and significantly reduces the amount of computation.
Keywords: temperature data; outlier detection; ethylene cracker furnace; clustering; data clipping; LOF
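For reference, the 3-sigma principle that the abstract above uses as a baseline can be sketched in a few lines. This is only an illustration of that baseline, not the paper's CLOF method; the function name `three_sigma_outliers`, the parameter `k` and the sample temperature readings are hypothetical.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values, k=3.0):
    """Return the indices of values lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical readings: 40 values near 851 plus one spike at the end.
readings = [850.0 + 0.5 * (i % 5) for i in range(40)] + [990.0]
flagged = three_sigma_outliers(readings)
print(flagged)
```

Because the mean and standard deviation are themselves inflated by the spike, this rule can miss outliers in short or heavily contaminated series, which is one reason density-based methods such as LOF are considered.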
Density-based trajectory outlier detection algorithm (Cited by: 9)
2
Authors: Zhipeng Liu, Dechang Pi, Jinfeng Jiang. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2013, No. 2, pp. 335-340 (6 pages)
With the development of the global positioning system (GPS), wireless technology and location-aware services, it has become possible to collect large quantities of trajectory data. In the field of data mining for moving objects, anomaly detection is a hot topic. Building on existing work on anomalous trajectory detection for moving objects, this paper introduces the classical trajectory outlier detection (TRAOD) algorithm and then proposes a density-based trajectory outlier detection (DBTOD) algorithm, which compensates for a weakness of the TRAOD algorithm, namely that it cannot detect anomalous defects when the trajectories are local and dense. The results of applying the proposed algorithm to the Elk1993 and Deer1995 datasets are also presented and show the effectiveness of the algorithm.
Keywords: density-based algorithm; trajectory outlier detection (TRAOD); partition-and-detect framework; Hausdorff distance
GA-iForest: An Efficient Isolated Forest Framework Based on Genetic Algorithm for Numerical Data Outlier Detection (Cited by: 4)
3
Authors: LI Kexin, LI Jing, LIU Shuji, LI Zhao, BO Jue, LIU Biqi. Transactions of Nanjing University of Aeronautics and Astronautics, EI CSCD, 2019, No. 6, pp. 1026-1038 (13 pages)
With the development of the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolated forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolated forest algorithm continues to generate isolation trees, the differences among the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. Moreover, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the isolated forest by selecting better isolation trees according to their detection accuracy and the differences among them, thereby removing duplicate, similar and poorly performing isolation trees and improving the accuracy and stability of outlier detection. The experimental environment is built on the Ubuntu system and the Spark platform, and the outlier datasets provided by ODDS are used for testing. The performance of the proposed method is evaluated by indicators such as accuracy, recall rate, ROC curves, AUC and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection, but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
Keywords: outlier detection; isolation tree; isolated forest; genetic algorithm; feature selection
Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning (Cited by: 3)
4
Authors: Shuo Zheng, Yu-Xin Zhu, Dian-Qing Li, Zi-Jun Cao, Qin-Xuan Deng, Kok-Kwang Phoon. Geoscience Frontiers, SCIE CAS CSCD, 2021, No. 1, pp. 425-439 (15 pages)
Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because outliers distort the statistics of geotechnical parameters and data sparsity introduces statistical uncertainty. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance and identifies as outliers those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique and accounts, rationally, for the statistical uncertainty by Bayesian machine learning. Moreover, the proposed approach also suggests an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that it properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
Keywords: outlier detection; site investigation; sparse multivariate data; Mahalanobis distance; resampling by half-means; Bayesian machine learning
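The Mahalanobis distance on which the abstract above builds can be illustrated for two-dimensional data with a closed-form 2x2 covariance inverse. This sketch covers only the distance itself; it does not include the paper's re-sampling step, Bayesian learning or outlying probabilities. The function name `mahalanobis_sq_2d` and the sample points are hypothetical.

```python
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Entries of the sample covariance matrix S (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical data: six points close to the line y = x and one point off it.
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 5.9), (3, 6.0)]
d2 = mahalanobis_sq_2d(data)
most_outlying = max(range(len(data)), key=lambda i: d2[i])
print(most_outlying)
```

The last point is unremarkable in each coordinate taken separately but breaks the correlation between the two, which is what the Mahalanobis distance picks up. With so few samples the estimated mean and covariance are themselves distorted by that point, which is the masking problem the abstract describes.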
Outlier Detection for Water Supply Data Based on Joint Auto-Encoder (Cited by: 2)
5
Authors: Shu Fang, Lei Huang, Yi Wan, Weize Sun, Jingxin Xu. Computers, Materials & Continua, SCIE EI, 2020, No. 7, pp. 541-555 (15 pages)
With the development of science and technology, the status of the water environment has received more and more attention. In this paper, we propose a deep learning model, named the Joint Auto-Encoder network, to solve the problem of outlier detection in water supply data. The Joint Auto-Encoder network first expands the size of the training data and extracts useful features from the input data, and then reconstructs the input data effectively into an output. Outliers are detected based on the network's reconstruction errors, with a larger reconstruction error indicating a higher likelihood of being an outlier. For water supply data, there are mainly two types of outliers: outliers with large values and those with values close to zero. We set two separate thresholds on the reconstruction errors to detect the two types of outliers respectively. Data samples whose reconstruction errors exceed the thresholds are voted to be outliers. The two thresholds can be calculated from the classification confusion matrix and the receiver operating characteristic (ROC) curve. We also compare the Joint Auto-Encoder with the vanilla Auto-Encoder on both a synthetic data set and the MNIST data set. Our model is shown to outperform the vanilla Auto-Encoder and some other outlier detection approaches, with a recall rate of 98.94 percent on water supply data.
Keywords: water supply data; outlier detection; auto-encoder; deep learning
Outlier detection algorithm for satellite gravity gradiometry data using wavelet shrinkage de-noising (Cited by: 1)
6
Authors: Wu Yunlong, Li Hui, Zou Zhengbo, Kang Kaixuan, Muhammad Sadiq. Geodesy and Geodynamics, 2012, No. 2, pp. 47-52 (6 pages)
On the basis of wavelet theory, we propose an outlier-detection algorithm for satellite gravity gradiometry by applying a wavelet-shrinkage de-noising method to simulation data with white noise and outliers. The results show that this novel algorithm has a 97% success rate in outlier identification and that it can be used efficiently for pre-processing real satellite gravity gradiometry data.
Keywords: satellite gravity gradiometry; outlier detection; wavelet shrinkage; threshold; Haar wavelet
A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection
7
Authors: Alican Dogan, Derya Birant. Journal of Data and Information Science, CSCD, 2020, No. 2, pp. 111-135 (25 pages)
Purpose: The main aim of this study is to build a robust novel approach that is able to detect outliers in datasets accurately. To serve this purpose, a novel approach is introduced to determine the likelihood that an object is extremely different from the general behavior of the entire dataset. Design/methodology/approach: This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), uses the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble methods. Findings: Several experiments were performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, the BV-LOF approach significantly outperformed LOF on 9 of the 10 datasets on average. Research limitations: In the BV-LOF approach, the base algorithm is applied to each data subset multiple times with different neighborhood sizes (k) in each case and with different ensemble sizes (T). In our study, we chose the value ranges of k and T as [1-100]; however, these ranges can be changed according to the dataset handled and the problem addressed. Practical implications: The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Since the BV-LOF method includes two-level ensemble operations, it may require more computational time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a proper data structure such as an R*-tree or KD-tree. Originality/value: The proposed approach (BV-LOF) investigates multiple neighborhood sizes (k), which allows it to find instances with different local densities and thereby increases the likelihood of detecting outliers that LOF may miss. It also brings benefits such as easy implementation, improved capability, higher applicability, and interpretability.
Keywords: outlier detection; local outlier factor; ensemble learning; bagging; voting
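The idea of running LOF with several neighborhood sizes and combining the results by voting, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not the BV-LOF algorithm itself: it omits bagging over feature subsets, takes exactly k nearest neighbors (ties broken by index), and assumes there are no duplicate points. The names `lof_scores` and `voted_outliers`, the threshold of 1.5 and the sample data are hypothetical.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of every point, using exactly k nearest neighbors."""
    n = len(points)
    neighbors, k_distance = [], []
    for i in range(n):
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append(ranked[:k])
        k_distance.append(ranked[k - 1][0])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)
    ]


def voted_outliers(points, ks, threshold=1.5):
    """Indices flagged by a majority of the LOF runs, one run per value in ks."""
    votes = [0] * len(points)
    for k in ks:
        for i, score in enumerate(lof_scores(points, k)):
            if score > threshold:
                votes[i] += 1
    return [i for i, v in enumerate(votes) if v > len(ks) / 2]


# Hypothetical data: a 5x5 grid of points plus one isolated point.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
sample = grid + [(20.0, 20.0)]
result = voted_outliers(sample, ks=[3, 5, 7])
print(result)
```

Points inside a uniformly dense region score close to 1, while the isolated point scores far above the threshold for every k, so it wins the vote. The pairwise distance computation is quadratic in the number of points, which is where the spatial index structures mentioned in the abstract would help.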
Outlier Detection of Mixed Data Based on Neighborhood Combinatorial Entropy
8
Authors: Lina Wang, Qixiang Zhang, Xiling Niu, Yongjun Ren, Jinyue Xia. Computers, Materials & Continua, SCIE EI, 2021, No. 11, pp. 1765-1781 (17 pages)
Outlier detection is a key research area in data mining, as it can identify inconsistent data within a data set. Outlier detection aims to find abnormal data in a large data set and has been applied in many fields, including fraud detection, network intrusion detection, disaster prediction, medical diagnosis, public security, and image processing. While outlier detection has been widely applied in real systems, its effectiveness is challenged by high dimensionality and redundant data attributes, which lead to detection errors and complicated calculations. The prevalence of mixed data is a current issue for outlier detection algorithms. An outlier detection method for mixed data based on neighborhood combinatorial entropy is studied, which improves outlier detection performance by reducing the data dimension with an attribute reduction algorithm. The significance of attributes is determined, and the less influential attributes are removed based on neighborhood combinatorial entropy. Outlier detection is then conducted using the local outlier factor algorithm. The proposed method can be applied effectively to numerical and mixed multidimensional data. In the experimental part of this paper, we compare outlier detection before and after attribute reduction. The comparative analysis shows that removing the less influential attributes enhances outlier detection accuracy on numerical and mixed multidimensional data.
Keywords: neighborhood combinatorial entropy; attribute reduction; mixed data; outlier detection
Outlier Detection and Forecasting for Bridge Health Monitoring Based on Time Series Intervention Analysis
9
Authors: Bing Qu, Ping Liao, Yaolong Huang. Structural Durability & Health Monitoring, EI, 2022, No. 4, pp. 323-341 (19 pages)
Time series analysis, applied by establishing appropriate mathematical models for bridge health monitoring data and forecasting future structural behavior, stands out as a novel and viable research direction for bridge state assessment. However, outliers inevitably exist in the monitoring data due to various interventions; they reduce the precision of model fitting and affect the forecasting results. The identification of outliers is therefore crucial for the accurate interpretation of the monitoring data. In this study, a time series model combined with outlier information for bridge health monitoring is established using intervention analysis theory, and the structural responses are forecast. We focus on three techniques: (1) the modeling of the seasonal autoregressive integrated moving average (SARIMA) model; (2) the methodology for outlier identification and amendment, both when the occurrence time and type of the outliers are known and when they are unknown; (3) forecasting with a model that includes outlier effects. The method was tested in a case study using monitoring data from a real bridge. The establishment of the original SARIMA model without considering outliers is discussed first, including stationarity, order determination, parameter estimation and diagnostic checking of the model. Then the time-by-time iterative procedure for outlier detection, implemented with appropriate test statistics of the residuals, is performed, and the SARIMA-outlier model is subsequently built. Finally, the forecasting performance of the original model and the SARIMA-outlier model is compared. The results demonstrate that proper time series models are effective in mining the characteristic patterns of bridge monitoring data. When the influence of outliers is taken into account, the fitting precision of the model is significantly improved and the accuracy and reliability of the forecast are strengthened.
Keywords: structural health monitoring; time series analysis; outlier detection; bridge state assessment; bridge sensor data; stress forecasting
Outlier Detection Algorithm Based on Iterative Clustering
10
Authors: 古平, 罗辛, 杨瑞龙, 张程. Journal of Donghua University (English Edition), EI CAS, 2015, No. 4, pp. 554-558 (5 pages)
A novel approach for outlier detection with iterative clustering (ICOD) in diverse subspaces is proposed. The proposed methodology comprises two phases: iterative clustering and outlier factor computation. During the clustering phase, multiple clusterings are detected alternatively based on an optimization procedure that incorporates terms for cluster quality and for novelty relative to existing solutions. Once new clusters are detected, outlier factors can be estimated from a new definition of outliers (cluster-based outliers), which gives importance to local data behavior. Experiments show that the proposed algorithm can effectively detect outliers that exist in different clusterings, even in high-dimensional data sets.
Keywords: clustering; outlier detection; dimensional reduction
Sparse Reduced-Rank Regression with Outlier Detection
11
Author: LIANG Bing-jie. Chinese Quarterly Journal of Mathematics, 2021, No. 3, pp. 275-287 (13 pages)
Based on the multivariate mean-shift regression model, we propose a new sparse reduced-rank regression approach that achieves low-rank sparse estimation and outlier detection simultaneously. A sparse mean-shift matrix is introduced in the model to indicate outliers. The rank constraint and the group-lasso type penalty on the coefficient matrix encourage a low-rank, row-sparse structure of the coefficient matrix and help to achieve dimension reduction and variable selection. An algorithm is developed for solving the problem. In our simulation and real-data application, the new method shows competitive performance compared with other methods.
Keywords: reduced-rank regression; sparsity; outlier detection; group-lasso type penalty
Changepoint Detection with Outliers Based on RWPCA
12
Authors: Xin Zhang, Sanzhi Shi, Yuting Guo. Journal of Applied Mathematics and Physics, 2024, No. 7, pp. 2634-2651 (18 pages)
Changepoint detection faces challenges when outliers are present. This paper proposes a multivariate changepoint detection method, RWPCA-RFPOP, which is based on the robust WPCA projection direction and the robust RFPOP method. The method is doubly robust and is suitable for detecting mean changepoints in multivariate normal data that contain outliers and have high correlations between variables. Simulation results demonstrate that the method provides strong guarantees on both the number and the location of changepoints in the presence of outliers. Finally, the method is applied successfully to an ACGH dataset.
Keywords: RWPCA-RFPOP; double robust; outlier detection; biweight loss
Power Curve Modeling for Wind Turbine Using Hybrid-driven Outlier Detection Method
13
Authors: Qi Yao, Yang Hu, Jizhen Liu, Tianyang Zhao, Xiao Qi, Shanxun Sun. Journal of Modern Power Systems and Clean Energy, SCIE EI CSCD, 2023, No. 4, pp. 1115-1125 (11 pages)
Wind power curve modeling is essential in the analysis and control of wind turbines (WTs), and data preprocessing is a critical step in accurate curve modeling. As traditional methods do not sufficiently consider WT models, this paper proposes a new data cleaning method for wind power curve modeling. In this method, a model-data hybrid-driven (MDHD) outlier detection method is constructed, and an adaptive update rule for the major parameters of the detection algorithm is designed based on the WT model. Because the MDHD outlier detection method considers multiple types of WT operating data, the anomaly detection results require further analysis. Accordingly, an expert system is developed in which a knowledge base and an inference engine are designed based on the coupling relationships among different operating data. Finally, abnormal data are eliminated and the power curve modeling is completed. The proposed and traditional methods are compared in numerical cases, and the superiority of the proposed method is demonstrated.
Keywords: wind turbine; power curve modeling; outlier detection; data-driven; expert system
Association discovery and outlier detection of air pollution emissions from industrial enterprises driven by big data
14
Authors: Zhen Peng, Yunxiao Zhang, Yunchong Wang, Tianle Tang. Data Intelligence, EI, 2023, No. 2, pp. 438-456 (19 pages)
Air pollution is a major issue related to the national economy and people's livelihood. At present, research on air pollution mostly focuses on pollutant emissions in a specific industry or region as a whole, and there is a lack of attention to enterprise pollutant emissions at the micro level. Limited by the amount and time granularity of data from enterprises, enterprise pollutant emissions are still understudied. Driven by big data on the air pollution emissions of industrial enterprises monitored in Beijing-Tianjin-Hebei, this paper carries out data mining of enterprise pollution emissions, including association analysis between different features based on grey association, association mining between different data based on association rules, and outlier detection based on clustering. The results show that: (1) the industries affecting NOx and SO2 in Beijing-Tianjin-Hebei are mainly the electric power, heat production and supply industry and the metal smelting and processing industries; (2) the districts near Hengshui and Shijiazhuang in Hebei province form strong association rules; (3) the industrial enterprises in Beijing-Tianjin-Hebei are divided into six clusters, of which three are outlier categories with excessive emissions of total VOCs, PM and NH3 respectively.
Keywords: air pollution emissions of enterprises; outlier detection based on clustering; association rule mining; grey association analysis; big data
Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data
15
Authors: Haitao Zhang, Wenhai Ma, Qilong Han, Zhiqiang Ma. 国际计算机前沿大会会议论文集, EI, 2023, No. 1, pp. 192-206 (15 pages)
This paper aims to address the problems of data imbalance, parameter adjustment complexity, and low accuracy in high-dimensional data anomaly detection. To address these issues, an anomaly detection model for high-dimensional sparse data based on an autoencoder and data augmentation (SEAOD) is proposed. First, the model solves the problem of imbalanced data by using the weighted SMOTE algorithm and the ENN algorithm to fill in the minority class samples and generate a new dataset. Then, an attention mechanism is employed to calculate the feature similarity and determine the structure of the neural network so that the model can learn the data features. Finally, the dimension of the data is reduced by the autoencoder, and the sparse high-dimensional data are mapped to a low-dimensional space for anomaly detection, overcoming the impact of the curse of dimensionality on detection algorithms. The experimental results show that this model outperforms the comparison algorithms on 15 public datasets. Furthermore, it was validated on industrial air quality datasets, where it achieved the expected results and proved practical.
Keywords: high-dimensional; data augmentation; attention mechanism; outlier detection
Outlier Detection over Sliding Windows for Probabilistic Data Streams (Cited by: 4)
16
Authors: 王斌, 杨晓春, 王国仁, 于戈. Journal of Computer Science & Technology, SCIE EI CSCD, 2010, No. 3, pp. 389-400 (12 pages)
Outlier detection is a very useful technique in many applications where data is generally uncertain and can be described using probability. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e,d)|) to O(|k·R(e,d)|), where R(e,d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to filter non-outliers effectively and efficiently on a single window and to detect the recent m elements dynamically and incrementally. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
Keywords: outlier detection; uncertain data; probabilistic data stream; sliding window
GMDH-Based Outlier Detection Model in Classification Problems (Cited by: 3)
17
Authors: XIE Ling, JIA Yanlin, XIAO Jin, GU Xin, HUANG Jing. Journal of Systems Science & Complexity, SCIE EI CSCD, 2020, No. 5, pp. 1516-1532 (17 pages)
In many practical classification problems, datasets contain a portion of outliers, which can greatly affect the performance of the constructed models. To address this issue, we apply the group method of data handling (GMDH) neural network to outlier detection. This study builds a GMDH-based outlier detection model. The model first performs feature selection on the training set L using the GMDH neural network. Then a new training set L can be obtained by mapping the selected key feature subset. Next, a linear regression model is constructed on the set L by ordinary least squares estimation. Further, the model randomly eliminates one sample from the set L at a time and rebuilds the linear regression model. Finally, outlier detection is realized by calculating Cook's distance for each sample. Four different customer classification datasets are used to conduct experiments. The results show that the GOD model can effectively eliminate outliers and that it generally performs significantly better than five existing outlier detection models. This indicates that eliminating outliers can effectively enhance the classification accuracy of the trained classification model.
Keywords: classification problem; Cook's distance; feature selection; GMDH; outlier detection
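The final step in the abstract above, scoring each sample by its Cook's distance under an ordinary least squares fit, can be illustrated for a regression with a single predictor. This sketch uses the closed-form leverage instead of refitting with each sample removed, and it leaves out the GMDH feature selection entirely. The function name `cooks_distance`, the 4/n cut-off (a common rule of thumb, assumed here and not taken from the paper) and the sample data are hypothetical.

```python
def cooks_distance(xs, ys):
    """Cook's distance of each sample for the OLS fit y = a + b * x."""
    n = len(xs)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for x, e in zip(xs, residuals):
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        distances.append(e * e / (p * s2) * leverage / (1.0 - leverage) ** 2)
    return distances


# Hypothetical data: y is roughly 2x + 1, with the last sample shifted upward.
xs = [float(x) for x in range(1, 11)]
ys = [2 * x + 1 + (0.1 if int(x) % 2 else -0.1) for x in xs]
ys[-1] += 8.0
cook = cooks_distance(xs, ys)
influential = [i for i, d in enumerate(cook) if d > 4 / len(xs)]
print(influential)
```

The closed form gives the same value as deleting each sample and refitting, so it matches the leave-one-out description in the abstract while needing only one fit.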
A predictive DEA model for outlier detection (Cited by: 3)
18
Authors: Mingwen Yang, Guohua Wan, Eric Zheng. Journal of Management Analytics, EI, 2014, No. 1, pp. 20-41 (22 pages)
Outlier detection is one of the key issues in any data-driven analytics. In this paper, we propose Bi-super DEA, a super DEA-based method that constructs both efficient and inefficient frontiers for outlier detection. To evaluate its predictive performance, we develop a novel predictive DEA procedure, PDEA, which extends conventional DEA approaches, primarily used for in-sample efficiency estimation, to predict outputs out of sample. This enables us to compare the predictive performance of our approach against several popular outlier detection methods, including parametric robust regression in statistics and non-parametric k-means in data mining. We conduct comprehensive simulation experiments to examine the relative performance of these outlier detection methods under the influence of five factors: sample size, linearity of the production function, normality of the noise distribution, homogeneity of the data, and the level of random noise contaminating the data generating process (DGP). We find that, somewhat surprisingly, Bi-super CCR consistently outperforms Bi-super BCC in detecting outliers. Under the linearity, normality and homogeneity conditions, the parametric robust regression method works best. However, when the DGP violates these conditions, Bi-super DEA emerges as the better choice due to its distribution-free property. Our results shed light on the conditions under which each method excels or fails and provide users with practical guidelines on how to choose appropriate methods to detect outliers.
Keywords: predictive DEA; Bi-super DEA; outlier detection; simulation
An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets (Cited by: 1)
19
Authors: 王习特, 申德荣, 白梅, 聂铁铮, 寇月, 于戈. Journal of Computer Science & Technology, SCIE EI CSCD, 2015, No. 6, pp. 1233-1248 (16 pages)
The distance-based outlier is a widely used definition of outlier: a point is distinguished as an outlier on the basis of the distances to its nearest neighbors. In this paper, to solve the problem of outlier computation in distributed environments, we propose DBOZ, a distributed algorithm for distance-based outlier detection using a Z-curve hierarchical tree (ZH-tree). First, we propose a new index, the ZH-tree, to manage data effectively in a distributed environment. The ZH-tree has two desirable advantages: a clustering property that helps to search the neighbors of a point, and a hierarchical structure that supports space pruning. We also design a bottom-up approach to build the ZH-tree in parallel, whose time complexity is linear in the number of dimensions and the size of the dataset. Second, DBOZ is proposed to compute outliers in distributed environments. It consists of two stages. 1) To avoid calculating the exact nearest neighbors of all the points, we design a greedy method and a new ZH-tree based k-nearest neighbor searching algorithm (ZHkNN for short) to obtain a threshold LW. 2) We propose a filter-and-refine approach, which first filters out the unpromising points using LW and then outputs the final outliers by refining the remaining points. Finally, the efficiency and effectiveness of the ZH-tree and DBOZ are verified through a series of experiments.
Keywords: outlier detection; multi-dimensional; distributed; large dataset
Top-k Outlier Detection from Uncertain Data (Cited by: 1)
20
Authors: Salman Ahmed Shaikh, Hiroyuki Kitagawa. International Journal of Automation and Computing, EI CSCD, 2014, No. 2, pp. 128-142 (15 pages)
Uncertain data are common due to the increasing use of sensors, radio frequency identification (RFID), GPS and similar devices for data collection. The causes of uncertainty include limitations of measurements, inclusion of noise, inconsistent supply voltage and delay or loss of data in transfer. In order to manage, query or mine such data, data uncertainty needs to be considered. Hence, this paper studies the problem of top-k distance-based outlier detection from uncertain data objects. In this work, an uncertain object is modelled by the probability density function of a Gaussian distribution. The naive approach to distance-based outlier detection uses a nested loop. This approach is very costly due to the expensive distance function between two uncertain objects. Therefore, a populated-cells list (PC-list) approach to outlier detection is proposed. Using the PC-list, the proposed top-k outlier detection algorithm needs to consider only a fraction of the dataset objects and hence quickly identifies candidate objects for the top-k outliers. Two approximate top-k outlier detection algorithms are presented to further increase the efficiency of the top-k outlier detection algorithm. An extensive empirical study on synthetic and real datasets is also presented to demonstrate the accuracy, efficiency and scalability of the proposed algorithms.
Keywords: top-k distance-based outlier detection; uncertain data; Gaussian uncertainty; cell-based approach; PC-list based approach
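The naive nested-loop approach mentioned in the abstract above can be sketched for ordinary (deterministic) points. This is an assumption-laden simplification: the paper works with Gaussian uncertain objects and a probabilistic distance function, whereas this sketch uses plain Euclidean distance and ranks objects by how few neighbors fall within a radius. The function name `top_k_outliers`, the parameters `radius` and `k`, and the sample points are hypothetical.

```python
import math


def top_k_outliers(points, radius, k):
    """Indices of the k points with the fewest neighbors within the given radius."""
    neighbor_counts = []
    for i, p in enumerate(points):
        # Inner loop: compare the current point with every other point.
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        neighbor_counts.append((count, i))
    neighbor_counts.sort()  # fewest neighbors first, ties broken by index
    return [i for _, i in neighbor_counts[:k]]


# Hypothetical data: a tight cluster of five points and two isolated points.
cloud = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (-8, 3)]
top2 = top_k_outliers(cloud, radius=2.0, k=2)
print(top2)
```

Every pair of objects is compared, so the cost grows quadratically with the dataset size; with an expensive distance function between uncertain objects this is the cost that the PC-list approach in the abstract is designed to avoid.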