Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method which is based on the robust WPCA projection direction and the robust RFPOP method,...Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method which is based on the robust WPCA projection direction and the robust RFPOP method, RWPCA-RFPOP method. Our method is double robust which is suitable for detecting mean changepoints in multivariate normal data with high correlations between variables that include outliers. Simulation results demonstrate that our method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, our method is well applied in an ACGH dataset.展开更多
A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberran...A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberrant values or outliers due to the significant fluctuation of this sort of data, which is influenced by Climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases, in the first phase imputation of missing values and second detection of outliers using Statistical Process Control (SPC), and Functional Data Analysis (FDA), studies to achieve the efficacy of the outlier identification methodology proposed with working days and Nonworking days of the variables NO<sub>2</sub>, SO<sub>2</sub>, and O<sub>3</sub>, which were used for a year in a row in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is absolutely appropriate for the precise detection of outliers in highly variable data.展开更多
Background Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping.The accuracy of the matching significantly impacted subsequent studies....Background Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping.The accuracy of the matching significantly impacted subsequent studies.Because of their local similarity,when image pairs contain comparable patterns but feature pairs are positioned differently,incorrect recognition can occur as global motion consistency is disregarded.Methods This study proposes an image-matching filtering algorithm based on global motion consistency.It can be used as a subsequent matching filter for the initial matching results generated by other matching algorithms based on the principle of motion smoothness.A particular matching algorithm can first be used to perform the initial matching;then,the rotation and movement information of the global feature vectors are combined to effectively identify outlier matches.The principle is that if the matching result is accurate,the feature vectors formed by any matched point should have similar rotation angles and moving distances.Thus,global motion direction and global motion distance consistencies were used to reject outliers caused by similar patterns in different locations.Results Four datasets were used to test the effectiveness of the proposed method.Three datasets with similar patterns in different locations were used to test the results for similar images that could easily be incorrectly matched by other algorithms,and one commonly used dataset was used to test the results for the general image-matching problem.The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set.Conclusions The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.展开更多
This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers.A new representation is proposed to model the me...This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers.A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length(i.e.,the minimum duration between two adjacent impulsive signals)and minimum norm(i.e.,the minimum of the norms of all impulsive signals)are larger than certain thresholds that are adjustable according to engineering practice.In order to guarantee satisfactory filtering performance,a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state.First,a novel outlier detection strategy is developed,based on a dedicatedly constructed input-output model,to examine whether the received measurement is corrupted by an outlier.Then,through the outcome of the outlier detection,the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations.Furthermore,the ultimate boundedness issue on the time-varying ellipsoidal region is thoroughly investigated.Finally,a simulation example is provided to demonstrate the effectiveness of our proposed parameter-dependent set-membership filtering strategy.展开更多
Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, oudiers ...Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, oudiers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations on data from laboratory experiments are then detected. Specifically we take account of the interaction of an internal sohtary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. As concluded, the goodness-of-fit of the revised logit regression model is better than that of the model without this approach.展开更多
A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of ou...A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that the dropping of outliers could provide the evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of the abovementioned population. The results could also reveal a remarkable increase in estimating heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be obtained in the estimates of standard errors for position, additive and additive× environment interaction effects of QTLs in the presence of outliers.展开更多
The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LT...The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with the complexity 0( N In N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of the outliers, it is sensitive to the inliers. The concept of the inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS) together with its solution are presented that reduces the effect of both the outliers and the inliers. Keywords Least squares - Least trimmed squares - Outliers - System identification - Parameter estimation - Robust parameter estimation This work was supported in part by NSF ECS — 9710297 and ECS — 0098181.展开更多
The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns seri...The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.展开更多
In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. T...In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternate approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomena is reported.展开更多
The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model ...The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fitness. Though previous researches have studied outliers and controlling observations from various perspectives including the use of box plots, normal probability plots, among others, the use of uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identifying outliers and possible controlling observations in SEM. The results showed that all the three methods of estimators manifest the ability to identify outliers and possible controlling observations in SEM. It was noted that the Anderson-Rubin estimator of QQ plot showed a more efficient or visual display of spotting outliers and possible controlling observations as compared to the other methods of estimators. Therefore, this paper provides an efficient way identifying outliers as it fragments the data set.展开更多
In this paper, we present a cluster-based algorithm for time series outlier mining.We use discrete Fourier transformation (DFT) to transform time series from time domain to frequency domain. Time series thus can be ma...In this paper, we present a cluster-based algorithm for time series outlier mining.We use discrete Fourier transformation (DFT) to transform time series from time domain to frequency domain. Time series thus can be mapped as the points in k -dimensional space.For these points, a cluster-based algorithm is developed to mine the outliers from these points.The algorithm first partitions the input points into disjoint clusters and then prunes the clusters,through judgment that can not contain outliers.Our algorithm has been run in the electrical load time series of one steel enterprise and proved to be effective.展开更多
We introduce a new wavelet based procedure for detecting outliers in financial discrete time series.The procedure focuses on the analysis of residuals obtained from a model fit,and applied to the Generalized Autoregre...We introduce a new wavelet based procedure for detecting outliers in financial discrete time series.The procedure focuses on the analysis of residuals obtained from a model fit,and applied to the Generalized Autoregressive Conditional Heteroskedasticity(GARCH)like model,but not limited to these models.We apply the Maximal-Overlap Discrete Wavelet Transform(MODWT)to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers.Our methodology has several advantages over existing methods that make use of the standard Discrete Wavelet Transform(DWT).The series sample size does not need to be a power of 2 and the transform can explore any wavelet filter and be run up to the desired level.Simulated wavelet quantiles from a Normal and Student t-distribution are used as threshold for the maximum of the absolute value of wavelet coefficients.The performance of the procedure is illustrated and applied to two real series:the closed price of the Saudi Stock market and the S&P 500 index respectively.The efficiency of the proposed method is demonstrated and can be considered as a distinct important addition to the existing methods.展开更多
On the basis of the newly developed regression diagnostic analysis, the diagnostic method with the assessment of the outliers of the logistic regression model was set up and it was used to analyze the prognosis of the...On the basis of the newly developed regression diagnostic analysis, the diagnostic method with the assessment of the outliers of the logistic regression model was set up and it was used to analyze the prognosis of the patients with acute lymphatic leukemia.展开更多
We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-samp...We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in high-dimensional low-sample size, and often competes favorably with existing methods as far as the percentage of correct outlier detection are concerned.展开更多
Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an ef...Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an effective way of identifying outliers graphically, which might not be visible through the original data plot in classical analysis. This study’s main objective is to detect the extreme rainfall events using functional outliers detection methods depending on the depth and density functions. In order to identify the unusual events of rainfall variation over long time intervals, this work conducts based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission and the analysis has been processed by R software. The approaches applied in this study involve rainbow plots, functional highest density region box-plot as well as functional bag-plot. According to the current results, the functional density box-plot method has proven effective in detecting outlier compared to the functional depth bag-plot method. In conclusion, the results of the current study showed that the rainfall over the Taiz region during the last two decades was influenced by the extreme events of years 1999, 2004, 2005, and 2009.展开更多
The clustering problem of big data in the era of artificial intelligence has been widely studied.Because of the huge amount of data,distributed algorithms are often used to deal with big data problems.The distributed ...The clustering problem of big data in the era of artificial intelligence has been widely studied.Because of the huge amount of data,distributed algorithms are often used to deal with big data problems.The distributed computing model has an attractive feature:it can handle massive datasets that cannot be put into the main memory.On the other hand,since many decisions are made automatically by machines in today’s society,algorithm fairness is also an important research area of machine learning.In this paper,we study two fair clustering problems:the centralized fair k-center problem with outliers and the distributed fair k-center problem with outliers.For these two problems,we have designed corresponding constant approximation ratio algorithms.The theoretical proof and analysis of the approximation ratio,and the running space of the algorithm are given.展开更多
The widespread adoption of the Internet of Things (IoT) has transformed various sectors globally, making themmore intelligent and connected. However, this advancement comes with challenges related to the effectiveness...The widespread adoption of the Internet of Things (IoT) has transformed various sectors globally, making themmore intelligent and connected. However, this advancement comes with challenges related to the effectiveness ofIoT devices. These devices, present in offices, homes, industries, and more, need constant monitoring to ensuretheir proper functionality. The success of smart systems relies on their seamless operation and ability to handlefaults. Sensors, crucial components of these systems, gather data and contribute to their functionality. Therefore,sensor faults can compromise the system’s reliability and undermine the trustworthiness of smart environments.To address these concerns, various techniques and algorithms can be employed to enhance the performance ofIoT devices through effective fault detection. This paper conducted a thorough review of the existing literature andconducted a detailed analysis.This analysis effectively links sensor errors with a prominent fault detection techniquecapable of addressing them. This study is innovative because it paves theway for future researchers to explore errorsthat have not yet been tackled by existing fault detection methods. Significant, the paper, also highlights essentialfactors for selecting and adopting fault detection techniques, as well as the characteristics of datasets and theircorresponding recommended techniques. Additionally, the paper presents amethodical overview of fault detectiontechniques employed in smart devices, including themetrics used for evaluation. Furthermore, the paper examinesthe body of academic work related to sensor faults and fault detection techniques within the domain. This reflectsthe growing inclination and scholarly attention of researchers and academicians toward strategies for fault detectionwithin the realm of the Internet of Things.展开更多
Ultra-high voltage(UHV)transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment.The ground total electric field is considered a main electromagn...Ultra-high voltage(UHV)transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment.The ground total electric field is considered a main electromagnetic environment indicator of UHV transmission lines and is currently employed for reliable long-term operation of the power grid.Yet,the accurate prediction of the ground total electric field remains a technical challenge.In this work,we collected the total electric field data from the Ningdong-Zhejiang±800 kV UHVDC transmission project,as of the Ling Shao line,and perform an outlier analysis of the total electric field data.We show that the Local Outlier Factor(LOF)elimination algorithm has a small average difference and overcomes the performance of Density-Based Spatial Clustering of Applications with Noise(DBSCAN)and Isolated Forest elimination algorithms.Moreover,the Stacking algorithm has been found to have superior prediction accuracy than a variety of similar prediction algorithms,including the traditional finite element.The low prediction error of the Stacking algorithm highlights the superior ability to accurately forecast the ground total electric field of UHVDC transmission lines.展开更多
In robust regression we often have to decide how many are the unusualobservations, which should be removed from the sample in order to obtain better fitting for the restof the observations. Generally, we use the basic...In robust regression we often have to decide how many are the unusualobservations, which should be removed from the sample in order to obtain better fitting for the restof the observations. Generally, we use the basic principle of LTS, which is to fit the majority ofthe data, identifying as outliers those points that cause the biggest damage to the robust fit.However, in the LTS regression method the choice of default values for high break down-point affectsseriously the efficiency of the estimator. In the proposed approach we introduce penalty cost fordiscarding an outlier, consequently, the best fit for the majority of the data is obtained bydiscarding only catastrophic observations. This penalty cost is based on robust design weights andhigh break down-point residual scale taken from the LTS estimator. The robust estimation is obtainedby solving a convex quadratic mixed integer programming problem, where in the objective functionthe sum of the squared residuals and penalties for discarding observations is minimized. Theproposed mathematical programming formula is suitable for small-sample data. Moreover, we conduct asimulation study to compare other robust estimators with our approach in terms of their efficiencyand robustness.展开更多
文摘Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method which is based on the robust WPCA projection direction and the robust RFPOP method, RWPCA-RFPOP method. Our method is double robust which is suitable for detecting mean changepoints in multivariate normal data with high correlations between variables that include outliers. Simulation results demonstrate that our method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, our method is well applied in an ACGH dataset.
文摘A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberrant values or outliers due to the significant fluctuation of this sort of data, which is influenced by Climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases, in the first phase imputation of missing values and second detection of outliers using Statistical Process Control (SPC), and Functional Data Analysis (FDA), studies to achieve the efficacy of the outlier identification methodology proposed with working days and Nonworking days of the variables NO<sub>2</sub>, SO<sub>2</sub>, and O<sub>3</sub>, which were used for a year in a row in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is absolutely appropriate for the precise detection of outliers in highly variable data.
基金Supported by the Natural Science Foundation of China(62072388,62276146)the Industry Guidance Project Foundation of Science technology Bureau of Fujian province(2020H0047)+2 种基金the Natural Science Foundation of Science Technology Bureau of Fujian province(2019J01601)the Creation Fund project of Science Technology Bureau of Fujian province(JAT190596)Putian University Research Project(2022034)。
文摘Background Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping.The accuracy of the matching significantly impacted subsequent studies.Because of their local similarity,when image pairs contain comparable patterns but feature pairs are positioned differently,incorrect recognition can occur as global motion consistency is disregarded.Methods This study proposes an image-matching filtering algorithm based on global motion consistency.It can be used as a subsequent matching filter for the initial matching results generated by other matching algorithms based on the principle of motion smoothness.A particular matching algorithm can first be used to perform the initial matching;then,the rotation and movement information of the global feature vectors are combined to effectively identify outlier matches.The principle is that if the matching result is accurate,the feature vectors formed by any matched point should have similar rotation angles and moving distances.Thus,global motion direction and global motion distance consistencies were used to reject outliers caused by similar patterns in different locations.Results Four datasets were used to test the effectiveness of the proposed method.Three datasets with similar patterns in different locations were used to test the results for similar images that could easily be incorrectly matched by other algorithms,and one commonly used dataset was used to test the results for the general image-matching problem.The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set.Conclusions The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.
基金supported in part by the National Natural Science Foundation of China(61703245,61873148,61933007)the China Postdoctoral Science Foundation(2018T110702)+3 种基金the Postdoctoral Special Innovation Foundation of of Shandong Province of China(201701015)the European Union’s Horizon 2020 Research and Innovation Programme(820776(INTEGRADDE))the Royal Society of the UKthe Alexander von Humboldt Foundation of Germany。
文摘This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers.A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length(i.e.,the minimum duration between two adjacent impulsive signals)and minimum norm(i.e.,the minimum of the norms of all impulsive signals)are larger than certain thresholds that are adjustable according to engineering practice.In order to guarantee satisfactory filtering performance,a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state.First,a novel outlier detection strategy is developed,based on a dedicatedly constructed input-output model,to examine whether the received measurement is corrupted by an outlier.Then,through the outcome of the outlier detection,the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations.Furthermore,the ultimate boundedness issue on the time-varying ellipsoidal region is thoroughly investigated.Finally,a simulation example is provided to demonstrate the effectiveness of our proposed parameter-dependent set-membership filtering strategy.
基金Science Council of Taiwan Province under Grant Nos.NSC 96-2628-E-366-004-MY2 and 96-2628-E-132-001-MY2
文摘Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, oudiers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations on data from laboratory experiments are then detected. Specifically we take account of the interaction of an internal sohtary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. As concluded, the goodness-of-fit of the revised logit regression model is better than that of the model without this approach.
基金supported by the National Basic Research Program (973) of China (No. 2004CB117306)the Hi-Tech Research and Devel-opment Program (863) of China (No. 2006AA10A102)
文摘A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that the dropping of outliers could provide the evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of the abovementioned population. The results could also reveal a remarkable increase in estimating heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be obtained in the estimates of standard errors for position, additive and additive× environment interaction effects of QTLs in the presence of outliers.
文摘The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with the complexity 0( N In N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of the outliers, it is sensitive to the inliers. The concept of the inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS) together with its solution are presented that reduces the effect of both the outliers and the inliers. Keywords Least squares - Least trimmed squares - Outliers - System identification - Parameter estimation - Robust parameter estimation This work was supported in part by NSF ECS — 9710297 and ECS — 0098181.
文摘The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.
文摘In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternate approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomena is reported.
文摘The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fitness. Though previous researches have studied outliers and controlling observations from various perspectives including the use of box plots, normal probability plots, among others, the use of uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identifying outliers and possible controlling observations in SEM. The results showed that all the three methods of estimators manifest the ability to identify outliers and possible controlling observations in SEM. It was noted that the Anderson-Rubin estimator of QQ plot showed a more efficient or visual display of spotting outliers and possible controlling observations as compared to the other methods of estimators. Therefore, this paper provides an efficient way identifying outliers as it fragments the data set.
文摘In this paper, we present a cluster-based algorithm for time series outlier mining.We use discrete Fourier transformation (DFT) to transform time series from time domain to frequency domain. Time series thus can be mapped as the points in k -dimensional space.For these points, a cluster-based algorithm is developed to mine the outliers from these points.The algorithm first partitions the input points into disjoint clusters and then prunes the clusters,through judgment that can not contain outliers.Our algorithm has been run in the electrical load time series of one steel enterprise and proved to be effective.
文摘We introduce a new wavelet based procedure for detecting outliers in financial discrete time series.The procedure focuses on the analysis of residuals obtained from a model fit,and applied to the Generalized Autoregressive Conditional Heteroskedasticity(GARCH)like model,but not limited to these models.We apply the Maximal-Overlap Discrete Wavelet Transform(MODWT)to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers.Our methodology has several advantages over existing methods that make use of the standard Discrete Wavelet Transform(DWT).The series sample size does not need to be a power of 2 and the transform can explore any wavelet filter and be run up to the desired level.Simulated wavelet quantiles from a Normal and Student t-distribution are used as threshold for the maximum of the absolute value of wavelet coefficients.The performance of the procedure is illustrated and applied to two real series:the closed price of the Saudi Stock market and the S&P 500 index respectively.The efficiency of the proposed method is demonstrated and can be considered as a distinct important addition to the existing methods.
文摘On the basis of the newly developed regression diagnostic analysis, the diagnostic method with the assessment of the outliers of the logistic regression model was set up and it was used to analyze the prognosis of the patients with acute lymphatic leukemia.
文摘We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in high-dimensional low-sample size, and often competes favorably with existing methods as far as the percentage of correct outlier detection are concerned.
文摘Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an effective way of identifying outliers graphically, which might not be visible through the original data plot in classical analysis. This study’s main objective is to detect the extreme rainfall events using functional outliers detection methods depending on the depth and density functions. In order to identify the unusual events of rainfall variation over long time intervals, this work conducts based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission and the analysis has been processed by R software. The approaches applied in this study involve rainbow plots, functional highest density region box-plot as well as functional bag-plot. According to the current results, the functional density box-plot method has proven effective in detecting outlier compared to the functional depth bag-plot method. In conclusion, the results of the current study showed that the rainfall over the Taiz region during the last two decades was influenced by the extreme events of years 1999, 2004, 2005, and 2009.
基金This work was supported by the National Natural Science Foundation of China(Nos.12131003,11771386,and 11728104)the Beijing Natural Science Foundadtion Project(No.Z200002)+2 种基金the General Research Projects of Beijing Educations Committee in China(No.KM201910005013)the Natural Sciences and Engineering Research Council of Canada(NSERC)(No.06446)the General Program of Science and Technology Development Project of Beijing Municipal Education Commission(No.KM201810005005).
文摘The clustering problem of big data in the era of artificial intelligence has been widely studied.Because of the huge amount of data,distributed algorithms are often used to deal with big data problems.The distributed computing model has an attractive feature:it can handle massive datasets that cannot be put into the main memory.On the other hand,since many decisions are made automatically by machines in today’s society,algorithm fairness is also an important research area of machine learning.In this paper,we study two fair clustering problems:the centralized fair k-center problem with outliers and the distributed fair k-center problem with outliers.For these two problems,we have designed corresponding constant approximation ratio algorithms.The theoretical proof and analysis of the approximation ratio,and the running space of the algorithm are given.
文摘The widespread adoption of the Internet of Things (IoT) has transformed various sectors globally, making themmore intelligent and connected. However, this advancement comes with challenges related to the effectiveness ofIoT devices. These devices, present in offices, homes, industries, and more, need constant monitoring to ensuretheir proper functionality. The success of smart systems relies on their seamless operation and ability to handlefaults. Sensors, crucial components of these systems, gather data and contribute to their functionality. Therefore,sensor faults can compromise the system’s reliability and undermine the trustworthiness of smart environments.To address these concerns, various techniques and algorithms can be employed to enhance the performance ofIoT devices through effective fault detection. This paper conducted a thorough review of the existing literature andconducted a detailed analysis.This analysis effectively links sensor errors with a prominent fault detection techniquecapable of addressing them. This study is innovative because it paves theway for future researchers to explore errorsthat have not yet been tackled by existing fault detection methods. Significant, the paper, also highlights essentialfactors for selecting and adopting fault detection techniques, as well as the characteristics of datasets and theircorresponding recommended techniques. Additionally, the paper presents amethodical overview of fault detectiontechniques employed in smart devices, including themetrics used for evaluation. Furthermore, the paper examinesthe body of academic work related to sensor faults and fault detection techniques within the domain. This reflectsthe growing inclination and scholarly attention of researchers and academicians toward strategies for fault detectionwithin the realm of the Internet of Things.
基金funded by a science and technology project of State Grid Corporation of China“Comparative Analysis of Long-Term Measurement and Prediction of the Ground Synthetic Electric Field of±800 kV DC Transmission Line”(GYW11201907738)Paulo R.F.Rocha acknowledges the support and funding from the European Research Council(ERC)under the European Union’s Horizon 2020 Research and Innovation Program(Grant Agreement No.947897).
文摘Ultra-high voltage(UHV)transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment.The ground total electric field is considered a main electromagnetic environment indicator of UHV transmission lines and is currently employed for reliable long-term operation of the power grid.Yet,the accurate prediction of the ground total electric field remains a technical challenge.In this work,we collected the total electric field data from the Ningdong-Zhejiang±800 kV UHVDC transmission project,as of the Ling Shao line,and perform an outlier analysis of the total electric field data.We show that the Local Outlier Factor(LOF)elimination algorithm has a small average difference and overcomes the performance of Density-Based Spatial Clustering of Applications with Noise(DBSCAN)and Isolated Forest elimination algorithms.Moreover,the Stacking algorithm has been found to have superior prediction accuracy than a variety of similar prediction algorithms,including the traditional finite element.The low prediction error of the Stacking algorithm highlights the superior ability to accurately forecast the ground total electric field of UHVDC transmission lines.
文摘In robust regression we often have to decide how many are the unusualobservations, which should be removed from the sample in order to obtain better fitting for the restof the observations. Generally, we use the basic principle of LTS, which is to fit the majority ofthe data, identifying as outliers those points that cause the biggest damage to the robust fit.However, in the LTS regression method the choice of default values for high break down-point affectsseriously the efficiency of the estimator. In the proposed approach we introduce penalty cost fordiscarding an outlier, consequently, the best fit for the majority of the data is obtained bydiscarding only catastrophic observations. This penalty cost is based on robust design weights andhigh break down-point residual scale taken from the LTS estimator. The robust estimation is obtainedby solving a convex quadratic mixed integer programming problem, where in the objective functionthe sum of the squared residuals and penalties for discarding observations is minimized. Theproposed mathematical programming formula is suitable for small-sample data. Moreover, we conduct asimulation study to compare other robust estimators with our approach in terms of their efficiencyand robustness.