Changepoint detection is challenging when outliers are present. This paper proposes a multivariate changepoint detection method, RWPCA-RFPOP, which is based on the robust WPCA projection direction and the robust RFPOP method. The method is doubly robust and is suited to detecting mean changepoints in highly correlated multivariate normal data that include outliers. Simulation results indicate that the method reliably recovers both the number and the locations of changepoints in the presence of outliers. Finally, the method is applied to an ACGH dataset.
Air quality, the level of clean air in a given area, is affected by a variety of factors, which makes it a difficult issue. Because such data fluctuate strongly under the influence of climate change and the environment, conventional approaches struggle to identify aberrant values or outliers correctly. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study has two phases: imputation of missing values, followed by detection of outliers using Statistical Process Control (SPC) and Functional Data Analysis (FDA). The proposed outlier identification methodology is evaluated on working-day and non-working-day measurements of NO₂, SO₂, and O₃ collected over one full year in Kolkata, India. The results show that the functional data approach outperforms traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and that the suggested outlier detector is appropriate for the precise detection of outliers in highly variable data.
Background: Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping. The accuracy of the matching significantly affects subsequent studies. When image pairs contain locally similar patterns whose feature pairs are positioned differently, incorrect matches can occur because global motion consistency is disregarded. Methods: This study proposes an image-matching filtering algorithm based on global motion consistency. Following the principle of motion smoothness, it can be used as a subsequent filter for the initial matching results generated by other matching algorithms. A particular matching algorithm is first used to perform the initial matching; then, the rotation and movement information of the global feature vectors are combined to identify outlier matches. The principle is that if the matching result is accurate, the feature vectors formed by any matched point should have similar rotation angles and moving distances. Thus, global motion direction and global motion distance consistencies are used to reject outliers caused by similar patterns in different locations. Results: Four datasets were used to test the effectiveness of the proposed method. Three datasets with similar patterns in different locations were used to test similar images that could easily be matched incorrectly by other algorithms, and one commonly used dataset was used to test the general image-matching problem. The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set. Conclusions: The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.
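The consistency principle described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the global motion is close to a pure translation, takes the direction of the summed unit displacement vectors and the median displacement length as the reference motion, and uses the hypothetical tolerances `angle_tol` and `length_tol`.

```python
import math
from statistics import median


def filter_matches(matches, angle_tol=0.5, length_tol=0.5):
    """Keep matches whose displacement agrees with the dominant motion.

    matches: list of ((x1, y1), (x2, y2)) point pairs.
    angle_tol: allowed angular deviation in radians (hypothetical default).
    length_tol: allowed relative deviation in length (hypothetical default).
    """
    vectors = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in matches]
    lengths = [math.hypot(dx, dy) for dx, dy in vectors]

    # Reference direction: angle of the summed unit vectors, which avoids
    # wrap-around problems near +/- pi.
    ux = sum(dx / l for (dx, _), l in zip(vectors, lengths) if l > 0)
    uy = sum(dy / l for (_, dy), l in zip(vectors, lengths) if l > 0)
    ref_angle = math.atan2(uy, ux)
    ref_length = median(lengths)

    kept = []
    for match, (dx, dy), l in zip(matches, vectors, lengths):
        # Signed angular difference wrapped into [-pi, pi].
        diff = math.atan2(dy, dx) - ref_angle
        diff = math.atan2(math.sin(diff), math.cos(diff))
        length_ok = abs(l - ref_length) <= length_tol * ref_length
        if abs(diff) <= angle_tol and length_ok:
            kept.append(match)
    return kept


# Hypothetical matches: four move roughly (+10, 0); the last one does not.
matches = [
    ((0, 0), (10, 0)),
    ((5, 5), (15, 5.5)),
    ((2, 8), (12, 8)),
    ((7, 1), (17, 0.5)),
    ((3, 3), (-5, 10)),
]
kept = filter_matches(matches)
```

A real filter would also need to handle rotation and scale changes between the images, which this translation-only sketch ignores.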
This paper investigates the application of machine learning to develop a response model for cardiovascular problems, using AdaBoost together with two outlier detection methodologies: Z-Score combined with Grey Wolf Optimization (GWO), and Interquartile Range (IQR) combined with Ant Colony Optimization (ACO). Using a performance index, it is shown that Z-Score and GWO with AdaBoost is more accurate than IQR and ACO with AdaBoost (89.0% vs. 86.0%) and more discriminative (Area Under the Curve (AUC) score of 93.0% vs. 91.0%). The Z-Score and GWO method also outperformed the other in terms of precision, scoring 89.0%, and its recall was satisfactory at 90.0%. The paper thus reveals specific benefits and drawbacks of different outlier detection and feature selection techniques, which are important to consider when improving diagnostics in cardiovascular health. Collectively, these findings can enhance the knowledge of heart disease prediction and patient treatment using enhanced and innovative machine learning (ML) techniques. This work lays the groundwork for more precise diagnosis models by highlighting the benefits of combining multiple optimization methodologies. Future studies should focus on maximizing patient outcomes and model efficacy through research on these combinations.
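For reference, the two outlier rules named above can be sketched as follows. This covers only the Z-Score and IQR rules themselves, not the GWO or ACO feature selection or the AdaBoost classifier; the thresholds (3.0 and 1.5) are conventional defaults rather than values from the paper, and the sample readings are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one implausible reading.
readings = [198, 202, 205, 199, 201, 203, 200, 197,
            204, 202, 199, 201, 200, 203, 198, 320]
z_flagged = zscore_outliers(readings)
iqr_flagged = iqr_outliers(readings)
```

With very small samples the Z-Score rule can miss even an extreme value, because a single point among n observations cannot exceed a z-score of sqrt(n - 1) when the population standard deviation is used.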
In this paper, we present a cluster-based algorithm for time series outlier mining. We use the discrete Fourier transform (DFT) to transform time series from the time domain to the frequency domain, so that each time series can be mapped to a point in k-dimensional space. A cluster-based algorithm is then developed to mine the outliers from these points. The algorithm first partitions the input points into disjoint clusters and then prunes the clusters that are judged not to contain outliers. The algorithm has been run on the electrical load time series of a steel enterprise and proved to be effective.
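The mapping step can be sketched as below. This is an illustration under assumptions, not the paper's procedure: it uses the magnitudes of the first k DFT coefficients, normalized by the series length, as the k-dimensional point, whereas the paper may use a different representation of the coefficients. The clustering and pruning steps are not shown.

```python
import cmath
import math


def dft_features(series, k):
    """Map a time series to a k-dimensional point.

    Uses the magnitudes of the first k DFT coefficients divided by the
    series length (an assumed feature choice for illustration).
    """
    n = len(series)
    features = []
    for f in range(k):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(series)
        )
        features.append(abs(coeff) / n)
    return features


# One full cosine period sampled at 8 points: energy sits at frequency index 1.
wave = [math.cos(2 * math.pi * t / 8) for t in range(8)]
point = dft_features(wave, k=3)
```

Each series in a collection would be mapped this way, and the resulting points passed to the clustering stage.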
This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers. A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length (i.e., the minimum duration between two adjacent impulsive signals) and minimum norm (i.e., the minimum of the norms of all impulsive signals) are larger than certain thresholds that are adjustable according to engineering practice. To guarantee satisfactory filtering performance, a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state. First, a novel outlier detection strategy is developed, based on a specially constructed input-output model, to examine whether the received measurement is corrupted by an outlier. Then, according to the outcome of the outlier detection, the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations. Furthermore, the ultimate boundedness of the time-varying ellipsoidal region is thoroughly investigated. Finally, a simulation example is provided to demonstrate the effectiveness of the proposed parameter-dependent set-membership filtering strategy.
Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations in a data set are often encountered. A number of problems with this method, for example outliers and influential observations, can cause overdispersion when a model is fitted. In this study, a systematic statistical approach, including the plotting of several indices, is used to diagnose the lack of fit of a logistic regression model. The outliers and influential observations in data from laboratory experiments are then detected. Specifically, we consider the interaction of an internal solitary wave (ISW) with an obstacle, i.e., an underwater ridge, and analyze the effects of the ridge height, the lower-layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. We conclude that the goodness of fit of the revised logit regression model is better than that of the model without this approach.
Building on newly developed regression diagnostic analysis, a diagnostic method for assessing outliers in the logistic regression model was established and used to analyze the prognosis of patients with acute lymphatic leukemia.
A method is proposed for detecting outliers and influential observations in the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population and observed that dropping outliers provided evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of this population. The results also revealed a remarkable increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection power and false discovery rate (FDR) of QTLs in the presence and absence of outliers. The results suggested that even a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. The presence of outliers also led to a drastic increase in the estimated standard errors for the position, additive, and additive × environment interaction effects of QTLs.
We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample-size and traditional low-dimension high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and the computational development of our approach show that it is computationally more efficient than regularized methods in the high-dimension low-sample-size setting, and that it often competes favorably with existing methods as far as the percentage of correctly detected outliers is concerned.
We introduce a new wavelet-based procedure for detecting outliers in financial discrete time series. The procedure focuses on the analysis of residuals obtained from a model fit and is applied to Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-like models, although it is not limited to them. We apply the Maximal-Overlap Discrete Wavelet Transform (MODWT) to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers. Our methodology has several advantages over existing methods that use the standard Discrete Wavelet Transform (DWT): the series sample size does not need to be a power of 2, and the transform can use any wavelet filter and be run up to the desired level. Simulated wavelet quantiles from a Normal and a Student t-distribution are used as thresholds for the maximum of the absolute value of the wavelet coefficients. The performance of the procedure is illustrated on two real series: the closing price of the Saudi Stock market and the S&P 500 index. The efficiency of the proposed method is demonstrated, and it can be considered a distinct and important addition to the existing methods.
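Finding reliable corresponding points between two images is a fundamental problem in computer vision, particularly with the development of the L vision framework. This paper introduces the correspondence manifold and proposes a novel scheme that rejects outliers by learning the manifold. The proposed scheme is independent of the parametric model to be estimated and overcomes the following limitations of available methods in the published literature: efficiency drops severely as the percentage of outliers and the number of estimated model parameters increase, and outlier rejection is coupled with model selection and model evaluation. Experiments on real image pairs show the excellent performance of the proposed scheme.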
In its broadest sense, this paper reviews the general outlier problem, the means available for assessing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternative approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomenon is reported.
The least trimmed squares (LTS) estimator is a well-known robust estimator in terms of protecting the estimate from outliers. Its high computational complexity is, however, a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with complexity O(N ln N) for large N, where N is the number of measurements. We also show that although the LTS is robust to outliers, it is sensitive to inliers; the concept of inliers is introduced. Moreover, the Generalized Least Trimmed Squares (GLTS) estimator, together with its solution, is presented; it reduces the effect of both outliers and inliers. Keywords: Least squares - Least trimmed squares - Outliers - System identification - Parameter estimation - Robust parameter estimation. This work was supported in part by NSF ECS-9710297 and ECS-0098181.
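To give a feel for how a sort-based LTS computation can work, the sketch below handles the simplest case, a one-dimensional location estimate. It is not the paper's algorithm for system identification. It relies on the standard result that the LTS location estimate is the mean of the contiguous window of h sorted values with the smallest sum of squared deviations; the default h = N // 2 + 1 is an assumed coverage choice.

```python
def lts_location(values, h=None):
    """Least trimmed squares estimate of location for 1-D data.

    Sorts the data, then scans every contiguous window of h sorted values
    and returns the mean of the window with the smallest sum of squared
    deviations. Sorting dominates the cost, so the total is O(N log N).
    """
    xs = sorted(values)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # assumed default coverage
    if not 1 <= h <= n:
        raise ValueError("h must satisfy 1 <= h <= len(values)")

    # Prefix sums of x and x^2 give each window's statistics in O(1).
    sum_x, sum_x2 = [0.0], [0.0]
    for x in xs:
        sum_x.append(sum_x[-1] + x)
        sum_x2.append(sum_x2[-1] + x * x)

    best_mean, best_ss = None, float("inf")
    for i in range(n - h + 1):
        total = sum_x[i + h] - sum_x[i]
        total_sq = sum_x2[i + h] - sum_x2[i]
        window_mean = total / h
        ss = total_sq - h * window_mean * window_mean
        if ss < best_ss:
            best_ss, best_mean = ss, window_mean
    return best_mean


# Hypothetical measurements: five near 10 and two gross outliers.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0, 60.0]
estimate = lts_location(data)
plain_mean = sum(data) / len(data)
```

The ordinary mean of this sample is pulled far above 10 by the two large values, while the trimmed estimate stays near the bulk of the data.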
The study explored both the Box and Jenkins procedure and an iterative outlier detection procedure in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers, using the daily closing share price return series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and the kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models, and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study lies in showing the applicability and relevance of iterative methods in time series modeling.
The structural equation model (SEM) is generally influenced by the presence of outliers and controlling variables, which can have consequential effects on the parameters and the model fit. Although previous research has studied outliers and controlling observations from various perspectives, including the use of box plots and normal probability plots, among others, the use of the uniform horizontal QQ plot is yet to be explored. This study therefore aims to apply uniform QQ plots to identify outliers and possible controlling observations in SEM. The results showed that all three estimators were able to identify outliers and possible controlling observations in SEM. The QQ plot of the Anderson-Rubin estimator gave a more efficient visual display for spotting outliers and possible controlling observations than the other estimators. This paper therefore provides an efficient way of identifying outliers as it fragments the data set.
Outlier detection techniques play a vital role in exploring unusual data from extreme events, which have a critical effect on the modeling and forecasting of functional data. Functional methods offer an effective way of identifying outliers graphically that might not be visible in the original data plot in classical analysis. The main objective of this study is to detect extreme rainfall events using functional outlier detection methods based on depth and density functions. To identify unusual events of rainfall variation over long time intervals, this work is based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission, and the analysis was carried out in R. The approaches applied in this study include rainbow plots, the functional highest density region box-plot, and the functional bag-plot. According to the results, the functional density box-plot method proved more effective in detecting outliers than the functional depth bag-plot method. In conclusion, the results showed that rainfall over the Taiz region during the last two decades was influenced by the extreme events of the years 1999, 2004, 2005, and 2009.
The paper puts forward a new method of density-based anomaly data mining. The method is used to design the engine of a network intrusion detection system (NIDS), and a new NIDS is constructed based on this engine. The NIDS can find new, unknown intrusion behaviors, which are used to update the intrusion rule base; intrusion detection can then be carried out online against this rule base by the BM pattern-matching algorithm. Finally, all modules of the NIDS are described in a formalized language.
The flue temperature is one of the important indicators of the combustion state of an ethylene cracking furnace, and outliers in the temperature data can lead to false alarms. Conventional outlier detection algorithms such as the Isolation Forest algorithm and the 3-sigma principle cannot detect the outliers accurately. To improve detection accuracy and reduce computational complexity, an outlier detection algorithm for flue temperature data based on the Clipping Local Outlier Factor (CLOF) algorithm is proposed. The algorithm preprocesses the normalized data using a cluster pruning algorithm and then performs accurate and efficient outlier detection on the resulting candidate set of outliers. Using the flue temperature data of an ethylene cracking furnace in a petrochemical plant, the main parameters of the CLOF algorithm are selected according to the experimental results, and the outlier detection performance of the Isolation Forest algorithm, the 3-sigma principle, the conventional LOF algorithm, and the CLOF algorithm is compared and analyzed. The results show that an appropriate clipping coefficient in the CLOF algorithm can significantly improve detection efficiency and detection accuracy. Compared with the Isolation Forest algorithm and the 3-sigma principle, the accuracy of the CLOF detection results is increased and the amount of computation is significantly reduced.
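As background for the comparison above, the conventional LOF score can be sketched in a few lines. This is a brute-force illustration for one-dimensional data only; it does not reproduce the cluster pruning step that defines CLOF. The neighbourhood size `k` and the temperature values are hypothetical.

```python
def lof_scores(values, k=2):
    """Brute-force Local Outlier Factor for 1-D data.

    Uses exactly the k nearest other points as each neighbourhood.
    Assumes no more than k points share an identical value, so that
    reachability distances are not all zero.
    """
    n = len(values)
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(values[i] - values[j]),
        )
        nearest = order[:k]
        neighbours.append(nearest)
        k_distance.append(abs(values[i] - values[nearest[-1]]))

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], abs(values[i] - values[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))

    # LOF: average neighbour density relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Hypothetical flue temperatures; the last reading is far from the rest.
temps = [850.0, 851.0, 849.5, 850.5, 852.0, 851.5, 900.0]
scores = lof_scores(temps, k=2)
suspect = scores.index(max(scores))
```

Scores near 1 indicate a point about as dense as its neighbours, and scores well above 1 indicate a likely outlier. The brute-force neighbour search is quadratic in the number of points, which is the kind of cost a pruning step aims to reduce.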
Owing to advances in information technology, massive quantities of data are being produced by social media, smartphones, and sensor devices. The investigation of data streams with machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and feature selection (FS) become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, streaming data are pre-processed in two ways, namely categorical encoding and null value removal. In addition, the Local Correlation Integral (LOCI) is used for the detection and removal of outliers. A red deer algorithm (RDA) based FS approach is then employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS constitutes the novelty of the work. To assess the classification outcomes of the ODFST-SDC technique, a series of simulations was performed using three benchmark datasets. The experimental results showed promising outcomes of the ODFST-SDC technique compared with recent approaches.
Funding (image-matching filtering study): Supported by the Natural Science Foundation of China (62072388, 62276146); the Industry Guidance Project Foundation of Science Technology Bureau of Fujian Province (2020H0047); the Natural Science Foundation of Science Technology Bureau of Fujian Province (2019J01601); the Creation Fund Project of Science Technology Bureau of Fujian Province (JAT190596); and the Putian University Research Project (2022034).
Funding (set-membership filtering study): Supported in part by the National Natural Science Foundation of China (61703245, 61873148, 61933007); the China Postdoctoral Science Foundation (2018T110702); the Postdoctoral Special Innovation Foundation of Shandong Province of China (201701015); the European Union's Horizon 2020 Research and Innovation Programme (820776 (INTEGRADDE)); the Royal Society of the UK; and the Alexander von Humboldt Foundation of Germany.
Funding: Science Council of Taiwan Province under Grant Nos. NSC 96-2628-E-366-004-MY2 and 96-2628-E-132-001-MY2.
Abstract: Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations in a data set are often encountered. A number of problems with this method, for example outliers and influential observations, can cause overdispersion when a model is fitted. In this study, a systematic statistical approach, including the plotting of several indices, is used to diagnose the lack of fit of a logistic regression model. The outliers and influential observations in data from laboratory experiments are then detected. Specifically, we consider the interaction of an internal solitary wave (ISW) with an obstacle, i.e., an underwater ridge, and analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. We conclude that the goodness of fit of the revised logit regression model is better than that of the model fitted without this approach.
Abstract: On the basis of newly developed regression diagnostic analysis, a diagnostic method for assessing outliers in the logistic regression model was established and used to analyze the prognosis of patients with acute lymphatic leukemia.
Funding: Supported by the National Basic Research Program (973) of China (No. 2004CB117306) and the Hi-Tech Research and Development Program (863) of China (No. 2006AA10A102).
Abstract: A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that dropping outliers could provide evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of this population. The results also revealed a remarkable increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. In the presence of outliers, a drastic increase could be observed in the estimated standard errors for the position, additive, and additive × environment interaction effects of QTLs.
Abstract: We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both the theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional low-sample size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
Abstract: We introduce a new wavelet-based procedure for detecting outliers in financial discrete time series. The procedure focuses on the analysis of residuals obtained from a model fit and is applied to Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-like models, but is not limited to these models. We apply the Maximal-Overlap Discrete Wavelet Transform (MODWT) to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers. Our methodology has several advantages over existing methods that make use of the standard Discrete Wavelet Transform (DWT): the series sample size does not need to be a power of 2, and the transform can use any wavelet filter and be run up to the desired level. Simulated wavelet quantiles from a Normal and a Student t-distribution are used as thresholds for the maximum of the absolute value of the wavelet coefficients. The performance of the procedure is illustrated on two real series: the closing price of the Saudi Stock market and the S&P 500 index. The efficiency of the proposed method is demonstrated, and it can be considered a distinct and important addition to the existing methods.
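The thresholding idea above can be sketched for the simplest case. This is a hedged illustration rather than the authors' procedure: it assumes a level-1 Haar filter, standardized Gaussian residuals, a 95% quantile, and 500 Monte Carlo replications; the function names and the injected outlier are hypothetical.

```python
import random


def haar_modwt_level1(x):
    """Level-1 Haar MODWT wavelet coefficients, W[t] = (x[t] - x[t-1]) / 2."""
    # x[t - 1] wraps to x[-1] at t = 0, which gives circular boundary handling.
    return [(x[t] - x[t - 1]) / 2.0 for t in range(len(x))]


def simulated_max_quantile(n, q=0.95, n_sims=500, seed=0):
    """Monte Carlo quantile of max |W| for i.i.d. N(0, 1) series of length n."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        maxima.append(max(abs(w) for w in haar_modwt_level1(noise)))
    maxima.sort()
    return maxima[min(int(q * n_sims), n_sims - 1)]


# Toy residuals (assumed already standardized) with one injected outlier.
rng = random.Random(42)
residuals = [rng.gauss(0.0, 1.0) for _ in range(200)]
residuals[100] = 15.0  # hypothetical additive outlier

threshold = simulated_max_quantile(len(residuals))
coeffs = haar_modwt_level1(residuals)
flagged = [t for t, w in enumerate(coeffs) if abs(w) > threshold]
print(flagged)
```

Note that the series length (200) is not a power of 2, which the MODWT permits. A single spike typically produces large coefficients at both t and t + 1, so adjacent flags usually point to the same observation.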
Funding: Supported by the National Natural Science Foundation of China (60675020, 60773132), the Natural Science Foundation of Shandong Province (Q2007G02), and the Opening Task-fund for the National Laboratory of Pattern Recognition.
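Abstract: Finding reliable corresponding points between two images is a fundamental problem in computer vision, especially with the development of the L vision framework. This paper introduces the correspondence manifold and proposes a novel scheme that rejects outliers by learning the manifold. The proposed scheme is independent of the parametric model to be estimated and overcomes the following limitations of the available methods in published work: efficiency drops severely as the outlier percentage and the number of estimated model parameters increase; and outlier rejection is coupled with model selection and model estimation. Experiments on real image pairs show the excellent performance of the proposed scheme.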
Abstract: In its broadest sense, this paper reviews the general outlier problem, the means available for assessing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternative approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomenon is reported.
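Masking, the weakness of consecutive testing mentioned above, can be shown with a small numerical sketch. The data, the helper name, and the cutoff of 2.0 are illustrative assumptions; the cutoff is not a tabulated critical value from any specific test.

```python
import statistics


def max_z_score(values):
    """Largest absolute deviation from the mean, in sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) / sd for v in values)


CUTOFF = 2.0  # illustrative cutoff only, not a tabulated critical value

one_outlier = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 20.0]
two_outliers = [10.0, 10.1, 9.9, 10.2, 9.8, 20.0, 20.0]

z_one = max_z_score(one_outlier)
z_two = max_z_score(two_outliers)

# A single extreme value exceeds the cutoff, but two extreme values inflate
# the mean and standard deviation so that neither exceeds it: masking.
print(z_one > CUTOFF, z_two > CUTOFF)
```

A consecutive procedure that tests the most extreme value first would stop without rejecting anything in the second sample, even though it contains more contamination than the first.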
Abstract: The least trimmed squares (LTS) estimator is a well-known robust estimator in terms of protecting the estimate from outliers. Its high computational complexity is, however, a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with complexity O(N ln N) for large N, where N is the number of measurements. We also show that although the LTS is robust to outliers, it is sensitive to inliers. The concept of inliers is introduced. Moreover, the Generalized Least Trimmed Squares (GLTS) estimator, which reduces the effect of both outliers and inliers, is presented together with its solution. Keywords: Least squares; Least trimmed squares; Outliers; System identification; Parameter estimation; Robust parameter estimation. This work was supported in part by NSF ECS-9710297 and ECS-0098181.
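To make the O(N ln N) claim concrete, here is a sketch for the simplest setting, a univariate location estimate. It relies on the known fact that the optimal LTS subset of size h is contiguous in the sorted sample. Whether the paper's algorithm takes this form is not stated in the abstract; the function name, the data, and the choice h = 5 are assumptions.

```python
import statistics


def lts_location(values, h):
    """LTS location estimate: mean of the h-point subset with least squared spread.

    Sorting costs O(N log N); prefix sums then score each contiguous window
    of the sorted sample in O(1).
    """
    xs = sorted(values)
    n = len(xs)
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x**2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)
    best_ss, best_mean = None, None
    for i in range(n - h + 1):
        mean = (s1[i + h] - s1[i]) / h
        ss = (s2[i + h] - s2[i]) - h * mean * mean  # sum of squared residuals
        if best_ss is None or ss < best_ss:
            best_ss, best_mean = ss, mean
    return best_mean


data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0, 55.0]  # two gross outliers
robust = lts_location(data, h=5)
plain = statistics.mean(data)
print(robust, plain)
```

The ordinary mean is pulled far toward the two large values, while the trimmed estimate stays with the bulk of the data.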
Abstract: The study explored both the Box and Jenkins procedure and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers, using the daily closing share price return series of four prominent banks in Nigeria (Skye (Polaris) Bank, Sterling Bank, Unity Bank and Zenith Bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and the kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models, and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.
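The two criteria named above can be computed directly from a return series. The following sketch uses made-up returns and a hypothetical helper name; it shows only how a single extreme return moves the sample variance and the (non-excess) kurtosis coefficient, and does not reproduce the study's ARIMA-GARCH modeling.

```python
def variance_and_kurtosis(xs):
    """Population variance m2 and kurtosis coefficient m4 / m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2, m4 / (m2 * m2)


# Hypothetical daily returns: a bounded, symmetric pattern repeated 20 times.
clean = [-0.02, -0.01, 0.0, 0.01, 0.02] * 20
contaminated = clean + [0.5]  # one extreme return

var_clean, kurt_clean = variance_and_kurtosis(clean)
var_cont, kurt_cont = variance_and_kurtosis(contaminated)
print(var_clean, kurt_clean)
print(var_cont, kurt_cont)
```

In this toy case both statistics increase, but the kurtosis changes by a much larger factor, which is consistent with its use as an indicator of outliers.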
Abstract: The structural equation model (SEM) is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fit. Though previous research has studied outliers and controlling observations from various perspectives, including the use of box plots and normal probability plots, among others, the use of the uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identify outliers and possible controlling observations in SEM. The results showed that all three estimation methods are able to identify outliers and possible controlling observations in SEM. It was noted that the QQ plot of the Anderson-Rubin estimator gave a more efficient visual display for spotting outliers and possible controlling observations than the other estimation methods. Therefore, this paper provides an efficient way of identifying outliers, as it fragments the data set.
Abstract: Outlier detection techniques play a vital role in exploring unusual data from extreme events, which have a critical effect on the modeling and forecasting of functional data. Functional methods offer an effective way of identifying outliers graphically, including outliers that might not be visible in the original data plot used in classical analysis. The main objective of this study is to detect extreme rainfall events using functional outlier detection methods based on depth and density functions. To identify unusual rainfall variation over long time intervals, the work is based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission and analyzed in R. The approaches applied involve rainbow plots, the functional highest density region box-plot, and the functional bag-plot. According to the results, the functional density box-plot method proved more effective in detecting outliers than the functional depth bag-plot method. In conclusion, the results showed that rainfall over the Taiz region during the last two decades was influenced by the extreme events of 1999, 2004, 2005, and 2009.
Funding: Funded by the Shaanxi Natural Science Foundation (2002G07).
Abstract: The paper puts forward a new method of density-based anomaly data mining. The method is used to design the engine of a network intrusion detection system (NIDS), and a new NIDS is constructed based on this engine. The NIDS can find new, unknown intrusion behaviors, which are used to update the intrusion rule base; online intrusion detection is then carried out against this rule base using the BM pattern matching algorithm. Finally, all modules of the NIDS are described in a formalized language.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61973094), the Maoming Natural Science Foundation (Grant No. 2020S004), and the Guangdong Basic and Applied Basic Research Fund Project (Grant No. 2023A1515012341).
Abstract: The flue temperature is one of the important indicators characterizing the combustion state of an ethylene cracking furnace, and outliers in the temperature data can lead to false alarms. Conventional outlier detection algorithms such as the Isolation Forest algorithm and the 3-sigma principle cannot detect the outliers accurately. In order to improve the detection accuracy and reduce the computational complexity, an outlier detection algorithm for flue temperature data based on the Clipping Local Outlier Factor (CLOF) algorithm is proposed. The algorithm preprocesses the normalized data using a cluster pruning algorithm and then performs accurate and efficient outlier detection on the resulting outlier candidate set. Using the flue temperature data of an ethylene cracking furnace in a petrochemical plant, the main parameters of the CLOF algorithm are selected according to the experimental results, and the outlier detection performance of the Isolation Forest algorithm, the 3-sigma principle, the conventional LOF algorithm and the CLOF algorithm is compared and analyzed. The results show that an appropriate clipping coefficient in the CLOF algorithm can significantly improve the detection efficiency and detection accuracy. Compared with the outlier detection results of the Isolation Forest algorithm and the 3-sigma principle, the accuracy of the CLOF detection results is increased, and the amount of computation is significantly reduced.
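The contrast between the 3-sigma principle and LOF can be sketched on toy data. This is the conventional LOF only; the clipping (cluster pruning) step that defines CLOF is omitted. The temperature values, k = 3, and the LOF cutoff of 2.0 are illustrative assumptions.

```python
import statistics


def three_sigma_flags(xs):
    """Indices whose value lies more than 3 standard deviations from the mean."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - mean) > 3 * sd]


def lof_scores(xs, k):
    """Conventional Local Outlier Factor for 1-D data (distinct values assumed)."""
    n = len(xs)
    dist = [[abs(a - b) for b in xs] for a in xs]
    k_dist, neighbors = [], []
    for i in range(n):
        kd = sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
        k_dist.append(kd)
        # Ties at the k-distance are included in the neighborhood.
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


# Hypothetical flue temperatures: two operating regimes and one stray reading.
temps = [800.0, 800.5, 801.0, 801.5, 802.0, 850.0,
         900.0, 900.5, 901.0, 901.5, 902.0]

sigma_flags = three_sigma_flags(temps)
lof = lof_scores(temps, k=3)
lof_flags = [i for i, s in enumerate(lof) if s > 2.0]  # assumed cutoff
print(sigma_flags, lof_flags)
```

The stray reading sits close to the global mean, so the 3-sigma rule misses it, while its low local density relative to its neighbors gives it a large LOF score.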
Abstract: Due to advancements in information technologies, massive quantities of data are being produced by social media, smartphones, and sensor devices. The investigation of data streams using machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and the feature selection (FS) process become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, the streaming data is pre-processed in two ways, namely categorical encoding and null value removal. In addition, Local Correlation Integral (LOCI) is used for the detection and removal of outliers. A red deer algorithm (RDA) based FS approach is then employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS constitutes the novelty of the work. In order to assess the classification outcomes of the ODFST-SDC technique, a series of simulations was performed using three benchmark datasets. The experimental results showed promising outcomes for the ODFST-SDC technique over recent approaches.