Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method which is based on the robust WPCA projection direction and the robust RFPOP method,...Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method which is based on the robust WPCA projection direction and the robust RFPOP method, RWPCA-RFPOP method. Our method is double robust which is suitable for detecting mean changepoints in multivariate normal data with high correlations between variables that include outliers. Simulation results demonstrate that our method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, our method is well applied in an ACGH dataset.展开更多
In this paper, we present a cluster-based algorithm for time series outlier mining.We use discrete Fourier transformation (DFT) to transform time series from time domain to frequency domain. Time series thus can be ma...In this paper, we present a cluster-based algorithm for time series outlier mining.We use discrete Fourier transformation (DFT) to transform time series from time domain to frequency domain. Time series thus can be mapped as the points in k -dimensional space.For these points, a cluster-based algorithm is developed to mine the outliers from these points.The algorithm first partitions the input points into disjoint clusters and then prunes the clusters,through judgment that can not contain outliers.Our algorithm has been run in the electrical load time series of one steel enterprise and proved to be effective.展开更多
This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers.A new representation is proposed to model the me...This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers.A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length(i.e.,the minimum duration between two adjacent impulsive signals)and minimum norm(i.e.,the minimum of the norms of all impulsive signals)are larger than certain thresholds that are adjustable according to engineering practice.In order to guarantee satisfactory filtering performance,a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state.First,a novel outlier detection strategy is developed,based on a dedicatedly constructed input-output model,to examine whether the received measurement is corrupted by an outlier.Then,through the outcome of the outlier detection,the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations.Furthermore,the ultimate boundedness issue on the time-varying ellipsoidal region is thoroughly investigated.Finally,a simulation example is provided to demonstrate the effectiveness of our proposed parameter-dependent set-membership filtering strategy.展开更多
Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, oudiers ...Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, oudiers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations on data from laboratory experiments are then detected. Specifically we take account of the interaction of an internal sohtary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. As concluded, the goodness-of-fit of the revised logit regression model is better than that of the model without this approach.展开更多
On the basis of the newly developed regression diagnostic analysis, the diagnostic method with the assessment of the outliers of the logistic regression model was set up and it was used to analyze the prognosis of the...On the basis of the newly developed regression diagnostic analysis, the diagnostic method with the assessment of the outliers of the logistic regression model was set up and it was used to analyze the prognosis of the patients with acute lymphatic leukemia.展开更多
A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of ou...A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that the dropping of outliers could provide the evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of the abovementioned population. The results could also reveal a remarkable increase in estimating heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be obtained in the estimates of standard errors for position, additive and additive× environment interaction effects of QTLs in the presence of outliers.展开更多
We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-samp...We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in high-dimensional low-sample size, and often competes favorably with existing methods as far as the percentage of correct outlier detection are concerned.展开更多
We introduce a new wavelet based procedure for detecting outliers in financial discrete time series.The procedure focuses on the analysis of residuals obtained from a model fit,and applied to the Generalized Autoregre...We introduce a new wavelet based procedure for detecting outliers in financial discrete time series.The procedure focuses on the analysis of residuals obtained from a model fit,and applied to the Generalized Autoregressive Conditional Heteroskedasticity(GARCH)like model,but not limited to these models.We apply the Maximal-Overlap Discrete Wavelet Transform(MODWT)to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers.Our methodology has several advantages over existing methods that make use of the standard Discrete Wavelet Transform(DWT).The series sample size does not need to be a power of 2 and the transform can explore any wavelet filter and be run up to the desired level.Simulated wavelet quantiles from a Normal and Student t-distribution are used as threshold for the maximum of the absolute value of wavelet coefficients.The performance of the procedure is illustrated and applied to two real series:the closed price of the Saudi Stock market and the S&P 500 index respectively.The efficiency of the proposed method is demonstrated and can be considered as a distinct important addition to the existing methods.展开更多
发现在二幅图象之间的可靠的相应的点是在计算机视觉的一个基本问题,特别与 L 视觉框架的发展。这篇论文介绍歧管的通讯并且建议一个新奇计划由听说向上的看法拒绝孤立点歧管。建议计划独立于在出版工作要估计并且克服可得到的方法的...发现在二幅图象之间的可靠的相应的点是在计算机视觉的一个基本问题,特别与 L 视觉框架的发展。这篇论文介绍歧管的通讯并且建议一个新奇计划由听说向上的看法拒绝孤立点歧管。建议计划独立于在出版工作要估计并且克服可得到的方法的下列限制的参量的模型:效率严厉地因孤立点百分比的增加和估计的模型参数的数字倒下;孤立点拒绝被结合模型选择和模型评价。真实图象对的实验显示出我们的建议计划的优秀性能。展开更多
In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. T...In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternate approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomena is reported.展开更多
The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LT...The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with the complexity 0( N In N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of the outliers, it is sensitive to the inliers. The concept of the inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS) together with its solution are presented that reduces the effect of both the outliers and the inliers. Keywords Least squares - Least trimmed squares - Outliers - System identification - Parameter estimation - Robust parameter estimation This work was supported in part by NSF ECS — 9710297 and ECS — 0098181.展开更多
The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns seri...The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.展开更多
The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model ...The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fitness. Though previous researches have studied outliers and controlling observations from various perspectives including the use of box plots, normal probability plots, among others, the use of uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identifying outliers and possible controlling observations in SEM. The results showed that all the three methods of estimators manifest the ability to identify outliers and possible controlling observations in SEM. It was noted that the Anderson-Rubin estimator of QQ plot showed a more efficient or visual display of spotting outliers and possible controlling observations as compared to the other methods of estimators. Therefore, this paper provides an efficient way identifying outliers as it fragments the data set.展开更多
Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an ef...Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an effective way of identifying outliers graphically, which might not be visible through the original data plot in classical analysis. This study’s main objective is to detect the extreme rainfall events using functional outliers detection methods depending on the depth and density functions. In order to identify the unusual events of rainfall variation over long time intervals, this work conducts based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission and the analysis has been processed by R software. The approaches applied in this study involve rainbow plots, functional highest density region box-plot as well as functional bag-plot. According to the current results, the functional density box-plot method has proven effective in detecting outlier compared to the functional depth bag-plot method. In conclusion, the results of the current study showed that the rainfall over the Taiz region during the last two decades was influenced by the extreme events of years 1999, 2004, 2005, and 2009.展开更多
A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberran...A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberrant values or outliers due to the significant fluctuation of this sort of data, which is influenced by Climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases, in the first phase imputation of missing values and second detection of outliers using Statistical Process Control (SPC), and Functional Data Analysis (FDA), studies to achieve the efficacy of the outlier identification methodology proposed with working days and Nonworking days of the variables NO<sub>2</sub>, SO<sub>2</sub>, and O<sub>3</sub>, which were used for a year in a row in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is absolutely appropriate for the precise detection of outliers in highly variable data.展开更多
Background Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping.The accuracy of the matching significantly impacted subsequent studies....Background Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping.The accuracy of the matching significantly impacted subsequent studies.Because of their local similarity,when image pairs contain comparable patterns but feature pairs are positioned differently,incorrect recognition can occur as global motion consistency is disregarded.Methods This study proposes an image-matching filtering algorithm based on global motion consistency.It can be used as a subsequent matching filter for the initial matching results generated by other matching algorithms based on the principle of motion smoothness.A particular matching algorithm can first be used to perform the initial matching;then,the rotation and movement information of the global feature vectors are combined to effectively identify outlier matches.The principle is that if the matching result is accurate,the feature vectors formed by any matched point should have similar rotation angles and moving distances.Thus,global motion direction and global motion distance consistencies were used to reject outliers caused by similar patterns in different locations.Results Four datasets were used to test the effectiveness of the proposed method.Three datasets with similar patterns in different locations were used to test the results for similar images that could easily be incorrectly matched by other algorithms,and one commonly used dataset was used to test the results for the general image-matching problem.The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set.Conclusions The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.展开更多
The paper puts forward a new method of density-based anomaly data mining, the method is used to design the engine of network intrusion detection system (NIDS), thus a new NIDS is constructed based on the engine. The N...The paper puts forward a new method of density-based anomaly data mining, the method is used to design the engine of network intrusion detection system (NIDS), thus a new NIDS is constructed based on the engine. The NIDS can find new unknown intrusion behaviors, which are used to updated the intrusion rule-base, based on which intrusion detections can be carried out online by the BM pattern match algorithm. Finally all modules of the NIDS are described by formalized language.展开更多
Purpose:To address the“anomalies”that occur when scientific breakthroughs emerge,this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers,aiming t...Purpose:To address the“anomalies”that occur when scientific breakthroughs emerge,this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers,aiming to achieve early identification of scientific breakthroughs in papers.Design/methodology/approach:This study utilizes semantic technology to extract research entities from the titles and abstracts of papers to represent each paper’s research content.Outlier detection methods are then employed to measure and analyze the anomalies in breakthrough papers during their early stages.The development and evolution process are traced using literature time tags.Finally,a case study is conducted using the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine.Findings:Through manual analysis of all identified outlier papers,the effectiveness of the proposed method for early identifying potential scientific breakthroughs is verified.Research limitations:The study’s applicability has only been empirically tested in the biomedical field.More data from various fields are needed to validate the robustness and generalizability of the method.Practical implications:This study provides a valuable supplement to current methods for early identification of scientific breakthroughs,effectively supporting technological intelligence decision-making and services.Originality/value:The study introduces a novel approach to early identification of scientific breakthroughs by leveraging outlier analysis of research entities,offering a more sensitive,precise,and fine-grained alternative method compared to traditional citation-based evaluations,which enhances the ability to identify nascent breakthrough innovations.展开更多
This paper investigates the application ofmachine learning to develop a response model to cardiovascular problems and the use of AdaBoost which incorporates an application of Outlier Detection methodologies namely;Z-S...This paper investigates the application ofmachine learning to develop a response model to cardiovascular problems and the use of AdaBoost which incorporates an application of Outlier Detection methodologies namely;Z-Score incorporated with GreyWolf Optimization(GWO)as well as Interquartile Range(IQR)coupled with Ant Colony Optimization(ACO).Using a performance index,it is shown that when compared with the Z-Score and GWO with AdaBoost,the IQR and ACO,with AdaBoost are not very accurate(89.0%vs.86.0%)and less discriminative(Area Under the Curve(AUC)score of 93.0%vs.91.0%).The Z-Score and GWO methods also outperformed the others in terms of precision,scoring 89.0%;and the recall was also found to be satisfactory,scoring 90.0%.Thus,the paper helps to reveal various specific benefits and drawbacks associated with different outlier detection and feature selection techniques,which can be important to consider in further improving various aspects of diagnostics in cardiovascular health.Collectively,these findings can enhance the knowledge of heart disease prediction and patient treatment using enhanced and innovativemachine learning(ML)techniques.These findings when combined improve patient therapy knowledge and cardiac disease prediction through the use of cutting-edge and improved machine learning approaches.This work lays the groundwork for more precise diagnosis models by highlighting the benefits of combining multiple optimization methodologies.Future studies should focus on maximizing patient outcomes and model efficacy through research on these combinations.展开更多
文摘Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method which is based on the robust WPCA projection direction and the robust RFPOP method, RWPCA-RFPOP method. Our method is double robust which is suitable for detecting mean changepoints in multivariate normal data with high correlations between variables that include outliers. Simulation results demonstrate that our method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, our method is well applied in an ACGH dataset.
文摘In this paper, we present a cluster-based algorithm for time series outlier mining.We use discrete Fourier transformation (DFT) to transform time series from time domain to frequency domain. Time series thus can be mapped as the points in k -dimensional space.For these points, a cluster-based algorithm is developed to mine the outliers from these points.The algorithm first partitions the input points into disjoint clusters and then prunes the clusters,through judgment that can not contain outliers.Our algorithm has been run in the electrical load time series of one steel enterprise and proved to be effective.
基金supported in part by the National Natural Science Foundation of China(61703245,61873148,61933007)the China Postdoctoral Science Foundation(2018T110702)+3 种基金the Postdoctoral Special Innovation Foundation of of Shandong Province of China(201701015)the European Union’s Horizon 2020 Research and Innovation Programme(820776(INTEGRADDE))the Royal Society of the UKthe Alexander von Humboldt Foundation of Germany。
文摘This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers.A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length(i.e.,the minimum duration between two adjacent impulsive signals)and minimum norm(i.e.,the minimum of the norms of all impulsive signals)are larger than certain thresholds that are adjustable according to engineering practice.In order to guarantee satisfactory filtering performance,a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state.First,a novel outlier detection strategy is developed,based on a dedicatedly constructed input-output model,to examine whether the received measurement is corrupted by an outlier.Then,through the outcome of the outlier detection,the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations.Furthermore,the ultimate boundedness issue on the time-varying ellipsoidal region is thoroughly investigated.Finally,a simulation example is provided to demonstrate the effectiveness of our proposed parameter-dependent set-membership filtering strategy.
基金Science Council of Taiwan Province under Grant Nos.NSC 96-2628-E-366-004-MY2 and 96-2628-E-132-001-MY2
文摘Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, oudiers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations on data from laboratory experiments are then detected. Specifically we take account of the interaction of an internal sohtary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. As concluded, the goodness-of-fit of the revised logit regression model is better than that of the model without this approach.
文摘On the basis of the newly developed regression diagnostic analysis, the diagnostic method with the assessment of the outliers of the logistic regression model was set up and it was used to analyze the prognosis of the patients with acute lymphatic leukemia.
基金supported by the National Basic Research Program (973) of China (No. 2004CB117306)the Hi-Tech Research and Devel-opment Program (863) of China (No. 2006AA10A102)
文摘A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that the dropping of outliers could provide the evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of the abovementioned population. The results could also reveal a remarkable increase in estimating heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be obtained in the estimates of standard errors for position, additive and additive× environment interaction effects of QTLs in the presence of outliers.
文摘We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in high-dimensional low-sample size, and often competes favorably with existing methods as far as the percentage of correct outlier detection are concerned.
文摘We introduce a new wavelet based procedure for detecting outliers in financial discrete time series.The procedure focuses on the analysis of residuals obtained from a model fit,and applied to the Generalized Autoregressive Conditional Heteroskedasticity(GARCH)like model,but not limited to these models.We apply the Maximal-Overlap Discrete Wavelet Transform(MODWT)to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers.Our methodology has several advantages over existing methods that make use of the standard Discrete Wavelet Transform(DWT).The series sample size does not need to be a power of 2 and the transform can explore any wavelet filter and be run up to the desired level.Simulated wavelet quantiles from a Normal and Student t-distribution are used as threshold for the maximum of the absolute value of wavelet coefficients.The performance of the procedure is illustrated and applied to two real series:the closed price of the Saudi Stock market and the S&P 500 index respectively.The efficiency of the proposed method is demonstrated and can be considered as a distinct important addition to the existing methods.
基金Supported by National Natural Science Foundation of China (60675020, 60773132), Natural Science Foundation of Shandong Province (Q2007G02), and Opening Task-fund for National Laboratory of Pattern Recognition
文摘发现在二幅图象之间的可靠的相应的点是在计算机视觉的一个基本问题,特别与 L 视觉框架的发展。这篇论文介绍歧管的通讯并且建议一个新奇计划由听说向上的看法拒绝孤立点歧管。建议计划独立于在出版工作要估计并且克服可得到的方法的下列限制的参量的模型:效率严厉地因孤立点百分比的增加和估计的模型参数的数字倒下;孤立点拒绝被结合模型选择和模型评价。真实图象对的实验显示出我们的建议计划的优秀性能。
文摘In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternate approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomena is reported.
文摘The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with the complexity 0( N In N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of the outliers, it is sensitive to the inliers. The concept of the inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS) together with its solution are presented that reduces the effect of both the outliers and the inliers. Keywords Least squares - Least trimmed squares - Outliers - System identification - Parameter estimation - Robust parameter estimation This work was supported in part by NSF ECS — 9710297 and ECS — 0098181.
文摘The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.
文摘The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fitness. Though previous researches have studied outliers and controlling observations from various perspectives including the use of box plots, normal probability plots, among others, the use of uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identifying outliers and possible controlling observations in SEM. The results showed that all the three methods of estimators manifest the ability to identify outliers and possible controlling observations in SEM. It was noted that the Anderson-Rubin estimator of QQ plot showed a more efficient or visual display of spotting outliers and possible controlling observations as compared to the other methods of estimators. Therefore, this paper provides an efficient way identifying outliers as it fragments the data set.
文摘Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an effective way of identifying outliers graphically, which might not be visible through the original data plot in classical analysis. This study’s main objective is to detect the extreme rainfall events using functional outliers detection methods depending on the depth and density functions. In order to identify the unusual events of rainfall variation over long time intervals, this work conducts based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission and the analysis has been processed by R software. The approaches applied in this study involve rainbow plots, functional highest density region box-plot as well as functional bag-plot. According to the current results, the functional density box-plot method has proven effective in detecting outlier compared to the functional depth bag-plot method. In conclusion, the results of the current study showed that the rainfall over the Taiz region during the last two decades was influenced by the extreme events of years 1999, 2004, 2005, and 2009.
文摘A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberrant values or outliers due to the significant fluctuation of this sort of data, which is influenced by Climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases, in the first phase imputation of missing values and second detection of outliers using Statistical Process Control (SPC), and Functional Data Analysis (FDA), studies to achieve the efficacy of the outlier identification methodology proposed with working days and Nonworking days of the variables NO<sub>2</sub>, SO<sub>2</sub>, and O<sub>3</sub>, which were used for a year in a row in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is absolutely appropriate for the precise detection of outliers in highly variable data.
基金Supported by the Natural Science Foundation of China(62072388,62276146)the Industry Guidance Project Foundation of Science technology Bureau of Fujian province(2020H0047)+2 种基金the Natural Science Foundation of Science Technology Bureau of Fujian province(2019J01601)the Creation Fund project of Science Technology Bureau of Fujian province(JAT190596)Putian University Research Project(2022034)。
文摘Background Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping.The accuracy of the matching significantly impacted subsequent studies.Because of their local similarity,when image pairs contain comparable patterns but feature pairs are positioned differently,incorrect recognition can occur as global motion consistency is disregarded.Methods This study proposes an image-matching filtering algorithm based on global motion consistency.It can be used as a subsequent matching filter for the initial matching results generated by other matching algorithms based on the principle of motion smoothness.A particular matching algorithm can first be used to perform the initial matching;then,the rotation and movement information of the global feature vectors are combined to effectively identify outlier matches.The principle is that if the matching result is accurate,the feature vectors formed by any matched point should have similar rotation angles and moving distances.Thus,global motion direction and global motion distance consistencies were used to reject outliers caused by similar patterns in different locations.Results Four datasets were used to test the effectiveness of the proposed method.Three datasets with similar patterns in different locations were used to test the results for similar images that could easily be incorrectly matched by other algorithms,and one commonly used dataset was used to test the results for the general image-matching problem.The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set.Conclusions The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.
基金Funded by Shaanxi Natural Science Foundation(2002G07)
文摘The paper puts forward a new method of density-based anomaly data mining, the method is used to design the engine of network intrusion detection system (NIDS), thus a new NIDS is constructed based on the engine. The NIDS can find new unknown intrusion behaviors, which are used to updated the intrusion rule-base, based on which intrusion detections can be carried out online by the BM pattern match algorithm. Finally all modules of the NIDS are described by formalized language.
基金supported by the major project of the National Social Science Foundation of China“Big Data-driven Semantic Evaluation System of Science and Technology Literature”(Grant No.21&ZD329)。
文摘Purpose:To address the“anomalies”that occur when scientific breakthroughs emerge,this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers,aiming to achieve early identification of scientific breakthroughs in papers.Design/methodology/approach:This study utilizes semantic technology to extract research entities from the titles and abstracts of papers to represent each paper’s research content.Outlier detection methods are then employed to measure and analyze the anomalies in breakthrough papers during their early stages.The development and evolution process are traced using literature time tags.Finally,a case study is conducted using the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine.Findings:Through manual analysis of all identified outlier papers,the effectiveness of the proposed method for early identifying potential scientific breakthroughs is verified.Research limitations:The study’s applicability has only been empirically tested in the biomedical field.More data from various fields are needed to validate the robustness and generalizability of the method.Practical implications:This study provides a valuable supplement to current methods for early identification of scientific breakthroughs,effectively supporting technological intelligence decision-making and services.Originality/value:The study introduces a novel approach to early identification of scientific breakthroughs by leveraging outlier analysis of research entities,offering a more sensitive,precise,and fine-grained alternative method compared to traditional citation-based evaluations,which enhances the ability to identify nascent breakthrough innovations.
文摘This paper investigates the application ofmachine learning to develop a response model to cardiovascular problems and the use of AdaBoost which incorporates an application of Outlier Detection methodologies namely;Z-Score incorporated with GreyWolf Optimization(GWO)as well as Interquartile Range(IQR)coupled with Ant Colony Optimization(ACO).Using a performance index,it is shown that when compared with the Z-Score and GWO with AdaBoost,the IQR and ACO,with AdaBoost are not very accurate(89.0%vs.86.0%)and less discriminative(Area Under the Curve(AUC)score of 93.0%vs.91.0%).The Z-Score and GWO methods also outperformed the others in terms of precision,scoring 89.0%;and the recall was also found to be satisfactory,scoring 90.0%.Thus,the paper helps to reveal various specific benefits and drawbacks associated with different outlier detection and feature selection techniques,which can be important to consider in further improving various aspects of diagnostics in cardiovascular health.Collectively,these findings can enhance the knowledge of heart disease prediction and patient treatment using enhanced and innovativemachine learning(ML)techniques.These findings when combined improve patient therapy knowledge and cardiac disease prediction through the use of cutting-edge and improved machine learning approaches.This work lays the groundwork for more precise diagnosis models by highlighting the benefits of combining multiple optimization methodologies.Future studies should focus on maximizing patient outcomes and model efficacy through research on these combinations.