Abstract
The demand for accurate and robust models makes outlier detection and robust estimation increasingly important in model building. In this paper, we first apply two methods separately to perform a preliminary detection of outliers in the data: the high-dimensional influential measure (HIM), based on the marginal correlation, and a high-dimensional outlier discrimination method based on the distance correlation (HDC). This divides the data into an initial normal point set and an initial abnormal point set. Starting from the initial normal point set, we use a robust parameter estimation method and the concept of hyper-ellipsoid contours in the residual space to construct a correction for points misclassified as normal. We then recompute the outlier probability of each point in the initial abnormal point set to recover normal points that were misclassified as abnormal, which further improves the accuracy of outlier detection. Simulations of three types of anomalous data under two data structures demonstrate the effectiveness of the proposed method, and three real examples are used for validation and analysis.
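The HDC step in the abstract is built on the distance correlation. As a rough illustration of that underlying statistic only, the following is a minimal sketch of the standard sample distance correlation computed from double-centered pairwise distance matrices. It is not the paper's HDC procedure; the function names and the convention of passing observations as coordinate tuples are assumptions made for this example.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distances, double-centered by row, column and grand mean."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Average of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length lists of coordinate tuples."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)  # squared sample distance covariance
    denom = math.sqrt(_mean_product(a, a) * _mean_product(b, b))
    if denom == 0.0:
        return 0.0  # one sample is constant, so the statistic is defined as 0
    return math.sqrt(max(dcov2, 0.0) / denom)


if __name__ == "__main__":
    xs = [(float(i),) for i in range(-2, 3)]
    ys = [(v[0] ** 2,) for v in xs]  # nonlinear dependence on xs
    print(distance_correlation(xs, ys))
```

Unlike the Pearson correlation, this statistic also responds to nonlinear dependence, which is one plausible reason to use it as a basis for screening observations; how HDC turns it into an outlier rule is not specified in the abstract.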
Authors
宋亚男
赵学靖
SONG Yanan; ZHAO Xuejing (School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China; School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China)
Source
《应用概率统计》
CSCD
PKU Core Journal (北大核心)
2021, No. 2, pp. 136-154 (19 pages)
Chinese Journal of Applied Probability and Statistics
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 11971214 and 81960309).
Keywords
distance correlation
high-dimensional influential measure
residual space hyper-ellipsoid contour
outlier detection
robust estimation