Abstract
To address the cumbersome computation and long running time of the isolation forest method for outlier detection, an outlier detection algorithm based on the XmR control chart is proposed. For the sample attributes, the individual values and their mean, the moving ranges, and the mean moving range are computed. From these, the control limits and center lines of the X chart and the mR chart are drawn, and the individual attribute values of the samples are plotted on the charts. The indices of the samples whose points fall outside the limits on the X chart are combined by set union with the indices of the out-of-limit points on the mR chart, each increased by 1, and the resulting samples are deleted from the data. The cleaned data are then fed to the CART, random forest, and support vector machine algorithms for experimental validation. The results show that, compared with the isolation forest method, the proposed method is faster and more accurate, offering a new line of research for outlier detection.
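The detection step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions the abstract does not state. It uses the textbook XmR constants for moving ranges of span 2 (X chart limits at mean(X) ± 2.66·mean(mR), mR chart upper limit at 3.267·mean(mR), lower limit 0). It also charts each attribute column separately and takes the union of flagged samples across attributes. The function names `xmr_outliers` and `remove_outliers` are hypothetical.

```python
from typing import List, Sequence, Set

# Textbook XmR constants for moving ranges of span 2 (assumed; the paper's
# exact constants are not given in the abstract).
E2 = 2.66   # X chart limits: mean(X) +/- E2 * mean(mR)
D4 = 3.267  # mR chart upper limit: D4 * mean(mR); lower limit is 0


def xmr_outliers(values: Sequence[float]) -> Set[int]:
    """Return 0-based indices of samples flagged by the X or mR chart for one attribute."""
    n = len(values)
    if n < 2:
        return set()  # no moving range can be formed
    x_bar = sum(values) / n                       # X chart center line
    # Moving range mr[j] = |x[j+1] - x[j]|, so there are n - 1 of them.
    mr = [abs(values[j + 1] - values[j]) for j in range(n - 1)]
    mr_bar = sum(mr) / len(mr)                    # mR chart center line
    x_ucl, x_lcl = x_bar + E2 * mr_bar, x_bar - E2 * mr_bar
    mr_ucl = D4 * mr_bar
    # Samples outside the X chart limits.
    x_out = {i for i, v in enumerate(values) if v > x_ucl or v < x_lcl}
    # mR point j spans samples j and j+1; per the abstract, add 1 to its index.
    mr_out = {j + 1 for j, r in enumerate(mr) if r > mr_ucl}
    return x_out | mr_out


def remove_outliers(rows: List[List[float]]) -> List[List[float]]:
    """Chart every attribute column (assumption) and drop the union of flagged rows."""
    n_attrs = len(rows[0]) if rows else 0
    flagged: Set[int] = set()
    for a in range(n_attrs):
        flagged |= xmr_outliers([row[a] for row in rows])
    return [row for i, row in enumerate(rows) if i not in flagged]
```

For example, `xmr_outliers([10] * 9 + [30])` flags only index 9, since the final value exceeds the X chart upper limit and its jump exceeds the mR chart limit. Because of the "+1" convention, an isolated spike in the middle of a series also flags the sample right after it, since the return jump produces a second large moving range. The cleaned rows would then be passed to the downstream classifiers.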
Authors
CHEN Lifang, WANG Rongjie, LIU Yunqing, ZHOU Xu
(College of Science, North China University of Technology, Tangshan 063000, China; Hebei Key Laboratory of Data Science and Application, Tangshan 063000, China)
Source
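Journal of University of Science and Technology of China (《中国科学技术大学学报》, JUSTC), 2020, No. 8, pp. 1110-1115, 1186 (7 pages)
Indexed in: CAS, CSCD, Peking University Core Journals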
Funding
Supported by the Natural Science Foundation of Hebei Province (F2014209086).