Abstract
The cumulative local outlier factor (C_LOF) algorithm effectively handles concept drift in data streams and overcomes the camouflage problem in outlier detection, but its time complexity is high when processing high-dimensional data. To address this, a cumulative local outlier factor algorithm based on projection indexed nearest neighbors (PINN_C_LOF) is proposed. A sliding window maintains the active data points; when a new data point arrives or an old one expires, the projection indexed nearest neighbor (PINN) method is used to incrementally update the neighbors of the affected data points in the window. Experimental results show that, when detecting outliers in high-dimensional data streams, PINN_C_LOF has significantly lower time complexity than C_LOF while maintaining detection accuracy.
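The abstract's pipeline (sliding window, projection-based neighbor search, local outlier scoring) can be illustrated with a minimal sketch. This is not the authors' PINN_C_LOF implementation: it uses plain LOF rather than the cumulative C_LOF score, recomputes scores for the whole window instead of updating only the affected points, and replaces the index over the projected space with a linear scan. The names `SlidingWindowLOF`, `pinn_knn`, and the candidate multiplier `c` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque


def make_projection(dim, proj_dim, seed=0):
    """Gaussian random projection matrix (proj_dim rows, dim columns)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(proj_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(proj_dim)]


def project(x, matrix):
    """Map a point into the low-dimensional projected space."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def pinn_knn(i, points, projected, k, c=3):
    """Approximate k nearest neighbors of points[i].

    Step 1: take the c*k closest candidates in the projected space
            (a linear scan here; an index structure would replace it).
    Step 2: re-rank those candidates by distance in the original space.
    Returns (distance, index) pairs, nearest first.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(projected[i], projected[j]))
    candidates = others[:c * k]
    refined = sorted((dist(points[i], points[j]), j) for j in candidates)
    return refined[:k]


def lof_scores(points, projected, k, c=3):
    """Standard LOF computed from the approximate neighbor lists."""
    n = len(points)
    knn = [pinn_knn(i, points, projected, k, c) for i in range(n)]
    k_dist = [nbrs[-1][0] for nbrs in knn]
    lrd = []  # local reachability density of each point
    for i in range(n):
        reach = sum(max(k_dist[j], d) for d, j in knn[i])
        lrd.append(len(knn[i]) / reach if reach > 0 else float("inf"))
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # exact duplicates: treat as an inlier
        else:
            mean_lrd = sum(lrd[j] for _, j in knn[i]) / len(knn[i])
            scores.append(mean_lrd / lrd[i])
    return scores


class SlidingWindowLOF:
    """Keeps the most recent `window` points and scores each new arrival."""

    def __init__(self, dim, window=50, k=5, proj_dim=3, seed=0):
        self.matrix = make_projection(dim, proj_dim, seed)
        self.points = deque(maxlen=window)     # original-space points
        self.projected = deque(maxlen=window)  # their projections
        self.k = k

    def insert(self, x):
        """Add a point (the oldest expires automatically); return its LOF."""
        self.points.append(list(x))
        self.projected.append(project(x, self.matrix))
        if len(self.points) <= self.k:
            return None  # not enough points for k neighbors yet
        scores = lof_scores(list(self.points), list(self.projected), self.k)
        return scores[-1]
```

In this sketch, feeding a stream of points through `insert` yields a score near 1 for points that resemble their neighbors and a much larger score for isolated points. The saving comes from step 1 of `pinn_knn`: most distance computations happen in the `proj_dim`-dimensional space, and full-dimensional distances are evaluated only for the `c*k` candidates.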
Authors
LIANG Chang-hao
TONG Ying-hua
FENG Zhong-ling
(School of Computer, Qinghai Normal University, Xining 810008, China; The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, China; School of Physics and Electronic Information, Qinghai Normal University, Xining 810008, China)
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2024, No. 5, pp. 1406-1412 (7 pages)
Funding
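National Natural Science Foundation of China (61862055)
Hebei Internet of Things Monitoring Engineering Technology Research Center Fund (3142016020)
Qinghai Provincial Key Laboratory of Internet of Things Fund (2020-ZJ-Y16)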
Keywords
high-dimensional data stream
outlier detection
cumulative local outlier factor
time complexity
projection indexed nearest neighbor
local outlier factor
Internet of Things