Abstract
To address the shortcomings of existing data stream anomaly detection algorithms, a data stream anomaly detection algorithm based on randomized space trees is proposed. A statistical strategy is used to estimate the range of each data stream feature, and the resulting feature space is partitioned to obtain multiple randomized space trees (RS-Trees), which together form an RS-Forest. RS-Forest processes the data stream with a single-window strategy and performs anomaly detection through scoring and model updating. A piecewise constant density is defined for the tree node into which an instance falls; the density estimate averaged over all trees in the forest is taken as the score of each newly arriving instance, and new instances are ranked by this average score. Once the window is full, the model is updated using a dual node profile technique, and the collected node size information is used to score the data arriving in the next window. Simulation experiments on several benchmark data sets show that RS-Forest outperforms the other current baseline algorithms in AUC score and running time on most of the data sets.
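The scoring and update loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (`RSForestSketch`, `build`, `Node`), the default parameters, and the choice to keep counts only at leaf nodes are all assumptions made for illustration, and the feature ranges are passed in directly rather than estimated statistically. A lower score means lower estimated density, that is, a more anomalous instance.

```python
import random


class Node:
    """One cell of the randomly partitioned feature space."""

    def __init__(self, lo, hi):
        self.volume = 1.0
        for a, b in zip(lo, hi):
            self.volume *= b - a
        self.dim = None      # split attribute (None for a leaf)
        self.cut = None      # split value
        self.left = None
        self.right = None
        self.ref_count = 0   # profile from the previous window, used for scoring
        self.new_count = 0   # profile being collected in the current window


def build(lo, hi, depth, rng, leaves):
    """Recursively split [lo, hi] at a random attribute and a random cut point."""
    node = Node(lo, hi)
    if depth == 0:
        leaves.append(node)
        return node
    node.dim = rng.randrange(len(lo))
    node.cut = rng.uniform(lo[node.dim], hi[node.dim])
    left_hi = list(hi)
    left_hi[node.dim] = node.cut
    right_lo = list(lo)
    right_lo[node.dim] = node.cut
    node.left = build(lo, left_hi, depth - 1, rng, leaves)
    node.right = build(right_lo, hi, depth - 1, rng, leaves)
    return node


class RSForestSketch:
    """Hypothetical simplification: counts are kept at leaves only."""

    def __init__(self, lo, hi, n_trees=25, depth=6, window=256, seed=0):
        rng = random.Random(seed)
        self.leaves = []  # every leaf of every tree, for the profile swap
        self.roots = [build(list(lo), list(hi), depth, rng, self.leaves)
                      for _ in range(n_trees)]
        self.window = window
        self.n_ref = 0    # number of instances behind the reference profile
        self.n_new = 0    # number of instances seen in the current window

    @staticmethod
    def _leaf(root, x):
        node = root
        while node.left is not None:
            node = node.left if x[node.dim] < node.cut else node.right
        return node

    def process(self, x):
        """Score x against the previous window, then record it in the current one."""
        hit = [self._leaf(root, x) for root in self.roots]
        score = 0.0  # no reference profile yet: score is a placeholder
        if self.n_ref > 0:
            # Piecewise constant density per tree, averaged over the forest.
            dens = [leaf.ref_count / (self.n_ref * leaf.volume)
                    for leaf in hit if leaf.volume > 0]
            score = sum(dens) / len(self.roots)
        for leaf in hit:
            leaf.new_count += 1
        self.n_new += 1
        if self.n_new == self.window:
            # Window full: the collected profile becomes the reference profile.
            for leaf in self.leaves:
                leaf.ref_count, leaf.new_count = leaf.new_count, 0
            self.n_ref, self.n_new = self.n_new, 0
        return score
```

In this sketch, instances from the first window receive a placeholder score of 0.0 because no reference profile exists yet; from the second window onward, each instance is scored against the counts collected in the preceding window.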
Source
《计算机工程与设计》
PKU Core Journal
2017, No. 9, pp. 2414-2419, 2471 (7 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (6130900)