Extended isolation forest algorithm based on random subspace (cited by: 3)

Abstract: To address the excessive time overhead of the Extended Isolation Forest (EIF) algorithm, an Extended Isolation Forest algorithm based on Random Subspace (RS-EIF) was proposed. First, multiple random subspaces were determined in the original data space. Then, in each random subspace, extended isolated trees were constructed by calculating the intercept vector and slope of each node, and multiple extended isolated trees were integrated into a subspace extended isolation forest. Finally, the average traversal depth of each data point in the extended isolation forest was calculated to determine whether the data point was anomalous. Experimental results on 9 real datasets from the Outlier Detection DataSets (ODDS) library and 7 synthetic datasets with multivariate distributions show that the RS-EIF algorithm is sensitive to local anomalies and reduces the time overhead by about 60% compared with the EIF algorithm. On the ODDS datasets with many samples, its recognition accuracy is 2 to 12 percentage points higher than that of the isolation Forest (iForest), Lightweight On-line Detector of Anomalies (LODA) and COPula-based Outlier Detection (COPOD) algorithms. The RS-EIF algorithm therefore achieves higher recognition efficiency on datasets with a large number of samples.
Authors: XIE Yu (谢雨); JIANG Yu (蒋瑜); LONG Chaoqi (龙超奇) (School of Software Engineering, Chengdu University of Information Technology, Chengdu Sichuan 610225, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2021, Issue 6, pp. 1679-1685 (7 pages)
Keywords: anomaly detection; random subspace; Extended Isolation Forest (EIF) algorithm; extended isolated tree; average traversal depth
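
The three steps in the abstract (pick random subspaces, grow extended isolated trees from a slope and an intercept vector at each node, score points by average traversal depth) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the default parameters, the choice of one random subspace per tree, and the iForest-style score normalisation are all assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def side(point, slope, intercept):
    """Signed position of a point relative to the hyperplane (x - intercept) . slope = 0."""
    return sum((x - b) * w for x, b, w in zip(point, intercept, slope))


def build_tree(points, depth, max_depth, rng):
    """Grow one extended isolated tree using random hyperplane splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = len(points[0])
    # Random slope (normal vector) and an intercept drawn inside the node's bounding box.
    slope = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    intercept = [
        rng.uniform(min(p[j] for p in points), max(p[j] for p in points))
        for j in range(dim)
    ]
    left = [p for p in points if side(p, slope, intercept) <= 0]
    right = [p for p in points if side(p, slope, intercept) > 0]
    return {
        "slope": slope,
        "intercept": intercept,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Traversal depth of a point, plus the usual correction for unsplit leaves."""
    depth = 0
    while "slope" in node:
        go_left = side(point, node["slope"], node["intercept"]) <= 0
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def fit_rs_eif(data, n_trees=100, subspace_dim=2, sample_size=64, seed=0):
    """Assumed variant: each tree gets its own random feature subset and subsample.

    Requires at least two training points so the score normaliser is non-zero.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sample_size = min(sample_size, len(data))
    max_depth = max(1, math.ceil(math.log2(sample_size)))
    forest = []
    for _ in range(n_trees):
        features = rng.sample(range(dim), subspace_dim)
        sample = rng.sample(data, sample_size)
        projected = [[row[j] for j in features] for row in sample]
        forest.append((features, build_tree(projected, 0, max_depth, rng)))
    return forest, sample_size


def anomaly_score(x, forest, sample_size):
    """Map the average traversal depth to (0, 1]; higher means more anomalous."""
    mean_depth = sum(
        path_length([x[j] for j in features], tree) for features, tree in forest
    ) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))
```

Under these assumptions, calling `fit_rs_eif` on a list of numeric rows returns a forest and the subsample size, which are then passed to `anomaly_score` for each query point. Because every tree works on only `subspace_dim` features, each split touches fewer coordinates than a full-dimensional EIF split, which is the kind of saving the abstract attributes to the random-subspace step.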