Abstract
The isolation forest algorithm relies on random subsampling, which does not take into account the relative density of sample points drawn from different regions. To address this, a kernel-based isolation forest algorithm, K-iForest, is proposed; it resamples according to the probability density function to improve the performance of the isolation forest algorithm. The effectiveness and efficiency of K-iForest are validated on the Annthyroid, ForestCover, Mulcross, and Shuttle datasets from the Outlier Detection Database (ODDS), as well as on the Http (KDD Cup 1999), Smtp (KDD Cup 1999), and KDD CUP 99 datasets, and the algorithm is compared with the iForest, EIF, RRCF, GIF, and HIF algorithms. The experimental results show that the AUC of K-iForest is 0.1% to 100.2% higher than that of the other algorithms.
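The abstract does not spell out how the probability density is turned into a resampling rule, so the following is only a minimal sketch of the general idea, not the K-iForest procedure itself. It estimates a Gaussian kernel density at each point and then draws a subsample without replacement whose weights depend on that density. The function names, the `bandwidth` and `power` parameters, and the choice of a Gaussian kernel are all assumptions made for illustration; `power=0` recovers the uniform subsampling of the standard isolation forest.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every point in `points`.

    Hypothetical helper: the kernel and bandwidth used by K-iForest are not
    given in the abstract.
    """
    n, dim = len(points), len(points[0])
    norm = n * (bandwidth * math.sqrt(2 * math.pi)) ** dim
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        densities.append(total / norm)
    return densities


def density_weighted_subsample(points, size, bandwidth=1.0, power=-1.0, seed=None):
    """Draw `size` distinct points with weight proportional to density ** power.

    Assumption: power < 0 favours sparse regions, power > 0 favours dense
    regions, and power = 0 is plain uniform subsampling.
    """
    rng = random.Random(seed)
    weights = [d ** power for d in gaussian_kde(points, bandwidth)]
    # Efraimidis-Spirakis weighted sampling without replacement:
    # give each item the key u ** (1 / w) and keep the `size` largest keys.
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return [points[i] for _, i in keys[:size]]


if __name__ == "__main__":
    # A dense cluster plus one isolated point.
    data = [(0.01 * i, 0.0) for i in range(50)] + [(10.0, 10.0)]
    print(density_weighted_subsample(data, size=8, bandwidth=0.5, seed=0))
```

Each isolation tree would then be built on such a subsample instead of a uniformly drawn one. Which direction of weighting (and which kernel) the paper actually uses has to be taken from the full text.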
Authors
DONG Dong (董东)
HAO Linlin (郝琳琳)
College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
Source
Software Guide (《软件导刊》)
2024, No. 11, pp. 125-128 (4 pages)
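Funding
"14th Five-Year Plan" Supporting Special Project of the National Education Examinations Authority, Ministry of Education (NEEA2021064)
Hebei Normal University Humanities and Social Sciences Intramural Research Fund Project (S23JX003)
Education Development Special Project of the Humanities and Social Sciences Research Program of Higher Education Institutions of Hebei Province (WTZX202421)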
Keywords
kernel function
outlier detection
isolation forest algorithm
probability density
relative density