Anomaly Identification Algorithm of Massive Structured Data Based on Improved Random Forest
Abstract: Structured data is both massive and complex, which makes anomalies in it harder to identify. To address this, an anomaly identification algorithm for massive structured data based on an improved random forest is proposed. Complementary ensemble empirical mode decomposition is used to obtain the intrinsic mode functions of the massive structured data and to remove noise points. Decision-tree nodes are split on randomly selected feature subsets, and the random forest is weighted with the AdaBoost algorithm, which completes the improvement of the random forest. The extended spatial range of the improved random forest is defined as the outlier range, and a locality-sensitive hashing algorithm measures the anomaly degree of the denoised data, thereby identifying anomalies in massive structured data. Experiments show that the proposed algorithm reaches a maximum anomaly identification accuracy of 95.8%, and that identification takes 2.52 min when the structured data volume is 400 G, indicating that the algorithm is accurate and fast and has high application value.
Author: SONG Jifeng (宋冀峰), Criminal Justice College, China University of Political Science and Law, Beijing 100088, China
Affiliation: China University of Political Science and Law
Source: Microcomputer Applications (《微型电脑应用》), 2023, No. 11, pp. 156-159 (4 pages)
Keywords: improved random forest; structured data; data anomaly identification; intrinsic mode function; locality-sensitive hashing algorithm
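
The abstract describes splitting on randomly selected feature subsets and weighting the forest with AdaBoost. It does not give the exact weighting scheme, so the following is only a minimal sketch under stated assumptions: depth-1 trees (stumps) stand in for full decision trees, each tree's vote is weighted by the standard AdaBoost coefficient, and labels are +1 (anomalous) and -1 (normal). All function names and the toy data are hypothetical, and the decomposition-based denoising and hashing-based scoring steps are omitted.

```python
import math
import random


def fit_stump(X, y, w, features):
    """Return (weighted_error, feature, threshold, polarity) with the lowest
    weighted error, searching only the given feature subset."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if (polarity if row[f] > t else -polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 for one row with a fitted stump."""
    _, f, t, polarity = stump
    return polarity if row[f] > t else -polarity


def fit_weighted_forest(X, y, n_trees=15, n_sub=2, seed=0):
    """Fit stumps on random feature subsets; weight each one AdaBoost-style."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform sample weights to start
    forest = []
    for _ in range(n_trees):
        features = rng.sample(range(d), n_sub)  # random feature subset
        stump = fit_stump(X, y, w, features)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # AdaBoost tree weight
        forest.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return forest


def predict(forest, row):
    """Weighted vote over all stumps: +1 = anomalous, -1 = normal."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in forest)
    return 1 if score > 0 else -1


# Toy data (hypothetical): features 0 and 1 separate anomalies, feature 2 is noise.
X = [
    [0.10, 0.20, 0.5], [0.30, 0.10, 0.2], [0.20, 0.40, 0.8],
    [0.40, 0.30, 0.3], [0.25, 0.35, 0.6], [0.90, 0.80, 0.4],
    [1.00, 0.95, 0.7],
]
y = [-1, -1, -1, -1, -1, 1, 1]
forest = fit_weighted_forest(X, y)
print(predict(forest, [0.95, 0.9, 0.5]), predict(forest, [0.2, 0.2, 0.5]))
```

In this toy set every two-feature subset contains at least one separating feature, so each stump fits the training data and all votes carry equal weight. On noisier data the coefficients differ, and trees with lower weighted error dominate the vote.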