
Experimental design for Spark-based parallelization of the extended isolation forest algorithm
Abstract Anomaly detection experiments on massive or high-dimensional data are often slow and inefficient. To address this problem, an experiment is designed that parallelizes the extended isolation forest anomaly detection algorithm using Spark distributed computing. Built on the Spark framework, the experiment designs parallelization methods for the data sampling, training, and prediction stages. A comparison with the same algorithm running on a single core verifies that the parallel method greatly improves execution efficiency while preserving accuracy. The experiment helps deepen students' understanding of distributed parallel processing of big data and encourages their interest in learning techniques for mining massive data.
Authors YING Wenhao; SUN Zhongqiang; WANG Shiyu; ZHONG Shan; GONG Shengrong (School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou 215000, China; School of Computer Science and Technology, Soochow University, Suzhou 215000, China)
Source Experimental Technology and Management (《实验技术与管理》), indexed in CAS and the Peking University Core Journals list, 2023, No. 4, pp. 75-81 (7 pages)
Funding China Association of Higher Education "14th Five-Year Plan" special project (21JSYB16); National Natural Science Foundation of China (61972059).
Keywords big data parallelization; anomaly detection; isolation forest; data mining; experimental design
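
The abstract names three stages (sampling, training, prediction) but this record does not show how the paper parallelizes them. The following is only a minimal sketch of how the three stages might be split into independent tasks, under stated assumptions: it uses the standard extended isolation forest idea of random-hyperplane splits, and it substitutes Python's `ThreadPoolExecutor.map` for a Spark distributed map so that it runs without a cluster. All function names (`train_forest`, `score_points`, and so on) and parameter defaults are hypothetical and are not taken from the paper.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def side(point, node):
    """Signed position of a point relative to the node's hyperplane: (x - p) . n."""
    return sum((x - p) * w for x, p, w in zip(point, node["intercept"], node["normal"]))


def build_tree(points, depth, max_depth, rng):
    """Recursively split points with random hyperplanes (extended isolation tree)."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = len(points[0])
    node = {
        # Random slope, and a random intercept inside the bounding box of the points.
        "normal": [rng.gauss(0.0, 1.0) for _ in range(dim)],
        "intercept": [
            rng.uniform(min(p[i] for p in points), max(p[i] for p in points))
            for i in range(dim)
        ],
    }
    left = [p for p in points if side(p, node) <= 0]
    right = [p for p in points if side(p, node) > 0]
    node["left"] = build_tree(left, depth + 1, max_depth, rng)
    node["right"] = build_tree(right, depth + 1, max_depth, rng)
    return node


def path_length(point, node):
    """Depth at which a point lands in a leaf, plus the leaf-size correction."""
    depth = 0
    while "size" not in node:
        node = node["left"] if side(point, node) <= 0 else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def train_forest(data, n_trees=100, sample_size=64, seed=0, workers=4):
    """Sampling + training: each tree is an independent task with its own seed."""
    max_depth = math.ceil(math.log2(sample_size))

    def train_one(i):
        rng = random.Random(seed + i)
        sample = rng.sample(data, min(sample_size, len(data)))
        return build_tree(sample, 0, max_depth, rng)

    # Stand-in for a distributed map over tree indices (assumption, not Spark).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_one, range(n_trees)))


def score_points(forest, points, sample_size=64, workers=4):
    """Prediction: each point is scored independently against the whole forest."""
    norm = c_factor(sample_size)

    def score_one(point):
        mean_depth = sum(path_length(point, tree) for tree in forest) / len(forest)
        return 2.0 ** (-mean_depth / norm)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_one, points))


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
    forest = train_forest(data)
    inlier, outlier = score_points(forest, [(0.0, 0.0), (8.0, 8.0)])
    print(f"inlier score {inlier:.3f}, outlier score {outlier:.3f}")
```

Because each tree depends only on its own sample and seed, and each point's score depends only on the finished forest, both maps are free of shared mutable state, which is the property a distributed framework would rely on. Threads in standard CPython do not give real CPU parallelism for this workload, so the sketch illustrates the task decomposition only, not the speedup reported in the abstract.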