Design and implementation of an outlier mining algorithm in data-intensive computing environments (cited by: 1)
Abstract In data-intensive computing environments, data are massive, rapidly changing, stored in a distributed fashion, and heterogeneous, which poses new challenges for the design and implementation of data mining algorithms. Based on the MapReduce model, this paper proposes MR_LOF, an outlier mining algorithm that combines a grid technique with the LOF (local outlier factor) method. In the Map phase, a grid is used to reduce the data, and the representative-point information is sent to the master node. In the Reduce phase, a density-based outlier mining algorithm is applied, and dense regions are filtered out using the grid expected value E. The algorithm therefore computes LOF values only for objects in sparse regions, which lowers its time complexity. Experimental results show that the method mines outliers effectively in data-intensive computing environments.
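The two phases described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not define the expected value E, the grid cell width, or the form of the representative-point information. As assumptions, the sketch sends per-cell point counts as the representative information, takes E to be the mean count over non-empty cells, and uses a brute-force textbook LOF. All function names (`cell_of`, `map_phase`, `reduce_phase`, `lof_scores`, `mr_lof_sketch`) are hypothetical.

```python
import math
from collections import Counter
from functools import lru_cache


def cell_of(point, width):
    """Grid cell index of a point (assumes a uniform cell width)."""
    return tuple(math.floor(x / width) for x in point)


def map_phase(split, width):
    """Map: reduce one data split to per-cell point counts."""
    return Counter(cell_of(p, width) for p in split)


def reduce_phase(partial_counts):
    """Reduce: merge the per-split counts and select the sparse cells.

    ASSUMPTION: E is the mean count over non-empty cells; the abstract
    does not say how E is actually computed.
    """
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    expected = sum(total.values()) / len(total)
    sparse = {cell for cell, n in total.items() if n < expected}
    return sparse, expected


def lof_scores(data, targets, k):
    """Textbook LOF for the points at indices `targets` (brute-force k-NN).

    Simplifications: exactly k neighbours are used (distance ties are cut),
    and all points are assumed distinct so no reachability sum is zero.
    """
    @lru_cache(maxsize=None)
    def nn(i):
        dists = sorted((math.dist(data[i], data[j]), j)
                       for j in range(len(data)) if j != i)
        return tuple(dists[:k])

    def k_dist(i):
        return nn(i)[-1][0]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist(j), d) for d, j in nn(i)]
        return len(reach) / sum(reach)

    return {i: sum(lrd(j) for _, j in nn(i)) / (len(nn(i)) * lrd(i))
            for i in targets}


def mr_lof_sketch(splits, width, k):
    """Run both phases, then score only the points that fall in sparse cells."""
    sparse, _ = reduce_phase([map_phase(s, width) for s in splits])
    data = [p for s in splits for p in s]
    targets = [i for i, p in enumerate(data) if cell_of(p, width) in sparse]
    return {data[i]: score for i, score in lof_scores(data, targets, k).items()}


if __name__ == "__main__":
    splits = [
        [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.1, 0.3), (0.4, 0.4)],
        [(0.2, 0.4), (0.3, 0.3), (5.5, 5.5)],
    ]
    for point, score in mr_lof_sketch(splits, width=1.0, k=3).items():
        print(point, round(score, 2))
```

In this toy input, seven points fall in one cell and one point falls in another, so only the isolated point is scored. Under these assumptions the saving comes from skipping the LOF computation for every point in a dense cell. Neighbours are still searched over the full data set, which a real distributed version would have to handle differently.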
Source Journal of Shandong University of Technology (Natural Science Edition), CAS, 2013, No. 5, pp. 32-35 (4 pages)
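Funding Natural Science Foundation of Shandong Province (ZR2011FL013); Science and Technology Program of Shandong Provincial Higher Education Institutions (J13LN27)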
Keywords data mining; outlier; data-intensive; MapReduce; MR_LOF