Abstract
In data-intensive computing environments, data are massive, rapidly changing, stored in a distributed manner, and heterogeneous, which poses new challenges for the design and implementation of data mining algorithms. Based on the MapReduce model, this paper proposes MR_LOF, an outlier mining algorithm that combines a grid technique with the LOF (local outlier factor) method. In the Map phase, a grid is used to reduce the data, and representative-point information is sent to the master node. In the Reduce phase, a density-based outlier mining algorithm is applied, and dense regions are identified using the grid's expected value E. The algorithm therefore only needs to compute LOF values for objects in sparse regions, which lowers its time complexity. Experimental results show that the method mines outliers effectively in data-intensive computing environments.
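The two phases described in the abstract can be sketched roughly as follows. This is a minimal single-process illustration, not the authors' implementation: the function names, the `cell_size` parameter, and the definition of the expected value E as the mean point count per non-empty cell are all assumptions, since the abstract does not define them. For simplicity the sketch forwards full cell contents from the Map step, whereas the paper forwards representative-point information whose exact form is not given here. Points are assumed to be distinct tuples.

```python
from collections import defaultdict
from math import dist, floor


def map_phase(points, cell_size):
    """Map step (hypothetical): bucket one data split into grid cells."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)
    return cells


def merge_cells(partials):
    """Shuffle step (hypothetical): combine per-split grids by cell key."""
    merged = defaultdict(list)
    for cells in partials:
        for key, pts in cells.items():
            merged[key].extend(pts)
    return merged


def lof_scores(candidates, data, k):
    """Standard LOF for `candidates`, with neighbours drawn from all of `data`."""
    cache = {}

    def neighbours(p):
        # k nearest neighbours as (distance, point); distance ties are not expanded.
        if p not in cache:
            cache[p] = sorted((dist(p, q), q) for q in data if q != p)[:k]
        return cache[p]

    def k_distance(p):
        return neighbours(p)[-1][0]

    def lrd(p):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_distance(o), d) for d, o in neighbours(p)]
        return len(reach) / sum(reach)

    return {
        p: sum(lrd(o) for _, o in neighbours(p)) / (len(neighbours(p)) * lrd(p))
        for p in candidates
    }


def reduce_phase(merged, k):
    """Reduce step (hypothetical): score only points that lie in sparse cells."""
    counts = [len(pts) for pts in merged.values()]
    expected = sum(counts) / len(counts)  # assumed form of the expected value E
    data = [p for pts in merged.values() for p in pts]
    sparse = [p for pts in merged.values() if len(pts) < expected for p in pts]
    return lof_scores(sparse, data, k)
```

Cells whose count reaches E are treated as dense and their points are never scored, which is where the saving in LOF computations comes from. Neighbours are still searched across all points so that a sparse point next to a dense cell is compared against that cell's density.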
Source
《山东理工大学学报(自然科学版)》
CAS
2013, Issue 5, pp. 32-35 (4 pages)
Journal of Shandong University of Technology: Natural Science Edition
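Funding
Natural Science Foundation of Shandong Province (ZR2011FL013)
Science and Technology Program of Higher Education Institutions of Shandong Province (J13LN27)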