Abstract
This study proposes a high-performance data mining algorithm based on the MapReduce model for parallel environments. The algorithm changes the model input from a single sample to a group of samples, which reduces the number of Map tasks and saves the time spent invoking Map. On this basis, it improves the K-means clustering algorithm from the perspective of feature weighting: each feature in each cluster is assigned its own weight, which reduces the influence of noise in the feature data. Test results show that in the MapReduce parallel environment the algorithm achieves good data mining accuracy and the lowest clustering convergence time in the reported tests, giving it some advantages for big data mining problems.
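The abstract does not give the paper's formulas, so the following is only a minimal single-process Python sketch under stated assumptions. MapReduce is simulated with plain functions (`chunk`, `map_batch`, `reduce_cluster`, and a dict standing in for the shuffle step are all hypothetical names). Each Map call receives a batch of samples, as the abstract describes. The per-cluster feature weights use an entropy-weighted K-means (EWKM-style) update, w ∝ exp(-D/γ). This update is swapped in as a stand-in because the paper's own weighting rule is not stated here. `batch_size` and `gamma` are illustrative parameters.

```python
import math
from collections import defaultdict


def chunk(samples, batch_size):
    """Group the input so each Map call receives a batch of samples (hypothetical helper)."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def weighted_dist(x, center, weights):
    """Squared Euclidean distance using one cluster's per-feature weights."""
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights))


def map_batch(batch, centers, weights):
    """Map: assign each sample in the batch, emit one (count, sum, sum of squares) per cluster."""
    dim = len(centers[0])
    partial = {}
    for x in batch:
        cid = min(range(len(centers)),
                  key=lambda l: weighted_dist(x, centers[l], weights[l]))
        agg = partial.setdefault(cid, [0, [0.0] * dim, [0.0] * dim])
        agg[0] += 1
        for j, v in enumerate(x):
            agg[1][j] += v
            agg[2][j] += v * v
    return partial.items()


def reduce_cluster(parts):
    """Reduce: merge all partial aggregates emitted for one cluster."""
    dim = len(parts[0][1])
    count = sum(p[0] for p in parts)
    sums = [sum(p[1][j] for p in parts) for j in range(dim)]
    sqs = [sum(p[2][j] for p in parts) for j in range(dim)]
    return count, sums, sqs


def entropy_weights(dispersion, gamma):
    """Assumed EWKM-style update: w_j ~ exp(-D_j / gamma), normalised to sum to 1."""
    lo = min(dispersion)  # shift by the minimum so the exponentials never all underflow
    exps = [math.exp(-(d - lo) / gamma) for d in dispersion]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_kmeans_mapreduce(samples, k, batch_size=4, gamma=1.0,
                              max_iter=20, init_centers=None):
    """Driver: simulates one MapReduce job per K-means iteration."""
    dim = len(samples[0])
    # Assumption: default initialisation takes the first k samples as centers.
    centers = [list(c) for c in (init_centers or samples[:k])]
    weights = [[1.0 / dim] * dim for _ in range(k)]  # start with equal feature weights
    batches = chunk(samples, batch_size)
    map_calls = 0
    for _ in range(max_iter):
        shuffled = defaultdict(list)  # shuffle phase: group map output by cluster id
        for batch in batches:
            map_calls += 1
            for cid, agg in map_batch(batch, centers, weights):
                shuffled[cid].append(agg)
        new_centers = [c[:] for c in centers]  # an empty cluster keeps its old center
        for cid, parts in shuffled.items():
            n, sums, sqs = reduce_cluster(parts)
            new_centers[cid] = [s / n for s in sums]
            # Per-feature within-cluster dispersion: sum(x^2) - n * mean^2.
            disp = [max(q - n * m * m, 0.0) for q, m in zip(sqs, new_centers[cid])]
            weights[cid] = entropy_weights(disp, gamma)
        if new_centers == centers:  # stop once the centers no longer move
            break
        centers = new_centers
    return centers, weights, map_calls


if __name__ == "__main__":
    # Two clusters separated on feature 0; feature 1 is within-cluster noise.
    data = [[0.0, 0.0], [10.0, 1.0], [0.2, 5.0], [10.2, -5.0],
            [0.1, -4.0], [9.9, 4.0], [0.3, 3.0], [10.1, -2.0]]
    centers, weights, calls = weighted_kmeans_mapreduce(data, k=2, batch_size=4)
    print(centers, weights, calls)  # 2 Map calls per iteration instead of 8
```

In this sketch, a Map call emits one aggregate per cluster per batch rather than one record per sample. This is what cuts the number of Map invocations and the volume of data passed to the shuffle. Carrying sums of squares lets the reducer compute each cluster's per-feature dispersion exactly without a second pass. On the toy data, the noisy second feature ends up with a near-zero weight in both clusters.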
Author
JIN Xian-hao (金先好), Lu'an Vocational Technical College, Lu'an 237158, Anhui Province, China
Source
Journal of JingDeZhen University (《景德镇学院学报》)
2021, No. 6, pp. 114-116 (3 pages)
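Funding
Anhui Provincial Department of Education, Provincial Quality Engineering Projects for Higher Education Institutions (2019jyxm0612, 2020jxtd254, 2019cxtd032, 2020jyxm1844, 2019xfzx04)
Anhui Provincial Department of Education, Natural Science Research Projects of Higher Education Institutions (KJ2019A1065, KJ2020A0952)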