Parallel Implementation of the PCA Algorithm in the MapReduce Framework
Abstract: In big data processing projects, the volume of collected high-dimensional data grows exponentially, and data preprocessing has become a bottleneck for data analysis and knowledge mining. Principal Component Analysis (PCA) is currently the most widely used dimensionality reduction algorithm and works particularly well on large sparse matrices, but it usually involves large-scale, complex computation. This paper presents a parallel PCA algorithm built on the MapReduce parallel processing framework of the Hadoop big data platform, which uses map and reduce steps to distribute the computation across multiple processors. Validation experiments show that, as the data set grows and with a suitably chosen number of distributed compute nodes, the parallel PCA method improves the speedup ratio by about 30% and reduces running time by about 21%.
Authors: CHEN Yan (陈燕); CHEN Ya-lin (陈亚林); ZHENG Jun (郑军) (School of Mathematics and Information Science, Guiyang University, Guiyang 550002, Guizhou, China; School of Management Science, Nanjing University of Finance & Economics, Nanjing 210046, Jiangsu, China)
Source: Journal of Guiyang University: Natural Sciences (《贵阳学院学报(自然科学版)》), 2019, No. 4, pp. 92-96 (5 pages)
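Funding: 2019 Municipal Science and Technology Bureau Special Science and Technology Fund for Guiyang University (Project No. GYU-KYZ[2019~2020]PT06-02); Ministry of Education Youth Fund Project "Research on Coal-Related Industrial Policy under Water Resource Constraints: Mechanism, Model, and Simulation" (Project No. 18YJCZH016)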
Keywords: Principal Component Analysis (PCA); data preprocessing; MapReduce; parallel processing
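
The abstract does not spell out how the PCA computation is split into map and reduce steps. One common decomposition, shown here only as a minimal single-machine sketch and not as the authors' implementation, has each mapper summarize its block of rows as a row count, column sums, and the product X^T X, while the reducer adds these partial statistics together. The covariance matrix and its eigen-decomposition are then computed once on the driver. All function names (`map_partial_stats`, `reduce_stats`, `pca_from_stats`, `mapreduce_pca`) and the synthetic data are assumptions made for illustration, and plain Python `map` and `functools.reduce` stand in for Hadoop.

```python
from functools import reduce

import numpy as np


def map_partial_stats(block):
    """Map step: summarize one row block as (row count, column sums, X^T X)."""
    return block.shape[0], block.sum(axis=0), block.T @ block


def reduce_stats(a, b):
    """Reduce step: partial statistics from two blocks simply add up."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def pca_from_stats(stats, k):
    """Driver step: build the covariance matrix and keep the top-k components."""
    n, col_sum, gram = stats
    mean = col_sum / n
    # Sample covariance from the accumulated sums: (X^T X - n * m m^T) / (n - 1)
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return mean, eigvals[order], eigvecs[:, order]


def mapreduce_pca(data, n_blocks, k):
    """Split rows into blocks (hypothetical 'nodes'), then map, reduce, solve."""
    blocks = np.array_split(data, n_blocks, axis=0)
    stats = reduce(reduce_stats, map(map_partial_stats, blocks))
    return pca_from_stats(stats, k)


# Synthetic data, assumed purely for demonstration.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))

mean, variances, components = mapreduce_pca(data, n_blocks=4, k=2)
projected = (data - mean) @ components  # rows reduced from 5 to 2 dimensions
```

Because the partial statistics are additive, the result does not depend on how the rows are split, which is what makes the map and reduce steps safe to run in parallel. The one-pass covariance formula used here can lose precision when column means are large relative to the spread of the data, so a production version might center the data in a separate pass first.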