
Research on Feature Selection Algorithm of Massive High Dimensional Big Data in Cloud Computing (cited by 5)
Abstract: To analyze massive high-dimensional big data effectively in a cloud computing environment, the data must first undergo feature selection. To cope with the highly dynamic and high-dimensional nature of cloud big data, an online-learning feature selection algorithm that combines competitive entropy weighting with the sparsity principle is proposed. First, during the entropy-weighted iteration, competitive merging is used to optimize the entropy-weighting computation, which lowers the dimensionality of the data being processed and improves the algorithm's ability to handle high-dimensional data. Next, a sparsity score is introduced to label the features that correspond to local data; the features are ranked by importance, and redundant data are removed from the big data source. Finally, the merged entropy weighting and the sparsity principle are applied within an online learning framework, further improving the algorithm's efficiency on high-dimensional data streams. Experimental results show that the proposed algorithm improves clustering accuracy and, with it, the accuracy of feature selection for massive high-dimensional big data in cloud computing environments.
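The abstract names entropy weighting and an online framework but gives no formulas. As a minimal sketch, assuming the standard entropy weight method on non-negative features (p_ij = x_ij / S_j with S_j = Σ_i x_ij, e_j = -(1/ln n) Σ_i p_ij ln p_ij, w_j = (1 - e_j) / Σ_k (1 - e_k)), the Python below shows one way such weights can be maintained over a stream of mini-batches. Keeping only the running sums S_j and T_j = Σ_i x_ij ln x_ij is enough, because Σ_i p_ij ln p_ij = T_j / S_j - ln S_j. This does not implement the paper's competitive merging or its sparse score; `StreamingEntropyWeights` and `top_k_features` are hypothetical names chosen for illustration.

```python
import math
from typing import Iterable, List, Sequence


class StreamingEntropyWeights:
    """Hypothetical helper: standard entropy weight method over a stream.

    Not the paper's competitive-merging variant. For each feature j it keeps
    S_j = sum_i x_ij and T_j = sum_i x_ij * ln(x_ij); with p_ij = x_ij / S_j,
    sum_i p_ij * ln(p_ij) = T_j / S_j - ln(S_j), so entropy is exact.
    """

    def __init__(self, n_features: int) -> None:
        self.n_rows = 0
        self.s = [0.0] * n_features  # running S_j
        self.t = [0.0] * n_features  # running T_j

    def update(self, batch: Iterable[Sequence[float]]) -> None:
        """Fold one mini-batch of rows into the running sums."""
        for row in batch:
            self.n_rows += 1
            for j, x in enumerate(row):
                if x < 0:
                    raise ValueError("entropy weighting assumes non-negative values")
                if x > 0:  # 0 * ln(0) is taken as 0, so zeros add nothing
                    self.s[j] += x
                    self.t[j] += x * math.log(x)

    def weights(self) -> List[float]:
        """Return w_j = (1 - e_j) / sum_k (1 - e_k) for the rows seen so far."""
        if self.n_rows < 2:
            raise ValueError("need at least two rows")
        k = 1.0 / math.log(self.n_rows)
        divergence = []
        for s, t in zip(self.s, self.t):
            # An all-zero column carries no information: treat e_j as 1.
            e = 1.0 if s == 0 else k * (math.log(s) - t / s)
            divergence.append(max(0.0, 1.0 - e))  # clamp rounding noise
        total = sum(divergence)
        if total == 0:
            return [1.0 / len(divergence)] * len(divergence)
        return [d / total for d in divergence]


def top_k_features(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-weighted features, best first."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]


# Example: two mini-batches, four rows in total, three features.
ew = StreamingEntropyWeights(n_features=3)
ew.update([[1.0, 5.0, 0.0], [1.0, 5.0, 9.0]])
ew.update([[1.0, 5.0, 0.0], [1.0, 6.0, 1.0]])
w = ew.weights()
print(top_k_features(w, 2))  # → [2, 1]
```

The constant first feature gets entropy 1 and weight 0, the nearly uniform second feature gets a small weight, and the highly uneven third feature dominates. Because only per-feature sums are stored, memory stays O(d) regardless of how many rows stream in, which is the property an online setting needs.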
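Author: HU Jing (胡晶) (Fuzhou Institute of Technology, Fuzhou 350050, China)
Institution: Fuzhou Institute of Technology (福州理工学院)
Source: Computer Simulation (《计算机仿真》), Peking University Core Journal, 2019, No. 4, pp. 190-193 (4 pages)
Funding: 2016 Department of Education science and technology project JAT160619, "Research on Real-Time Push Technology for Universities Based on Cloud Storage"; 2017 Fujian Provincial Program for Cultivating Discipline Leaders in Higher Education Institutions, Domestic Visiting Scholar Project (闽教师[2017]87号)
Keywords: Cloud computing; Big data; Entropy weighted; Sparse principle; Online learning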