Abstract
To analyze massive high-dimensional data in cloud computing environments effectively, feature selection must first be applied to the data. Targeting the highly dynamic and high-dimensional nature of cloud computing big data, this paper proposes an online-learning feature selection algorithm based on competitive entropy weighting combined with the sparse principle. First, during the entropy-weighting iterations, a competitive merging scheme is used to optimize the entropy-weight computation, which reduces the dimensionality of the data being processed and improves the algorithm's ability to handle high-dimensional data. Next, a sparse score is introduced to label the features corresponding to local data and rank them by importance, so that redundant data can be removed from the big data source. Finally, the merged entropy weighting and the sparse principle are applied within an online learning framework, further improving the efficiency with which the algorithm processes high-dimensional data streams. Experimental results show that the proposed algorithm improves clustering accuracy and effectively improves the accuracy of feature selection for massive high-dimensional data in cloud computing environments.
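As a rough illustration of the entropy-weighting idea mentioned in the abstract, the sketch below ranks features with the classic entropy weight method and smooths the weights across mini-batches of a stream. It is not the paper's algorithm: the competitive merging step and the sparse score are omitted because the abstract does not specify them. The function names `entropy_weights` and `online_select`, the `decay` parameter, and the assumption of non-negative feature values are all hypothetical choices made for this example.

```python
import math


def entropy_weights(rows):
    """Classic entropy weight method: one weight per feature (column).

    Assumes non-negative feature values (an assumption of this sketch).
    """
    n, d = len(rows), len(rows[0])
    diversities = []
    for j in range(d):
        col = [row[j] for row in rows]
        total = sum(col)
        if total <= 0 or n < 2:
            diversities.append(0.0)
            continue
        # Shannon entropy of the column after normalising it to proportions.
        entropy = -sum((x / total) * math.log(x / total) for x in col if x > 0)
        # A near-uniform column carries little information, so its weight is small.
        diversities.append(max(0.0, 1.0 - entropy / math.log(n)))
    norm = sum(diversities)
    if norm == 0:
        return [1.0 / d] * d
    return [v / norm for v in diversities]


def online_select(batches, k, decay=0.5):
    """Blend per-batch weights with an exponential moving average, then keep the top k."""
    running = None
    for batch in batches:
        w = entropy_weights(batch)
        if running is None:
            running = w
        else:
            running = [decay * r + (1 - decay) * x for r, x in zip(running, w)]
    ranked = sorted(range(len(running)), key=lambda j: running[j], reverse=True)
    return ranked[:k], running


# Feature 0 is constant, feature 1 varies strongly, feature 2 varies slightly.
batch1 = [[1.0, 0.1, 5.0], [1.0, 9.0, 5.5], [1.0, 0.2, 4.5]]
batch2 = [[1.0, 8.0, 5.2], [1.0, 0.3, 4.8], [1.0, 0.1, 5.6]]
selected, weights = online_select([batch1, batch2], k=2)
print(selected, [round(w, 3) for w in weights])
```

In this toy stream the constant feature receives a weight near zero and is dropped, which mirrors the general goal of discarding uninformative dimensions before clustering.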
Author
胡晶
HU Jing (Fuzhou Institute of Technology, Fuzhou 350050, China)
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2019, No. 4, pp. 190-193 (4 pages)
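Funding
2016 Department of Education science and technology project JAT160619, "Research on Real-Time Push Technology for Universities Based on Cloud Storage"
2017 Fujian Province Higher Education Discipline Leader Training Program, Domestic Visiting Scholar Project (document no. 闽教师[2017]87号)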
Keywords
Cloud computing
Big data
Entropy weighting
Sparse principle
Online learning