Abstract
Traditional methods for extracting time-dimension features from big data streams suffer from low extraction rates and poor real-time screening of outliers. This paper proposes a method for extracting time-dimension features from high-dimensional big data streams. First, the reverse k-nearest-neighbor technique is used to screen outliers in real-time data. The entropy method is then combined with multi-layer incremental feature extraction to complete the initial extraction of the high-dimensional data and determine the sample type. Next, the data are fed into a timeliness algorithm for big data information flows, which enables data analysis and secondary extraction in the time dimension. Simulation results show that the proposed method improves the feature extraction rate and extraction capability for big data and can update data features in real time, which makes it practical.
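The abstract names two building blocks, reverse k-nearest-neighbor outlier screening and the entropy method, without giving their details. The Python sketch below illustrates the generic textbook versions of both and is not the paper's implementation. All function names, the Euclidean distance, and the `k` and `min_count` parameters are assumptions made for illustration; the entropy weights assume non-negative feature values with a positive column sum and at least two samples.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append([j for _, j in dists[:k]])
    return result


def reverse_knn_outliers(points, k=2, min_count=1):
    """Flag points that appear in fewer than min_count other points' k-NN lists.

    A point that almost nobody counts as a near neighbour is isolated,
    so a low reverse-neighbour count is treated as an outlier signal.
    """
    counts = [0] * len(points)
    for neighbours in knn_indices(points, k):
        for j in neighbours:
            counts[j] += 1
    return [i for i, c in enumerate(counts) if c < min_count]


def entropy_weights(rows):
    """Entropy weight method: features whose values vary more get larger weights."""
    n, m = len(rows), len(rows[0])
    diversities = []
    for col in range(m):
        total = sum(r[col] for r in rows)
        probs = [r[col] / total for r in rows]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp tiny negative values caused by floating-point rounding.
        diversities.append(max(0.0, 1.0 - entropy))
    s = sum(diversities)
    if s == 0:
        # Every feature is uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / s for d in diversities]


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print("outlier indices:", reverse_knn_outliers(samples, k=2))
    print("feature weights:", entropy_weights([[1, 1], [1, 2], [1, 9]]))
```

In a streaming setting the neighbour lists would have to be maintained incrementally as data arrive; this sketch recomputes them from scratch for clarity.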
Author
HUA Tao (华涛), Liaocheng University, Liaocheng, Shandong 252059, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal
2021, No. 4, pp. 356-360 (5 pages)
Funding
CERNET Next Generation Internet Technology Innovation Project (NGII20170604).
Keywords
High-dimensional data
Feature extraction
Time dimension
Big data era
Extraction efficiency