Research on High-Dimensional Big Data Mining Method Considering Fuzzy Time Series (考虑模糊时间序列的高维大数据挖掘方法研究)

Cited by: 1

Abstract: As the dimensionality of big data grows, the performance of high-dimensional index structures degrades, and mining can no longer rely on similarity measures between data points. To address this, a high-dimensional big data mining method based on fuzzy time series prediction is proposed. First, the attribute information entropy of each dimension of the initial high-dimensional data set is computed and the data are filtered by entropy; principal component analysis is then applied to the attributes in the candidate set, and the component covariance and eigenvalues are used to reduce the data dimension. Second, the bisecting K-means clustering algorithm is applied to the reduced data to obtain a coarse clustering, which is refined using the optimal hyperplane of a support vector machine together with a decision tree. Third, the number and range of the universes of discourse are determined from the extreme values of the time series; fuzzy logical relationships are established from the ordinal numbers of the fuzzy sets obtained by fuzzification; a fuzzy time series prediction model is built; and the data are defuzzified to complete the mining. Comparative experiments use the UCI database as the sample set. The results indicate that the proposed method achieves higher mining accuracy and a mining speedup ratio above 0.9, which the authors take as evidence of strong real-time performance and better applicability.
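The fuzzy time series step named in the abstract (universe of discourse from the series extremes, fuzzification into ordinal fuzzy sets, fuzzy logical relationships, defuzzification) can be illustrated with a minimal sketch. The abstract does not give the authors' exact model, so this assumes a first-order model in the style of Chen's classic formulation with equal-width intervals and midpoint defuzzification. The class name `FuzzyTimeSeries`, the parameter `n_intervals`, and the sample series are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class FuzzyTimeSeries:
    """First-order fuzzy time series forecaster (illustrative sketch only)."""

    def __init__(self, series, n_intervals=5):
        if len(series) < 2:
            raise ValueError("need at least two observations")
        self.n = n_intervals
        # Universe of discourse taken from the extreme values of the series.
        self.lo, self.hi = min(series), max(series)
        # Equal-width intervals (an assumption); width 1.0 for a constant series.
        self.width = (self.hi - self.lo) / n_intervals or 1.0
        self.midpoints = [self.lo + (i + 0.5) * self.width
                          for i in range(n_intervals)]
        # Fuzzy logical relationship groups: A_i -> {A_j, ...}.
        labels = [self.fuzzify(x) for x in series]
        self.groups = defaultdict(set)
        for current, nxt in zip(labels, labels[1:]):
            self.groups[current].add(nxt)

    def fuzzify(self, x):
        """Return the ordinal of the fuzzy set (interval) that contains x."""
        index = int((x - self.lo) / self.width)
        return max(0, min(index, self.n - 1))

    def forecast(self, x):
        """Defuzzified one-step forecast given the latest observation x."""
        current = self.fuzzify(x)
        successors = self.groups.get(current)
        if not successors:
            # No relationship learned for this set: fall back to its midpoint.
            return self.midpoints[current]
        # Average the midpoints of all sets that followed the current one.
        return sum(self.midpoints[j] for j in successors) / len(successors)


# Hypothetical toy series, not data from the paper.
model = FuzzyTimeSeries([10, 12, 14, 16, 18, 20, 18, 16, 14, 12], n_intervals=5)
print(model.forecast(10))  # 13.0
print(model.forecast(14))  # 15.0
```

With five intervals of width 2 over [10, 20], the value 14 falls in the third fuzzy set, which was followed by both the second and fourth sets in the toy series, so the forecast is the average of their midpoints. The entropy filtering, PCA, clustering, and SVM stages of the pipeline are not covered by this sketch.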

Authors: CHEN Ting-ting (陈婷婷); ZHAO Shi-zhong (赵世忠) (Nanchang University College of Science and Technology, Jiujiang, Jiangxi 332020, China; College of Construction Engineering, Nanchang University, Nanchang, Jiangxi 330036, China)

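Affiliations: Nanchang University College of Science and Technology; College of Construction Engineering, Nanchang University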

Source: Computer Simulation (《计算机仿真》), Peking University Core Journal (北大核心), 2023, Issue 3, pp. 467-470, 475 (5 pages)

Funding: 2021 Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ217812).

Keywords: high-dimensional data mining; fuzzy time series prediction model; principal component analysis; clustering algorithm; support vector machine

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]