Abstract
Industries derived from science and technology commonly operate under strong time constraints and have therefore accumulated massive volumes of high-dimensional time series data. Under this data pressure, traditional data modeling and prediction methods are constrained by data scale and attribute dimensionality. Supporting high-quality services places higher demands on intelligent big-data prediction, and improving prediction performance at the data level is the main problem that urgently needs to be solved at this stage. To address it, a feature re-abstraction (FRA) algorithm for multivariate time series data is proposed. First, the RobustSTL decomposition algorithm extracts trend and seasonality features (TSFs), giving a second-order abstraction of the features of the multivariate data and replacing the traditional "label as feature" extraction strategy with "abstraction as feature". Then, the Pearson correlation coefficient is used to evaluate the strength of the correlation between the TSFs captured by re-abstraction and the target parameter, which confirms the data value of the TSFs. On the basis of the FRA algorithm, a data-driven multivariate time series prediction algorithm is built by combining it with a deep learning model, and the effectiveness of FRA is verified through the prediction results. Experimental results show that using TSFs as the training vectors of the data-driven model achieves dimensionality reduction and noise reduction while preserving strongly correlated characteristics, thereby avoiding model overfitting, alleviating model underfitting, and improving the accuracy and robustness of the time series prediction algorithm.
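The two data-level steps described in the abstract (decompose each variable into trend and seasonal components, then score each component against the target with the Pearson correlation coefficient) can be illustrated with a minimal sketch. It is not the authors' implementation: a centered moving average and per-phase averaging stand in for RobustSTL, the synthetic series, the names `cpu` and `noise`, the helper `re_abstract`, and the 0.3 selection threshold are all assumptions made for illustration, and the deep learning model is omitted.

```python
import math
import random


def moving_average_trend(x, period):
    """Centered moving average as a crude trend estimate.

    Stand-in for the trend step of RobustSTL (assumption, not the paper's method).
    An odd period gives a window that spans exactly one full cycle.
    """
    half = period // 2
    trend = []
    for i in range(len(x)):
        window = x[max(0, i - half):min(len(x), i + half + 1)]
        trend.append(sum(window) / len(window))
    return trend


def seasonal_component(x, trend, period):
    """Mean detrended value at each phase of the cycle, shifted to zero mean."""
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    phase_means = []
    for p in range(period):
        vals = detrended[p::period]
        phase_means.append(sum(vals) / len(vals))
    offset = sum(phase_means) / period
    return [phase_means[i % period] - offset for i in range(len(x))]


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def re_abstract(series_by_name, target, period):
    """Map each variable to trend/seasonal features and their correlation with target."""
    features = {}
    for name, x in series_by_name.items():
        trend = moving_average_trend(x, period)
        season = seasonal_component(x, trend, period)
        features[name + "_trend"] = (trend, pearson(trend, target))
        features[name + "_season"] = (season, pearson(season, target))
    return features


# Synthetic data (hypothetical): the target has a linear trend and a 7-step cycle.
random.seed(0)
n, period = 140, 7
target = [0.05 * t + 2.0 * math.sin(2 * math.pi * t / period) for t in range(n)]
cpu = [v + random.gauss(0, 0.5) for v in target]    # noisy variable related to the target
noise = [random.gauss(0, 1.0) for _ in range(n)]    # variable unrelated to the target

features = re_abstract({"cpu": cpu, "noise": noise}, target, period)

# Keep only features whose |r| reaches an illustrative threshold (assumed value).
selected = [name for name, (_, r) in features.items() if abs(r) >= 0.3]
for name, (_, r) in features.items():
    print(f"{name}: r = {r:+.3f}")
```

In a full pipeline the selected feature series, rather than the raw variables, would be stacked into the training vectors of the forecasting model.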
Authors
王昊
周建涛
郝昕毓
王飞宇
WANG Hao; ZHOU Jiantao; HAO Xinyu; WANG Feiyu (College of Computer Science, Inner Mongolia University, Hohhot 010021, China; National & Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, Hohhot 010021, China; Engineering Research Center of Ecological Big Data, Ministry of Education, Hohhot 010021, China; Inner Mongolia Engineering Laboratory for Cloud Computing and Service Software, Hohhot 010021, China; Inner Mongolia Key Laboratory of Social Computing and Data Processing, Hohhot 010021, China; Inner Mongolia Engineering Laboratory for Big Data Analysis Technology, Hohhot 010021, China; Inner Mongolia Key Laboratory of Discipline Inspection and Supervision Big Data, Hohhot 010021, China; Inner Mongolia Big Data Analysis Technology Engineering Laboratory, Hohhot 010021, China)
Source
《计算机科学》
CSCD
Peking University Core Journal List
2023, Issue S02, pp. 650-657 (8 pages)
Computer Science
Funding
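National Natural Science Foundation of China (62162046)
Inner Mongolia Key Science and Technology Project (2021GG0155)
Major Project of the Natural Science Foundation of Inner Mongolia (2019ZD15)
Natural Science Foundation of Inner Mongolia (2019GG372)
Inner Mongolia University / Inner Mongolia Autonomous Region Postgraduate Research Innovation Project (11200-121024)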
Keywords
Multivariate time series data
Multivariate time series forecasting algorithms
Feature re-abstraction (FRA)
Trend and seasonality features (TSFs)
Correlation assessment