Abstract
Errors are inevitable when time series data are collected and processed, and in many real-world settings these errors are autocorrelated rather than independent. When analyzing time series with autocorrelated errors, existing forecasting theories usually require an explicit form of the hypothesis space output by the forecasting algorithm. For models whose hypothesis space is not explicit, such as neural networks, there is still no systematic method or theoretical guarantee for analyzing their predictive ability on non-stationary time series with autocorrelated errors. Under the assumption that the autocorrelated errors are truncated, this paper proposes a predictable PAC learning theory for time series and presents a corresponding data-dependent generalization bound. The bound contains a sequence complexity measure and a discrepancy measure. The former characterizes the inherent non-stationarity of the time series, and the latter can be estimated from the data under mild assumptions. Consequently, the bound does not depend on an explicit form of the hypothesis space and applies broadly. Guided by these theoretical results, we propose an alternating optimization algorithm based on autoregressive models for forecasting non-stationary time series. Experiments on several real-world data sets confirm the effectiveness of the proposed algorithm.
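The abstract does not detail the proposed alternating optimization algorithm. Purely to illustrate what alternating between an autoregressive fit and an estimate of the error autocorrelation can look like, the following is a minimal NumPy sketch of the classical Cochrane-Orcutt-style iteration. It is not the authors' method. It assumes a zero-mean AR(p) signal with AR(1) errors and models neither error truncation nor the discrepancy term. The names `alternating_ar_fit` and `forecast_next` are hypothetical.

```python
import numpy as np

# Hypothetical illustration: classical Cochrane-Orcutt-style alternation for
# y_t = sum_i a_i y_{t-i} + u_t,  u_t = rho * u_{t-1} + e_t  (not the paper's algorithm).


def lagged(y, p):
    """Targets y_t and lag matrix [y_{t-1}, ..., y_{t-p}] for t = p..n-1."""
    n = len(y)
    X = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])
    return y[p:], X


def ar_residuals(y, a):
    """AR residuals u_t = y_t - sum_i a_i y_{t-i} for t = p..n-1."""
    target, X = lagged(y, len(a))
    return target - X @ a


def objective(u, rho):
    """Sum of squared innovations e_t = u_t - rho * u_{t-1}."""
    e = u[1:] - rho * u[:-1]
    return float(e @ e)


def alternating_ar_fit(y, p, n_iter=50, tol=1e-10):
    y = np.asarray(y, dtype=float)
    y = y - y.mean()  # assumption: zero-mean series, no intercept term
    rho = 0.0
    history = []
    for _ in range(n_iter):
        # Step 1: fix rho, quasi-difference z_t = y_t - rho*y_{t-1},
        # then fit the AR coefficients of z by least squares.
        z = y[1:] - rho * y[:-1]
        target, X = lagged(z, p)
        a = np.linalg.lstsq(X, target, rcond=None)[0]
        # Step 2: fix a, re-estimate the AR(1) error coefficient from residuals.
        u = ar_residuals(y, a)
        rho = float(u[1:] @ u[:-1]) / float(u[:-1] @ u[:-1])
        history.append(objective(u, rho))
        if len(history) > 1 and history[-2] - history[-1] <= tol * max(1.0, history[-2]):
            break
    return a, rho, history


def forecast_next(y, a, rho):
    """One-step-ahead forecast y_hat_n = rho*y_{n-1} + sum_i a_i z_{n-i} (centered scale)."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    c = y - mu
    z = c[1:] - rho * c[:-1]
    z_next = float(a @ z[::-1][:len(a)])  # lags z_{n-1}, ..., z_{n-p}
    return mu + z_next + rho * c[-1]


# Synthetic demo data (illustrative only, not from the paper's experiments).
rng = np.random.default_rng(0)
n = 500
e = rng.standard_normal(n)
u = np.zeros(n)
y = np.zeros(n)
for t in range(2, n):
    u[t] = 0.6 * u[t - 1] + e[t]
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + u[t]

a, rho, hist = alternating_ar_fit(y, p=2)
y_next = forecast_next(y, a, rho)
```

Each half-step exactly minimizes the same sum of squared innovations, first over the AR coefficients and then over the error autocorrelation, so the recorded objective cannot increase. This block-coordinate property is the usual reason to prefer such alternating schemes.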
Authors
ZHANG Shao-Qun (张绍群)
ZHANG Zhao-Yu (张钊钰)
JIANG Yuan (姜远)
ZHOU Zhi-Hua (周志华)
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023
Source
Chinese Journal of Computers (《计算机学报》), 2022, Issue 11, pp. 2279-2289 (11 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (62176116, 61921006)
Science and Technology Innovation 2030 Project (2020AAA-0109401)
Keywords
machine learning
time series analysis
autocorrelated errors
predictable PAC learnability
discrepancy estimation
alternating optimization