Time Series Modeling of Distribution-valued Data Based on Centered Log-ratio Transformation
Abstract: To address the problem that linear operations on distribution-valued symbolic data are not closed, a new time series model for distribution-valued symbolic data is proposed. Distribution-valued symbolic data analysis offers a new and effective approach to processing and analyzing large-scale data and has broad application prospects. However, because each distribution-valued data unit is represented by a probability distribution function, its special form and inherent constraints leave it without suitable rules of operation, which severely limits the flexible use of statistical techniques. To this end, the centered log-ratio (clr) transformation is used to map the probability density function of each distribution-valued data unit isometrically to an ordinary function. The transformed functions admit the linear operations and the inner product of a function space, which effectively removes the influence of the inherent constraints of probability distribution functions while leaving the geometry of the sample space unchanged. Methods for model identification and parameter estimation for distribution-valued time series are then proposed, and simulation experiments and real data confirm the soundness and effectiveness of the proposed approach.

With the progress of information technology and the development of the digital era, data acquisition has become much easier, and data sets with a large number of observations are emerging in many fields of the natural and social sciences. Symbolic data analysis is an efficient tool for dealing with such large-scale data sets. This paper studies a common type of symbolic data, distribution-valued data (also known as numerical modal data), as defined in symbolic data analysis. Each observation is characterized by a probability distribution, which makes this type of data particularly suitable for mining information from massive observations; it includes interval-valued data, histogram-valued data and general distribution-valued data. In recent years many notable results have emerged in distribution-valued data analysis, and both the theory and the practical application of its statistical methods have received wide attention. However, owing to the lack of effective representations and reasonable algebraic operations, existing methods are often subject to constraints and may introduce analytical errors in computation, which makes statistical modeling difficult.

To address the problem that linear operations are not closed for distribution-valued data, the centered log-ratio (clr) transformation is applied to the representation and modeling of distribution-valued data. The clr transformation maps a probability density function to an ordinary function, so that addition, subtraction and scalar multiplication in the function space can be used. The rules of calculation in the transformed function space and the sample statistics of distribution-valued time series are defined, and the rationality of these definitions is explained. Because the numerical characteristics of variables play an important role in identifying and estimating time series models, and in order to extend the classic time series models of the Box-Jenkins framework to distribution-valued data, the numerical characteristics of distribution-valued data are first defined through linear operations and inner products of functions. Based on these definitions, Distributional-AR, Distributional-MA and Distributional-ARMA models are proposed for distribution-valued time series, together with a modeling procedure covering model specification, parameter estimation and model diagnostics. The proposed method is referred to as the clr-DTS method. A synthetic distribution-valued time series data set is constructed to demonstrate the clr-DTS modeling process, and Monte Carlo experiments illustrate the effectiveness of its parameter estimation. Finally, the clr-DTS method is applied to model and predict air quality index (AQI) monitoring data in Beijing and is compared with two existing methods in model fitting and out-of-sample prediction. The results show that the proposed method fits the data better and gives more accurate and more stable predictions.
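The clr transformation described in the abstract can be illustrated numerically. The sketch below is not the authors' implementation: it assumes every density is strictly positive on a common bounded support [a, b] (here [0, 1]), is stored as values on a uniform grid, and that integrals are approximated by the trapezoidal rule. It uses the standard definition clr(f)(x) = log f(x) - (1/(b - a)) ∫ log f(y) dy over [a, b]. The helper names (`clr`, `clr_inv`, `perturb`, `power`, `inner`) and the truncated normal example densities are hypothetical. The point it shows is closure: perturbation and powering of densities (the Bayes-space counterparts of addition and scalar multiplication) become ordinary addition and scaling of clr functions, and the clr images always integrate to zero.

```python
import numpy as np

# Assumed setting: densities strictly positive on a common support [a, b],
# represented by their values on a uniform grid.
a, b, n = 0.0, 1.0, 2001
x = np.linspace(a, b, n)
dx = x[1] - x[0]


def integrate(values):
    """Trapezoidal rule over the grid x."""
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))


def density(mu, sigma):
    """Hypothetical example density: a normal density truncated to [a, b]."""
    raw = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return raw / integrate(raw)


def clr(f):
    """clr(f) = log f minus the average of log f over [a, b]."""
    log_f = np.log(f)
    return log_f - integrate(log_f) / (b - a)


def clr_inv(u):
    """Inverse clr: exponentiate, then renormalize to integrate to one."""
    e = np.exp(u)
    return e / integrate(e)


def perturb(f, g):
    """Bayes-space 'addition': the normalized product of two densities."""
    h = f * g
    return h / integrate(h)


def power(alpha, f):
    """Bayes-space 'scalar multiplication': the normalized power of a density."""
    h = f ** alpha
    return h / integrate(h)


def inner(f, g):
    """Inner product of two densities, computed as the L2 product of clr images."""
    return integrate(clr(f) * clr(g))


f = density(0.3, 0.1)
g = density(0.6, 0.2)
print(f"<f, g> = {inner(f, g):.4f}")
```

Because `clr` is linear with respect to `perturb` and `power`, any average or linear combination of clr functions maps back through `clr_inv` to a valid density, which is the closure property the abstract relies on.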
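The abstract's idea of defining sample statistics through linear operations and inner products of clr functions can be sketched for a first-order autoregression. This is a hypothetical simplification, not the clr-DTS estimator from the paper: it assumes a scalar coefficient, Y_t - μ = φ(Y_{t-1} - μ) + ε_t for the clr images Y_t, defines the lag-h sample autocovariance as the average L2 inner product of centered clr functions, and estimates φ by the Yule-Walker ratio γ(1)/γ(0). The support [0, 1], the mean function, the cosine noise basis and φ = 0.6 are made-up illustration values, not the paper's synthetic data set.

```python
import numpy as np

rng = np.random.default_rng(42)

# Common grid on an assumed support [0, 1] (length 1).
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]


def integrate(values):
    """Trapezoidal rule along the last axis (the grid)."""
    return dx * (values.sum(axis=-1) - 0.5 * (values[..., 0] + values[..., -1]))


def clr(f):
    """clr transform of one density (1-D) or a series of densities (2-D)."""
    log_f = np.log(f)
    return log_f - np.expand_dims(integrate(log_f), -1)


def clr_inv(u):
    """Inverse clr: exponentiate and renormalize each row to a density."""
    e = np.exp(u)
    return e / np.expand_dims(integrate(e), -1)


# --- Simulate a hypothetical distribution-valued AR(1) series in clr space ---
T, phi_true, burn = 500, 0.6, 100
mu = -0.5 * ((x - 0.5) / 0.15) ** 2
mu = mu - integrate(mu)  # centre so that mu is itself a clr function
basis = np.stack([np.cos(2 * np.pi * k * x) / k for k in (1, 2, 3)])  # zero-integral noise

y = np.zeros((T + burn, x.size))
y[0] = mu
for t in range(1, T + burn):
    eps = rng.normal(scale=0.5, size=3) @ basis
    y[t] = mu + phi_true * (y[t - 1] - mu) + eps
densities = clr_inv(y[burn:])  # the observed series: one density per time point

# --- Estimate phi from the observed densities ---
Y = clr(densities)             # map back to the linear clr space
Y_bar = Y.mean(axis=0)         # sample mean function
Z = Y - Y_bar                  # centred clr functions


def autocov(h):
    """Lag-h sample autocovariance as an average inner product of clr functions."""
    return integrate(Z[h:] * Z[:T - h]).sum() / T


phi_hat = autocov(1) / autocov(0)  # Yule-Walker estimate for AR(1)

# One-step-ahead forecast, computed in clr space and mapped back to a density.
forecast = clr_inv(Y_bar + phi_hat * (Y[-1] - Y_bar))
print(f"estimated phi = {phi_hat:.3f}")
```

Working in clr space means the forecast is a valid density by construction. The MA and ARMA cases named in the abstract need a different estimator and are not covered by this sketch.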
Authors: CHEN Meiling (陈梅玲), YU Hanjun (俞翰君) (School of Statistics, Capital University of Economics and Business, Beijing 100070, China)
Source: Journal of Statistics and Information (《统计与信息论坛》, a Peking University core journal), 2024, No. 6, pp. 3-14 (12 pages)
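Funding: National Natural Science Foundation of China project "Modeling theory and applications of distribution-valued symbolic data with time-series features" (71801162); Beijing Natural Science Foundation project "Coordinated development of the socio-economy and the ecological environment in the Beijing-Tianjin-Hebei region based on multi-feature complex data" (9224032).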
Keywords: symbolic data; distribution-valued data; centered log-ratio transformation; time series; Bayesian space