Abstract
With the rapid development of data collection techniques in recent years, traditional statistical methods face the challenge of "massive data". Divide and conquer is one of the most effective ways to handle massive data: the full data set is split into several smaller subsets, a statistical model is fitted on each subset separately, and the subset results are then combined to obtain the final result. Model averaging is a frontier method in contemporary statistics and econometrics, with wide applications in economics, finance, biology, medicine and other fields. This paper proposes divide-and-conquer algorithms under massive data for Mallows model averaging (MMA) and jackknife model averaging (JMA) in linear models, and for model averaging in generalized linear models. Simulations and a real data analysis illustrate the effectiveness and practicality of the proposed algorithms.
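The split, fit and combine idea described in the abstract can be illustrated with a minimal sketch. It assumes ordinary least squares on each subset and a plain average as the combination rule; this is an illustrative simplification, not the MMA, JMA or generalized linear model procedures proposed in the paper. The function name `divide_and_conquer_ols`, the simulated data and the number of blocks are all hypothetical.

```python
import numpy as np

def divide_and_conquer_ols(X, y, n_blocks):
    """Fit least squares on each block and average the coefficient estimates.

    Simple averaging is an assumed combination rule chosen for illustration.
    """
    estimates = []
    for X_k, y_k in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks)):
        # Least-squares fit on one block only; the full data is never used at once.
        beta_k, *_ = np.linalg.lstsq(X_k, y_k, rcond=None)
        estimates.append(beta_k)
    return np.mean(estimates, axis=0)

# Simulated data (hypothetical): linear model with known coefficients.
rng = np.random.default_rng(0)
n, p = 20_000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

beta_dc = divide_and_conquer_ols(X, y, n_blocks=10)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)  # full-data fit for comparison
print(beta_dc, beta_full)
```

In this setting the averaged block estimates should be close to the full-data estimate, while each fit touches only one block, which is what makes the approach attractive when the full data set does not fit in memory.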
Authors
方方
尹相菊
张强
FANG Fang; YIN Xiangju; ZHANG Qiang (School of Statistics, East China Normal University, Shanghai 200241)
Source
《系统科学与数学》
CSCD
Peking University Core Journals (北大核心)
2018, No. 7, pp. 764-776 (13 pages)
Journal of Systems Science and Mathematical Sciences
Funding
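Supported by the National Natural Science Foundation of China (11601156)
and a Science and Technology Project of the Science and Technology Commission of Shanghai Municipality (16QA1401700)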
Keywords
divide and conquer
generalized linear model
jackknife model averaging (JMA)
linear model
Mallows model averaging (MMA)
massive data