In the era of big data, divide-and-conquer, parallel, and distributed inference methods have become increasingly popular. How to effectively use the calibration information from each machine in parallel computation has become a challenging task for statisticians and computer scientists. Many newly developed methods have roots in traditional statistical approaches that make use of calibration information. In this paper, we first review some classical statistical methods for using calibration information, including simple meta-analysis methods, parametric likelihood, empirical likelihood, and the generalized method of moments. We further investigate how these methods incorporate summarized or auxiliary information from previous studies, related studies, or populations. We find that the methods based on summarized data usually have little or nearly no efficiency loss compared with the corresponding methods based on all-individual data. Finally, we review some recently developed big data analysis methods, including communication-efficient distributed approaches, renewal estimation, and incremental inference, as examples of the latest developments in methods using calibration information.
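As a minimal illustration of the claim that summarized data can lose little or no efficiency, consider the simplest case, the sample mean. The sketch below is not taken from the paper: the simulated data, the four-way split into "machines", and the helper names `summarize` and `combine` are all assumptions made for illustration. Each machine reports only its sample size and sample mean, and the sample-size-weighted combination reproduces the full-data mean up to floating-point error.

```python
import random
from statistics import fmean


def summarize(chunk):
    """Per-machine summary (hypothetical helper): sample size and sample mean."""
    return len(chunk), fmean(chunk)


def combine(summaries):
    """Sample-size-weighted average of the per-machine means."""
    total_n = sum(n for n, _ in summaries)
    return sum(n * m for n, m in summaries) / total_n


# Simulated data, assumed here purely for illustration.
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(10_000)]

# Split the observations across four hypothetical machines.
chunks = [data[i::4] for i in range(4)]

# Each machine communicates only (n, mean), never the raw observations.
summaries = [summarize(chunk) for chunk in chunks]

pooled = fmean(data)           # estimator based on all individual data
combined = combine(summaries)  # estimator based on summaries only

print(f"pooled mean:   {pooled:.6f}")
print(f"combined mean: {combined:.6f}")
```

For the mean, the agreement is exact rather than approximate. For more general estimators, such as those based on likelihood or estimating equations, the equivalence typically holds only asymptotically, which is the setting the review addresses.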
We thank Professor Jun Shao for organizing this interesting discussion. We also thank the six discussants for many insightful comments and suggestions. Assembling data from different sources has become a very popular topic. In our review paper, we mainly discussed integration methods for the case where internal data and external data share a common distribution, though the external data may not have information on some underlying variables collected in the internal study.
Funding: Supported by the National Natural Science Foundation of China [grant numbers 71931004, 12171157, and 32030063], the 111 Project [grant number B14019], the Development Fund for Shanghai Talents, and the Natural Sciences and Engineering Research Council of Canada [grant number RGPIN-2020-04964].