Researchers are frequently confronted with challenges in massive-data computing because of the limits of computer primary memory. Modal regression (MR) is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. To this end, the authors extend MR to massive data analysis and propose a computationally and statistically efficient divide-and-conquer MR method (DC-MR). The major novelty of this method consists of splitting the entire dataset into several blocks, applying the MR method to the data in each block, and combining the block-level regression results through a weighted average, which approximates the regression results on the entire dataset. The proposed method significantly reduces the required amount of primary memory, and the resulting estimator is theoretically as efficient as traditional MR on the entire dataset. The authors also investigate a variable selection approach based on multiple hypothesis testing to select significant parametric components, and prove that the approach possesses the oracle property. In addition, the authors propose a practical modified modal expectation-maximization (MEM) algorithm for the proposed procedures. Numerical studies on simulated and real datasets are conducted to assess and showcase the practical performance of the proposed methods.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. JBK1806002 and the National Natural Science Foundation of China under Grant No. 11471264.
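The divide-and-conquer scheme described in the abstract (split the data into blocks, fit MR on each block, combine the block estimates by a weighted average) can be illustrated with a minimal sketch. Several choices here are assumptions and are not taken from the paper: a linear model, a Gaussian kernel with a fixed bandwidth `h`, weights proportional to block size, and the standard MEM iteration rather than the authors' modified version. The function names `modal_regression_mem` and `dc_modal_regression` are hypothetical.

```python
import numpy as np


def modal_regression_mem(X, y, h=1.0, n_iter=200, tol=1e-8):
    """Linear modal regression via a standard MEM iteration.

    Assumption: Gaussian kernel with fixed bandwidth h. This is the
    textbook MEM scheme, not the modified algorithm from the paper.
    """
    # Start from the ordinary least-squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        resid = y - X @ beta
        # E-step: kernel weights, largest for points near the current fit.
        w = np.exp(-0.5 * (resid / h) ** 2)
        w /= w.sum()
        # M-step: weighted least squares with those weights.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def dc_modal_regression(X, y, n_blocks=10, h=1.0):
    """Divide-and-conquer estimate: per-block MR, then a weighted average.

    Assumption: weights proportional to block size. The paper's actual
    weighting scheme may differ.
    """
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    # In a real massive-data setting each block would be loaded from disk
    # one at a time, so only one block resides in memory at once.
    estimates = np.array(
        [modal_regression_mem(X[idx], y[idx], h=h) for idx in blocks]
    )
    weights = np.array([len(idx) for idx in blocks], dtype=float)
    weights /= weights.sum()
    return weights @ estimates
```

Calling `dc_modal_regression(X, y, n_blocks=10)` on a design matrix `X` and response `y` returns one coefficient vector. With symmetric noise it should lie close to the full-data fit from `modal_regression_mem(X, y)`, while only one block is processed at a time.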