Funding: Wang Haiying's research was partially supported by the National Science Foundation under Grant No. CCF 2105571.
Abstract: Softmax regression, also called multinomial logistic regression, is widely used in various fields to model the relationship between covariates and a categorical response with multiple levels. Increasing data volumes bring new challenges for parameter estimation in softmax regression, and optimal subsampling is an effective way to address them. However, optimal subsampling with replacement requires access to all the sampling probabilities simultaneously to draw a subsample, and the resulting subsample may contain duplicate observations. In this paper, the authors consider Poisson subsampling for its higher estimation accuracy and its applicability when the data exceed the memory limit. The authors derive the asymptotic properties of the general Poisson subsampling estimator and obtain optimal subsampling probabilities by minimizing the asymptotic variance-covariance matrix under both the A- and L-optimality criteria. Because the optimal subsampling probabilities depend on unknown quantities from the full dataset, the authors propose an approximately optimal Poisson subsampling algorithm with two sampling steps, the first of which is a pilot phase. The authors demonstrate the performance of the optimal Poisson subsampling algorithm through numerical simulations and real data examples.
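The two-step Poisson subsampling idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: category 0 is taken as the baseline, the pilot step uses uniform Poisson probabilities, and the second-step probabilities are proportional to ||y_i - p_i|| * ||x_i||, the norm of the per-observation score (y_i - p_i) ⊗ x_i at the pilot estimate. That last choice is a common L-optimality-style heuristic in the subsampling literature and may differ from the exact optimal probabilities derived in the paper. The helper names (`softmax_probs`, `weighted_softmax_fit`, `poisson_subsample`, `two_step_poisson_softmax`), the budgets `r0` and `r`, and the simulated data are hypothetical and for illustration only.

```python
import numpy as np


def softmax_probs(X, beta):
    """Class probabilities with category 0 as the baseline (zero coefficients).

    X: (n, d) covariates; beta: (d, K-1) coefficients of categories 1..K-1.
    """
    eta = np.hstack([np.zeros((X.shape[0], 1)), X @ beta])
    eta -= eta.max(axis=1, keepdims=True)  # guard against overflow in exp
    p = np.exp(eta)
    return p / p.sum(axis=1, keepdims=True)


def weighted_softmax_fit(X, y, w, K, n_iter=100, tol=1e-8):
    """Newton-Raphson maximizer of the weighted log-likelihood sum_i w_i log p_{i,y_i}."""
    d = X.shape[1]
    beta = np.zeros((d, K - 1))
    Y = np.eye(K)[y][:, 1:]  # one-hot responses, baseline column dropped
    for _ in range(n_iter):
        P = softmax_probs(X, beta)[:, 1:]
        # Weighted score stacked by category: block k is sum_i w_i (y_ik - p_ik) x_i.
        g = ((X * w[:, None]).T @ (Y - P)).ravel(order="F")
        # Weighted information, block (k, l) = sum_i w_i (p_ik 1{k=l} - p_ik p_il) x_i x_i^T.
        H = np.empty((d * (K - 1), d * (K - 1)))
        for k in range(K - 1):
            for l in range(K - 1):
                c = w * (P[:, k] * (k == l) - P[:, k] * P[:, l])
                H[k * d:(k + 1) * d, l * d:(l + 1) * d] = (X * c[:, None]).T @ X
        step = np.linalg.solve(H, g)
        beta += step.reshape((d, K - 1), order="F")
        if np.linalg.norm(step) < tol:
            break
    return beta


def poisson_subsample(pi, rng):
    """Include observation i independently with probability pi[i].

    Each decision needs only pi[i], so no observation can be drawn twice.
    """
    return rng.random(pi.shape[0]) < pi


def two_step_poisson_softmax(X, y, K, r0, r, rng):
    n = X.shape[0]
    # Step 1 (pilot): uniform Poisson subsample with expected size r0, unweighted fit.
    s0 = poisson_subsample(np.full(n, r0 / n), rng)
    beta0 = weighted_softmax_fit(X[s0], y[s0], np.ones(s0.sum()), K)
    # Step 2: assumed L-optimality-style scores ||y_i - p_i|| * ||x_i|| at the pilot estimate.
    resid = np.eye(K)[y][:, 1:] - softmax_probs(X, beta0)[:, 1:]
    score = np.linalg.norm(resid, axis=1) * np.linalg.norm(X, axis=1)
    pi = np.minimum(1.0, r * score / score.sum())  # expected subsample size at most r
    s = poisson_subsample(pi, rng)
    # Inverse-probability weights make the subsample score unbiased for the full-data score.
    beta = weighted_softmax_fit(X[s], y[s], 1.0 / pi[s], K)
    return beta, int(s.sum())


# Illustrative simulated data (not from the paper): intercept plus two covariates, K = 3.
rng = np.random.default_rng(0)
n, K = 20000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([[0.5, -0.5], [1.0, 0.5], [-1.0, 1.0]])
cum = softmax_probs(X, beta_true).cumsum(axis=1)
y = np.minimum((rng.random(n)[:, None] > cum).sum(axis=1), K - 1)
beta_hat, m = two_step_poisson_softmax(X, y, K, r0=500, r=2000, rng=rng)
print(m, np.round(beta_hat, 2))
```

Because each inclusion decision uses only its own probability, the second step could in principle be run in a single pass over data stored on disk, provided the normalizing sum of scores is available (for example, from an earlier pass or estimated from the pilot subsample); the sketch keeps everything in memory for simplicity.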
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12001559 and 11971324, and by the Humanities and Social Science project of the Ministry of Education under Grant No. 19YJC910008.
Abstract: The varying-coefficient single-index model (VCSIM) is widely used in economics, statistics and biology. To improve predictive capacity, a model averaging method for the VCSIM based on a Mallows-type criterion is proposed, which allows the number of candidate models to diverge with the sample size. Under model misspecification, the method is shown to be asymptotically optimal in the sense of achieving the lowest possible squared error. The authors compare the proposed model averaging method with several classical model selection methods through simulations, and the results show that the model averaging estimator has outstanding performance. The authors also apply the method to a real dataset.
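The idea of choosing averaging weights with a Mallows-type criterion can be illustrated with the classical Mallows model averaging criterion for linear smoothers, C(w) = ||y - sum_m w_m mu_m||^2 + 2 sigma^2 sum_m w_m k_m, minimized over weights w on the unit simplex, where mu_m are the fitted values and k_m the effective number of parameters of candidate m. The sketch below is a stand-in, not the authors' VCSIM procedure: the candidates are nested least-squares regressions, k_m is the trace of each candidate's hat matrix, sigma^2 is estimated from the largest candidate, and the weights are found by projected gradient descent. For VCSIM candidates, the fitted values and effective numbers of parameters would have to come from the fitted candidate models, which this sketch assumes rather than implements. The function names and simulated data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def mallows_weights(y, F, k, sigma2, n_iter=10000):
    """Minimize C(w) = ||y - F w||^2 + 2 * sigma2 * k @ w over the unit simplex.

    F: (n, M) fitted values of the M candidate models.
    k: (M,) effective number of parameters of each candidate (trace of its hat matrix).
    Projected gradient descent with step 1/L, L the Lipschitz constant of the gradient.
    """
    M = F.shape[1]
    w = np.full(M, 1.0 / M)
    L = 2.0 * np.linalg.eigvalsh(F.T @ F).max()
    for _ in range(n_iter):
        grad = -2.0 * F.T @ (y - F @ w) + 2.0 * sigma2 * k
        w = project_to_simplex(w - grad / L)
    return w


# Illustrative data: nested linear candidates stand in for fitted VCSIM candidates.
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.8, 0.5, 0.2, 0.0, 0.0]) + rng.normal(size=n)
fits, dims = [], []
for m in range(1, p + 1):  # candidate m uses the first m covariates
    Xm = X[:, :m]
    fits.append(Xm @ np.linalg.lstsq(Xm, y, rcond=None)[0])
    dims.append(m)  # trace of the least-squares hat matrix of Xm
F, k = np.column_stack(fits), np.array(dims, dtype=float)
sigma2 = np.sum((y - F[:, -1]) ** 2) / (n - p)  # variance from the largest candidate
w = mallows_weights(y, F, k, sigma2)
print(np.round(w, 3))
```

Restricting w to the simplex keeps the averaged fit a convex combination of the candidates; because the criterion is a convex quadratic in w, any minimizer scores no worse than selecting a single candidate by the same criterion.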