Abstract: This paper addresses the problem of inference for a multinomial regression model in the presence of likelihood monotonicity. It proposes translating the multinomial regression problem into a conditional logistic regression problem; using existing techniques to reduce this conditional logistic regression problem to one with fewer observations and fewer covariates, such that the probabilities for the canonical sufficient statistic of interest, conditional on the remaining sufficient statistics, are identical; and translating the reduced conditional logistic regression problem back to the multinomial regression setting. The reduced multinomial regression problem does not exhibit monotonicity of its likelihood, so conventional asymptotic techniques can be applied.
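The following is a minimal, illustrative sketch (not the paper's algorithm) of the underlying difficulty: when the data are completely separated, the logistic log-likelihood increases monotonically in the slope, so no finite maximum likelihood estimate exists and conditional or reduced analyses become necessary. The toy data and parameter values are invented for illustration.

```python
# Minimal sketch of likelihood monotonicity (complete separation) in a
# binary logistic model; not the paper's multinomial reduction procedure.
# Assumes numpy and statsmodels are installed.
import numpy as np
import statsmodels.api as sm

# Toy data with complete separation: y = 1 exactly when x > 0, so the
# log-likelihood keeps increasing as the slope grows.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = (x > 0).astype(int)
X = sm.add_constant(x)

# Profile the log-likelihood in the slope: it rises toward 0 without a
# finite maximizer, the hallmark of a monotone likelihood.
model = sm.Logit(y, X)
for slope in (1.0, 5.0, 25.0, 125.0):
    beta = np.array([0.0, slope])  # [intercept, slope]
    print(f"slope={slope:7.1f}  loglik={model.loglike(beta):.4f}")
```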
Abstract: Many black-box functions and datasets have regions of differing variability. Models such as the Gaussian process may fall short of giving the best representation of these complex functions. One successful approach for modeling this type of nonstationarity is the Treed Gaussian process [1], which extended the Gaussian process by dividing the input space into different regions using a binary tree algorithm, with each region becoming its own Gaussian process. This iterative inference process formed many different trees and thus many different Gaussian processes, which were combined in the end to obtain a posterior predictive distribution at each point; the idea was that combining the iterations would smooth the predicted surface near tree boundaries. We introduce the Improved Treed Gaussian process, which divides the input space into a single main binary tree whose regions have different variability. The Gaussian process parameters for each tree region are then determined and smoothed at the region boundaries. This smoothing yields, for each point in the input space, a set of parameters that specify the covariance matrix used to predict that point. The advantage is that the predictions and their errors are estimated better, since the standard deviation and range parameters of each point reflect the variation of the region it lies in. Further, smoothing between regions improves because each point's prediction uses its own parameters over the whole input space. Examples in this paper illustrate these advantages for lower-dimensional problems.
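Below is a small 1D sketch in the spirit of a treed Gaussian process, not the Improved Treed GP itself: the input space is split at a fixed boundary, a separate GP is fit in each region, and the two regional predictors are blended with a smooth logistic weight near the boundary as a simple stand-in for the paper's smoothing of per-region parameters. The boundary location, kernels, and data are invented for illustration.

```python
# Illustrative treed-GP-style sketch: per-region GPs with smoothing across
# the region boundary. Assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 120))
# Smooth behavior left of the boundary, rougher behavior to the right.
y = np.where(x < 5.0, np.sin(x), np.sin(4.0 * x)) + 0.05 * rng.normal(size=x.size)

boundary = 5.0
left, right = x < boundary, x >= boundary

def fit_gp(xr, yr):
    # Each region gets its own covariance parameters via marginal-likelihood fitting.
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(xr[:, None], yr)

gp_left, gp_right = fit_gp(x[left], y[left]), fit_gp(x[right], y[right])

# Blend the regional predictors with a smooth weight near the boundary.
x_new = np.linspace(0.0, 10.0, 200)
w = 1.0 / (1.0 + np.exp((x_new - boundary) / 0.25))   # ~1 on the left, ~0 on the right
pred = w * gp_left.predict(x_new[:, None]) + (1.0 - w) * gp_right.predict(x_new[:, None])
print(pred[:5])
```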
Funding: Supported by the National Science Foundation of the USA (Grant Nos. DMS-1812048, DMS-1737857, DMS-1513483, and DMS-1418042) and the National Natural Science Foundation of China (Grant No. 11529101).
Abstract: Model selection strategies have been routinely employed to determine a model for data analysis in statistics, and further study and inference then often proceed as though the selected model were the true model known a priori. Model averaging approaches, on the other hand, try to combine estimators from a set of candidate models. Specifically, instead of deciding which model is the 'right' one, a model averaging approach suggests fitting a set of candidate models and averaging over the estimators using data-adaptive weights. In this paper we establish a general frequentist model averaging framework that does not place any restrictions on the set of candidate models, broadening the scope of the existing methodologies in the frequentist model averaging literature. Assuming the data come from an unknown model, we derive the model averaging estimator and study its limiting distributions and related predictions while taking possible modeling biases into account. We propose a set of optimal weights to combine the individual estimators so that the expected mean squared error of the averaged estimator is minimized. Simulation studies are conducted to compare the performance of the proposed estimator with that of existing methods. The results show the benefits of the proposed approach over traditional model selection approaches as well as existing model averaging methods.
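As a rough illustration of the model-averaging idea (not the paper's optimal-weight derivation), the sketch below fits nested candidate linear models and chooses simplex weights that minimize squared prediction error on a held-out split, a crude empirical stand-in for minimizing the expected mean squared error of the averaged estimator. The data-generating model and candidate set are invented for illustration.

```python
# Toy frequentist model averaging: candidate fits, then weights on the
# simplex minimizing held-out squared error.
# Assumes numpy, scipy, and scikit-learn are installed.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 300, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.8, 0.5, 0.2, 0.0, 0.0])   # weak/zero signals invite averaging
y = X @ beta + rng.normal(scale=1.0, size=n)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

# Candidate models: the first k covariates, k = 1..p.
preds = np.column_stack([
    LinearRegression().fit(X_tr[:, :k], y_tr).predict(X_val[:, :k])
    for k in range(1, p + 1)
])

def val_mse(w):
    return np.mean((y_val - preds @ w) ** 2)

m = preds.shape[1]
res = minimize(
    val_mse, np.full(m, 1.0 / m), method="SLSQP",
    bounds=[(0.0, 1.0)] * m,
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
)
print("model-averaging weights:", np.round(res.x, 3))
```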
Funding: The authors acknowledge partial funding support from the U.S. National Science Foundation under Award Nos. DMR-1842922, DMR-1842952, DMR-1539916, and MRI-1626251.
Abstract: Empirical interatomic potentials require optimization of force-field parameters to tune interatomic interactions so that they mimic those obtained by quantum chemistry-based methods. The optimization of the parameters is complex and requires the development of new techniques. Here, we propose an INitial-DEsign Enhanced Deep learning-based OPTimization (INDEEDopt) framework to accelerate and improve the quality of ReaxFF parameterization. The procedure starts with a Latin Hypercube Design (LHD) algorithm that is used to explore the parameter landscape extensively. The LHD passes the information about the explored regions to a deep learning model, which finds the minimum-discrepancy regions, eliminates infeasible regions, and constructs a more comprehensive understanding of the physically meaningful parameter space. We demonstrate the procedure here for the parameterization of a nickel–chromium binary force field and a tungsten–sulfide–carbon–oxygen–hydrogen quinary force field. We show that INDEEDopt produces improved accuracies in shorter development times compared to the conventional optimization method.
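A conceptual sketch of the LHD-plus-surrogate workflow is given below; it uses a toy quadratic discrepancy function rather than ReaxFF errors and is not the actual INDEEDopt code. Parameters are sampled with a Latin hypercube, scored by the stand-in discrepancy, used to train a small neural-network surrogate, and the surrogate then proposes promising low-discrepancy parameter sets for further evaluation.

```python
# Conceptual LHD + deep-learning-surrogate sketch with a toy objective.
# Assumes numpy, scipy (>= 1.7 for stats.qmc), and scikit-learn.
import numpy as np
from scipy.stats import qmc
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
dim = 4                                   # pretend force-field parameters

def discrepancy(params):
    """Stand-in for the error against quantum-chemistry reference data."""
    return np.sum((params - 0.3) ** 2, axis=1)

# 1) Latin Hypercube Design: space-filling exploration of the parameter landscape.
sampler = qmc.LatinHypercube(d=dim, seed=0)
train_params = sampler.random(n=200)
train_err = discrepancy(train_params)

# 2) Neural-network surrogate trained on the explored regions.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(train_params, train_err)

# 3) Screen many random candidates cheaply and keep the predicted minima.
candidates = rng.uniform(size=(10_000, dim))
best = candidates[np.argsort(surrogate.predict(candidates))[:5]]
print("proposed low-discrepancy parameter sets:\n", np.round(best, 3))
```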