Funding: SP is supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (RGOIN-2018-04967).
Abstract: A powerful investigative tool in biology is to consider not a single mathematical model but a collection of models designed to explore different working hypotheses, and to select the best model in that collection. These lecture notes describe the usual workflow for using mathematical models to investigate a biological problem and motivate the use of a collection of models. Models depend on parameters that must be estimated from observations; when a collection of models is considered, the best model must then be identified based on the available observations. Hence model calibration and selection, which are intrinsically linked, are essential steps of the workflow. Some procedures for model calibration and a criterion for model selection based on experimental data, the Akaike Information Criterion, are described. A rough derivation, practical computation techniques and the use of this criterion are detailed.
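As a minimal sketch of how such a selection step is commonly computed (not taken from the lecture notes themselves), the standard definition AIC = 2k - 2 ln L can be applied to each calibrated model, and the differences turned into Akaike weights. The model names, log-likelihoods and parameter counts below are hypothetical illustration values.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values)
    # exp(-delta/2) is the relative likelihood of each model given the data
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical calibration results: (maximized log-likelihood, parameter count)
candidates = {
    "model_A": (-120.0, 2),
    "model_B": (-118.5, 3),
    "model_C": (-118.4, 5),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
weights = dict(zip(scores, akaike_weights(list(scores.values()))))
best_model = min(scores, key=scores.get)  # lowest AIC is preferred

for name in scores:
    print(f"{name}: AIC={scores[name]:.1f}, weight={weights[name]:.3f}")
```

The model with the lowest AIC is preferred; the weights give a rough sense of how strongly the data favor it over the alternatives in the same collection.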
Funding: This research was supported in part by the Postdoctoral Science Foundation and NSF of China.
Abstract: For semiparametric regression model selection based on a model selection criterion, there is no finite order (or number of parameters) of the nonparametric part to be estimated consistently, but there is a finite order (or number of predictor variables) of the linear part to be estimated consistently. The models selected using AIC and AICC are not consistent estimates of the linear part of the true model. In this paper, we study consistency in model selection by investigating the asymptotic properties of AIC* and AICC*, modified versions of AIC and AICC respectively, which were proposed by a referee of the reference Shi and Tsai. Under some regularity conditions, we prove that the parametric models of the semiparametric regression selected with AIC* and AICC* converge to the true model in probability. In addition, in terms of the mean integrated squared error plus a penalty, these two criteria can also provide an asymptotically efficient selection.
Funding: This research was partially supported by the National Natural Science Foundation of China (19631040, 19971085), Ph.D. Program Foundation
Abstract: In this paper we study the estimation of the rank of the parameter matrix (RPM) in a growth curve model in the framework of model selection. Following the AIC criterion, we propose a new general criterion and obtain a strongly consistent estimate of the RPM. We reach our conclusions separately under the assumption of a normal population and for a general case.
Funding: Supported by the High Technology Research and Development Program of China (863 Program, No. 2006AA100301)
Abstract: The performance of six statistical approaches that can be used to select the best model for describing the growth of individual fish was analyzed using simulated and real length-at-age data. The six approaches are the coefficient of determination (R²), the adjusted coefficient of determination (adj.-R²), the root mean squared error (RMSE), Akaike's information criterion (AIC), the bias-corrected AIC (AICc) and the Bayesian information criterion (BIC). The simulated data were generated by five growth models with different numbers of parameters. Four sets of real data were taken from the literature. The parameters of each of the five growth models were estimated by maximum likelihood under the assumption of an additive error structure for the data. The model best supported by the data was identified using each of the six approaches. The results show that R² and RMSE have the same properties and perform worst. The sample size affects the performance of adj.-R², AIC, AICc and BIC. Adj.-R² does better in small samples than in large samples. AIC is not suitable for small samples and tends to select a more complex model as the sample size becomes large. AICc and BIC perform best in small and large samples, respectively. AICc or BIC is recommended for selecting a fish growth model, depending on the size of the length-at-age data set.
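To make the difference between the three information criteria concrete, here is a small sketch using their standard textbook forms under an additive Gaussian error model. It is not the authors' code. The function names and the example residual sums of squares are hypothetical, and counting the error variance as one extra estimated parameter is an assumption of this sketch.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood for additive Gaussian errors, sigma^2 = rss / n."""
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def criteria(rss, n, n_params):
    """Return AIC, AICc and BIC for a least-squares fit.

    n_params counts the growth-model parameters; the error variance adds one
    more estimated parameter (an assumption of this sketch).
    """
    k = n_params + 1
    if n <= k + 1:
        raise ValueError("AICc needs n > k + 1")
    ll = gaussian_log_likelihood(rss, n)
    aic = 2 * k - 2 * ll
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * ll  # penalty grows with sample size
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical fits of a 3-parameter and a 4-parameter growth model, n = 20
simple_fit = criteria(rss=10.0, n=20, n_params=3)
complex_fit = criteria(rss=9.5, n=20, n_params=4)

for label, fit in (("3 parameters", simple_fit), ("4 parameters", complex_fit)):
    print(label, {name: round(value, 2) for name, value in fit.items()})
```

The AICc correction term shrinks toward zero as n grows, while the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is 8 or more, which is consistent with the different small-sample and large-sample behavior reported above.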
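Abstract: Functional principal component analysis (FPCA) is a commonly used technique for dimension reduction of functional data. This paper considers the joint selection of principal components for functional data. First, a joint principal component model for two functional variables is given, and the sample data are smoothed into curves using basis function expansion and maximum penalized likelihood. Based on the joint model, an AIC criterion for determining the number of functional principal components is given, and an improved ECME algorithm is proposed to estimate the model parameters. Simulations show that the number of principal components selected by the AIC criterion is more accurate, and that joint selection, which takes into account the correlation between the two sets of functional data, improves on selecting the principal components of each functional variable independently. Finally, the proposed method is applied to the analysis of traditional Chinese medicine pectoral qi (zong qi) data from elderly people.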
Abstract: Mark-recapture models are extensively used in quantitative population ecology, providing estimates of population vital rates, such as survival, that are difficult to obtain using other methods. Vital rates are commonly modeled as functions of explanatory covariates, adding considerable flexibility to mark-recapture models, but also increasing the subjectivity and complexity of the modeling process. Consequently, model selection and the evaluation of covariate structure remain critical aspects of mark-recapture modeling. The difficulties involved in model selection are compounded in Cormack-Jolly-Seber models because they are composed of separate sub-models for survival and recapture probabilities, which are conceptualized independently even though their parameters are not statistically independent. The construction of models as combinations of sub-models, together with multiple potential covariates, can lead to a large model set. Although desirable, estimation of the parameters of all models may not be feasible. Strategies to search a model space and base inference on a subset of all models exist and enjoy widespread use. However, even though the methods used to search a model space can be expected to influence parameter estimation, the assessment of covariate importance, and therefore the ecological interpretation of the modeling results, the performance of these strategies has received limited investigation. We present a new strategy for searching the space of a candidate set of Cormack-Jolly-Seber models and explore its performance relative to existing strategies using computer simulation. The new strategy provides an improved assessment of the importance of covariates and covariate combinations used to model survival and recapture probabilities, while requiring only a modest increase in the number of models on which inference is based in comparison to existing techniques.
Funding: Supported by the National Natural Science Foundation of China (NSFC, grant No. U2039207)
Abstract: Evaluation of numerical earthquake forecasting models needs to consider two issues of equal importance: the application scenario of the simulation, and the complexity of the model. The criterion for evaluation-based model selection raises some interesting problems that need discussion.
Abstract: Bangladesh has a subtropical monsoon climate characterized by wide seasonal variations in rainfall, moderately warm temperatures, and high humidity. Rainfall is the main source of irrigation water throughout Bangladesh, where the inhabitants derive their income primarily from farming. Stochastic rainfall models, which address the occurrence of wet days and the depth of rainfall, have been used to model the daily occurrence of rainfall in different regions and have achieved satisfactory results around the world. In connection with Markov chains of different orders, logistic regression is used to examine the dependence of current rainfall on the rainfall of the previous two time periods. It is shown that wet days in the previous two time periods, compared with dry days in the previous two time periods, positively influence a wet day in the current time period; that is, the occurrence of rain depends on the dry-wet spell during the rainy season from April to September in the study area. Daily rainfall data for Dhaka station covering about 26 years, from January 1985 to August 2011, were collected from the meteorological department to conduct the study. The test results show that the occurrence of rainfall follows a second-order Markov chain, and the logistic regression also indicates that dry followed by dry and wet followed by wet are more likely for rainfall at Dhaka station. The model could perform adequately for many applications of rainfall data.
Abstract: Mixture regression is a regression problem with mixed data: among the observations, some data come from one model while others come from other models. EM or other algorithms can be used to solve this problem only once the number of models is assumed to be given. In this paper, we propose an information criterion for the mixture regression model. In data simulations, compared with ordinary information criteria, our criterion performs better at choosing the correct number of models.
Abstract: The paper searched for raw data about wild-caught fish for which a sigmoidal growth function described the mass growth significantly better than non-sigmoidal functions. Specifically, von Bertalanffy’s sigmoidal growth function (metabolic exponent-pair a = 2/3, b = 1) was compared with unbounded linear growth and with bounded exponential growth using the Akaike information criterion. The maximum likelihood fits were compared, assuming a lognormal distribution of mass (i.e. a higher variance for heavier animals). Starting from 70+ size-at-age data sets, the paper focused on 15 that came from large datasets. Of these, six with 400 to 20,000 data points were suitable for sigmoidal growth modeling. For these, a custom-made optimization tool identified the best-fitting growth function from the general von Bertalanffy-Pütter class of models. This class generalizes the well-known models of Verhulst (logistic growth), Gompertz and von Bertalanffy. Whereas the best-fitting models varied widely, their exponent-pairs displayed a remarkable pattern, as their difference was close to 1/3 (example: von Bertalanffy exponent-pair). This defined a new class of models, for which the paper provided a biological motivation that relates growth to food consumption.
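Abstract: Accurately estimating the number of echoes in ultrasonic signals from multilayer materials is of great significance in ultrasonic testing. The wavelet transform is applied to parameter estimation of ultrasonic echoes from multilayer materials: based on a Gaussian model and the wavelet transform of the ultrasonic echo signal, the artificial bee colony algorithm is used to estimate the parameters of the multiple ultrasonic echoes. The Akaike Information Criterion (AIC) is used to estimate the number of echoes in superimposed two-echo and three-echo ultrasonic signals. Simulation results show that the algorithm can effectively estimate the number of echoes in multiple ultrasonic echo signals. The performance of the algorithm was verified with experimentally measured echoes, and the results demonstrate its feasibility and practicality.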