In this article, the problem of estimating the covariance matrix in general linear mixed models is considered. Two new classes of estimators obtained by shrinking the eigenvalues towards the origin and the arithmetic mean, respectively, are proposed. It is shown that these new estimators dominate the unbiased estimator under the squared error loss function. Finally, some simulation results to compare the performance of the proposed estimators with that of the unbiased estimator are reported. The simulation results indicate that these new shrinkage estimators provide a substantial improvement in risk under most situations.
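As a purely illustrative aside (not the paper's construction), the following Python sketch shows what shrinking the eigenvalues of a covariance estimate towards the origin or towards their arithmetic mean looks like; the shrinkage factor c and the function name are assumptions, and the paper's estimators use specific constants chosen to guarantee risk dominance, which this sketch does not reproduce.

```python
import numpy as np

def shrink_eigenvalues(S, c=0.8, target="origin"):
    """Shrink the eigenvalues of a symmetric covariance estimate S.

    target="origin": pull every eigenvalue towards zero by the factor c.
    target="mean":   pull every eigenvalue towards their arithmetic mean.
    Illustrative only; not the dominance-guaranteeing constants of the paper.
    """
    vals, vecs = np.linalg.eigh(S)          # eigendecomposition S = V diag(vals) V^T
    if target == "origin":
        new_vals = c * vals
    else:
        m = vals.mean()
        new_vals = m + c * (vals - m)
    return (vecs * new_vals) @ vecs.T       # rebuild V diag(new_vals) V^T

# Toy usage on a sample covariance matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
S = np.cov(X, rowvar=False)
S_shrunk = shrink_eigenvalues(S, c=0.8, target="mean")
```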
Today, Linear Mixed Models (LMMs) are fitted, mostly, by assuming that random effects and errors have Gaussian distributions, therefore using Maximum Likelihood (ML) or REML estimation. However, for many data sets, that double assumption is unlikely to hold, particularly for the random effects, a crucial component whose assessment of magnitude is key in such modeling. Alternative fitting methods not relying on that assumption (such as ANOVA methods and Rao's MINQUE) apply, quite often, only to the very constrained class of variance components models. In this paper, a new computationally feasible estimation methodology is designed, first for the widely used class of 2-level (or longitudinal) LMMs, with the only assumption (beyond the usual basic ones) being that residual errors are uncorrelated and homoscedastic, and with no distributional assumption imposed on the random effects. A major asset of this new approach is that it yields nonnegative variance estimates and covariance matrix estimates which are symmetric and, at least, positive semi-definite. Furthermore, it is shown that when the LMM is, indeed, Gaussian, this new methodology differs from ML only through a slight variation in the denominator of the residual variance estimate. The new methodology actually generalizes to LMMs a well known nonparametric fitting procedure for standard Linear Models. Finally, the methodology is also extended to ANOVA LMMs, generalizing an old method by Henderson for ML estimation in such models under normality.
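For concreteness, a 2-level (longitudinal) LMM of the kind treated here can be written, in generic notation assumed for this note rather than taken from the paper, as

$$y_i = X_i\beta + Z_i b_i + e_i,\qquad \mathbb{E}(b_i)=0,\ \operatorname{Var}(b_i)=D,\qquad \mathbb{E}(e_i)=0,\ \operatorname{Var}(e_i)=\sigma^2 I_{n_i},\qquad i=1,\dots,m,$$

where only these first- and second-moment conditions are imposed: the residual errors are uncorrelated and homoscedastic, and no distributional form is assumed for the random effects $b_i$.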
A linear mixed model is used to explain the infant mortality rate data of United Nations countries. The HDI (human development index) has a significant negative linear relationship with the infant mortality rate. United Nations data show that the infant mortality rate had a descending trend over the period 1990-2010. This study aims to assess the value of the HDI as a predictor of the infant mortality rate. Findings in the paper suggest that significant percentage reductions in infant mortality might be possible for countries by controlling the HDI.
Territory risk analysis has played an important role in the decision-making of auto insurance rate regulation. Because territory risk classification amounts to finding optimal groupings of insurance loss data, clustering methods become the natural choice for this task. In this work, spatially constrained clustering is first applied to insurance loss data to form rating territories. The generalized linear model (GLM) and generalized linear mixed model (GLMM) are then proposed to derive the risk relativities of the obtained clusters. Each basic rating unit within the same cluster, namely a Forward Sortation Area (FSA), takes the same risk relativity value as its cluster. The risk relativities obtained from the GLM or GLMM are used to calculate the performance metrics, including RMSE, MAD, and Gini coefficients. The spatially constrained clustering and the risk relativity estimates help obtain a set of territory risk benchmarks used in rate filings to guide the rate regulation process.
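For readers who want to reproduce the performance metrics, a minimal sketch of RMSE, MAD and an ordered (Lorenz-curve based) Gini coefficient is given below; the Gini definition used here is one common actuarial convention and is an assumption, since the regulator's exact normalisation is not stated in the abstract.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error between observed losses y and predictions yhat."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

def mad(y, yhat):
    """Mean absolute deviation between observed losses y and predictions yhat."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def ordered_gini(y, score):
    """Gini based on the ordered Lorenz curve: sort losses by the model score
    (e.g. the fitted risk relativity) and measure how far the cumulative-loss
    curve departs from the diagonal. Positive values mean the score
    concentrates losses in the high-risk units."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(score)
    lorenz = np.cumsum(y[order]) / y.sum()
    grid = np.arange(1, len(y) + 1) / len(y)
    return float(1.0 - 2.0 * np.trapz(lorenz, grid))
```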
Stable water isotopes are natural tracers for quantifying the contribution of moisture recycling to local precipitation, i.e., the moisture recycling ratio, but various isotope-based models usually lead to different results, which affects the accuracy of local moisture recycling estimates. In this study, a total of 18 stations from four typical areas in China were selected to compare the performance of isotope-based linear and Bayesian mixing models and to determine the local moisture recycling ratio. Among the three vapor sources, namely advection, transpiration, and surface evaporation, the advection vapor usually played a dominant role, and the contribution of surface evaporation was less than that of transpiration. When abnormal values were ignored, the arithmetic averages of the differences between the isotope-based linear and Bayesian mixing models were 0.9% for transpiration, 0.2% for surface evaporation, and –1.1% for advection, and the corresponding medians were 0.5%, 0.2%, and –0.8%. The importance of transpiration was slightly lower in most cases when the Bayesian mixing model was applied, and the contribution of advection was relatively larger. The Bayesian mixing model was found to perform better in determining an efficient solution, since the linear model sometimes resulted in negative contribution ratios. A sensitivity test with two isotope scenarios indicated that the Bayesian model had a relatively low sensitivity to changes in the isotope input, and it was important to accurately estimate the isotopes in precipitation vapor. Generally, the Bayesian mixing model should be recommended instead of a linear model. The findings are useful for understanding the performance of isotope-based linear and Bayesian mixing models under various climate backgrounds.
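The linear model referred to here is essentially a three-end-member isotope mass balance; in generic notation (assumed for illustration, not taken verbatim from the paper) it reads

$$\delta_{P} = f_{adv}\,\delta_{adv} + f_{tr}\,\delta_{tr} + f_{ev}\,\delta_{ev},\qquad f_{adv}+f_{tr}+f_{ev}=1,$$

solved for the three fractions from two isotope constraints (e.g. δ¹⁸O and δ²H). Nothing in this linear system forces the fractions into [0, 1], which is why negative contribution ratios can occur, whereas the Bayesian mixing model places priors on the fractions and samples a posterior that respects the constraint.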
For the linear mixed model with skew-normal random effects, this paper gives the density function, moment generating function and independence conditions. The noncentral skew chi-square distribution is defined and its density function is shown. The necessary and sufficient conditions under which a quadratic form is distributed as a noncentral skew chi-square distribution are obtained. Also, a version of Cochran's theorem is given, which modifies the result of Wang et al. (2009) and is used to set up exact tests for fixed effects and variance components of the proposed model. For illustration, our main results are applied to a real data problem.
In this paper, the problem of estimating the covariance matrix in general linear mixed models is considered. A new class of estimators is proposed. It is shown that this new estimator dominates the analysis of variance estimate under two squared loss functions. Finally, some simulation results to compare the performance of the proposed estimator with that of the analysis of variance estimate are reported. The simulation results indicate that this new estimator provides a substantial improvement in risk under most situations.
In this paper, we propose a bias-corrected empirical likelihood (BCEL) ratio to construct a goodness-of-fit test for generalized linear mixed models. The BCEL test maintains the advantage of empirical likelihood of being self scale invariant, and thus does not involve estimating the limiting variance of the test statistic, avoiding the deterioration of the power of the test. Furthermore, the bias correction makes the limit a process in which every variable is standard chi-squared. This simple structure of the process enables us to construct a Monte Carlo test procedure to approximate the null distribution. Thus, it overcomes a problem encountered when the classical empirical likelihood test is used, as that test is asymptotically a functional of a Gaussian process plus a normal shift function, whose complicated covariance function makes it difficult to employ any approximation for the null distribution. The test is omnibus, and the power study shows that the test can detect local alternatives approaching the null at the parametric rate. Simulations are carried out for illustration and for a comparison with an existing method.
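The Monte Carlo test procedure mentioned above follows the usual pattern of simulating the statistic under the fitted null model and comparing; a generic sketch is shown below, where the function name and the resampling mechanism passed in by the user are assumptions for illustration rather than the authors' exact algorithm.

```python
import numpy as np

def monte_carlo_pvalue(stat_obs, simulate_null_stat, B=999, rng=None):
    """Approximate the null distribution of a test statistic by B simulated
    replicates and return the upper-tail Monte Carlo p-value.

    `simulate_null_stat(rng)` must return one realisation of the statistic
    computed on data generated (or resampled) under the fitted null model;
    for the BCEL test this would wrap the bias-corrected empirical
    likelihood ratio."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([simulate_null_stat(rng) for _ in range(B)])
    return (1 + np.sum(sims >= stat_obs)) / (B + 1)
```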
Linear mixed models are popularly used to fit continuous longitudinal data, and the random effects are commonly assumed to have a normal distribution. However, this assumption needs to be tested so that further analysis can proceed well. In this paper, we consider the Baringhaus-Henze-Epps-Pulley (BHEP) tests, which are based on an empirical characteristic function. Differing from the classical setting, we consider normality checking for the random effects, which are unobservable, so the test has to be based on their predictors. The test is consistent against global alternatives, and is sensitive to local alternatives converging to the null at a rate arbitrarily close to 1/√n, where n is the sample size. Furthermore, to overcome the problem that the limiting null distribution of the test is not tractable, we suggest a new method: use a conditional Monte Carlo test (CMCT) to approximate the null distribution, and then simulate p-values. The test is compared with existing methods, the power is examined, and several examples are given to illustrate the usefulness of our test in the analysis of longitudinal data.
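A BHEP-type statistic for this setting has the generic form (the standardization of the random-effect predictors written here is an assumption of this note; the paper's exact version may differ)

$$T_{n,\beta}=n\int_{\mathbb{R}^q}\bigl|\psi_n(t)-e^{-\|t\|^2/2}\bigr|^2\,\varphi_\beta(t)\,dt,\qquad \psi_n(t)=\frac{1}{n}\sum_{j=1}^{n}\exp\{\mathrm{i}\,t^{\top}\hat b_j^{*}\},$$

where $\hat b_j^{*}$ are the standardized predictors of the random effects, $\varphi_\beta$ is a mean-zero Gaussian weight density with smoothing parameter $\beta$, and large values of $T_{n,\beta}$ indicate departure from normality.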
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
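The power transforms underlying fractional polynomial modeling can be written, for a positive primary predictor $x$ and a real power $p$ selected by the adaptive search, as (a standard convention, assumed here rather than quoted from the article)

$$g(x;p)=\begin{cases}x^{p}, & p\neq 0,\\ \log x, & p=0,\end{cases}$$

with means and variances/dispersions then modeled through their link functions as linear combinations of such transformed predictors.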
Taking the nonlinear nature of the runoff system into account, and combining the auto-regression method and the multi-regression method, a Nonlinear Mixed Regression Model (NMR) was established to analyze the impact of temperature and precipitation changes on the annual river runoff process. The model was calibrated and verified using a BP neural network with observed meteorological and runoff data from Daiying Hydrological Station in the Chaohe River of Hebei Province for 1956–2000. Compared with the auto-regression model, the linear multi-regression model and the linear mixed regression model, the NMR can improve forecasting precision remarkably. Therefore, the simulation of climate change scenarios was carried out with the NMR. The results show that the nonlinear mixed regression model can simulate annual river runoff well.
The impacts of the minimum purchase price policy for grain on the planting area of rice in Hubei Province were analyzed based on a mixed linear model. After an indicator system containing the minimum purchase price policy and other factors influencing the planting area of rice was constructed, principal component analysis of the system was conducted, and then a mixed linear model with the planting area of rice as the dependent variable was established. The results show that, after the exclusion of interference from other factors, the minimum purchase price policy for grain had a positive impact on the planting area of rice in Hubei Province. That is, the minimum purchase price policy significantly stimulated the growth of the rice planting area in Hubei Province.
We focus on the development of model selection criteria in linear mixed models. In particular, we propose model selection criteria following the Mallows' Conceptual Predictive Statistic (Cp) [1] [2] in linear mixed models. When correlation exists between the observations in the data, the normal Gauss discrepancy of the univariate case is not appropriate to measure the distance between the true model and a candidate model. Instead, we define a marginal Gauss discrepancy which takes the correlation into account in the mixed models. The model selection criterion, marginal Cp, called MCp, serves as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy. An improvement of MCp, called IMCp, is then derived and proved to be a more accurate estimator of the expected marginal Gauss discrepancy than MCp. The performance of the proposed criteria is investigated in a simulation study. The simulation results show that in small samples, the proposed criteria outperform the Akaike Information Criterion (AIC) [3] [4] and Bayesian Information Criterion (BIC) [5] in selecting the correct model; in large samples, their performance is competitive. Further, the proposed criteria perform significantly better for highly correlated response data than for weakly correlated data.
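For reference, the classical statistic that MCp generalizes is, in the ordinary uncorrelated linear model,

$$C_p=\frac{\mathrm{SSE}_p}{\hat\sigma^2}-n+2p,$$

where $\mathrm{SSE}_p$ is the residual sum of squares of a candidate model with $p$ parameters and $\hat\sigma^2$ is the error variance estimate from the full model; MCp replaces the underlying iid Gauss discrepancy with the marginal Gauss discrepancy, which accounts for the covariance of the observations.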
The purpose of this article is to investigate approaches for modeling individual patient count/rate data over time, accounting for temporal correlation and non-constant dispersions, while requiring reasonable amounts of time to search over alternative models for those data. This research addresses formulations for two approaches for extending generalized estimating equations (GEE) modeling. These approaches use a likelihood-like function based on the multivariate normal density. The first approach augments standard GEE equations to include equations for estimation of dispersion parameters. The second approach is based on estimating equations determined by partial derivatives of the likelihood-like function with respect to all model parameters and so extends linear mixed modeling. Three correlation structures are considered, including independent, exchangeable, and spatial autoregressive of order 1 correlations. The likelihood-like function is used to formulate a likelihood-like cross-validation (LCV) score for use in evaluating models. Example analyses are presented using these two modeling approaches applied to three data sets of counts/rates over time for individual cancer patients, including pain flares per day, as-needed pain medications taken per day, and around-the-clock pain medications taken per day per dose. Means and dispersions are modeled as possibly nonlinear functions of time using adaptive regression modeling methods to search through alternative models compared using LCV scores. The results of these analyses demonstrate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time, because in example analyses, it either generates better LCV scores or more parsimonious models and requires substantially less time.
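For measurements on the same patient at times $t_j$ and $t_k$ (notation assumed here for illustration), the three working correlation structures can be written as

$$\operatorname{Corr}(y_{t_j},y_{t_k})=\begin{cases}0, & \text{independent},\\ \rho, & \text{exchangeable},\\ \rho^{\,|t_j-t_k|}, & \text{spatial AR(1)},\end{cases}\qquad j\neq k,$$

where the spatial autoregressive form lets the correlation decay with the actual time gap between observations rather than with their index.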
Purpose: To formulate and demonstrate methods for regression modeling of probabilities and dispersions for individual-patient longitudinal outcomes taking on discrete numeric values. Methods: Three alternatives for modeling of outcome probabilities are considered. Multinomial probabilities are based on different intercepts and slopes for probabilities of different outcome values. Ordinal probabilities are based on different intercepts and the same slope for probabilities of different outcome values. Censored Poisson probabilities are based on the same intercept and slope for probabilities of different outcome values. Parameters are estimated with extended linear mixed modeling maximizing a likelihood-like function based on the multivariate normal density that accounts for within-patient correlation. Formulas are provided for gradient vectors and Hessian matrices for estimating model parameters. The likelihood-like function is also used to compute cross-validation scores for alternative models and to control an adaptive modeling process for identifying possibly nonlinear functional relationships in predictors for probabilities and dispersions. Example analyses are provided of daily pain ratings for a cancer patient over a period of 97 days. Results: The censored Poisson approach is preferable for modeling these data, and presumably other data sets of this kind, because it generates a competitive model with fewer parameters in less time than the other two approaches. The generated probabilities for this model are distinctly nonlinear in time while the dispersions are distinctly nonconstant over time, demonstrating the need for adaptive modeling of such data. The analyses also address the dependence of these daily pain ratings on time and the daily numbers of pain flares. Probabilities and dispersions change differently over time for different numbers of pain flares. Conclusions: Adaptive modeling of daily pain ratings for individual cancer patients is an effective way to identify nonlinear relationships in time as well as in other predictors such as the number of pain flares.
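One standard way to write the three probability structures contrasted above, for an outcome taking values $k=1,\dots,K$ and predictor vector $x$ (the parameterisation below is assumed for illustration and may differ from the paper's), is

$$\text{multinomial: } \log\frac{P(y=k)}{P(y=K)}=\alpha_k+\beta_k^{\top}x,\qquad \text{ordinal: } \log\frac{P(y\le k)}{P(y>k)}=\alpha_k+\beta^{\top}x,\qquad \text{censored Poisson: } P(y=k)\propto\frac{\lambda^{k}e^{-\lambda}}{k!},\ \log\lambda=\alpha+\beta^{\top}x,$$

with the censored Poisson model assigning the remaining upper-tail mass to the largest observable value, so that all outcome values share a single intercept and slope.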
The main purpose of this paper is to investigate D-optimal population designs in multi-response linear mixed models for longitudinal data. Observations of each response variable within subjects are assumed to have a first-order autoregressive structure, possibly with observation error. Equivalence theorems are provided to characterise the D-optimal population designs for the estimation of fixed effects in the model. The semi-Bayesian D-optimal design which is robust against the serial correlation coefficient is also considered. Simulation studies show that the correlation between multi-response variables has tiny effects on the optimal design, while the experimental costs are important factors in the optimal designs.
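In generic form (a textbook statement assumed here; the paper's version is adapted to multi-response longitudinal models with AR(1) errors, observation error and cost considerations), a population design $\xi^{*}$ is D-optimal for the fixed effects $\beta$ when it maximizes $\log\det M(\xi)$, the log-determinant of the information matrix, and the equivalence theorem characterises it by

$$\operatorname{tr}\{M^{-1}(\xi^{*})\,\mu(x)\}\le p\quad\text{for every candidate point }x,$$

with equality attained at the support points of $\xi^{*}$, where $\mu(x)$ is the elementary information matrix contributed by a single candidate point and $p=\dim\beta$.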
Scientists are dedicated to studying the detection of Alzheimer's disease onset to find a cure, or at the very least, medication that can slow the progression of the disease. This article explores the effectiveness of longitudinal data analysis, artificial intelligence, and machine learning approaches based on magnetic resonance imaging and positron emission tomography neuroimaging modalities for progression estimation and the detection of Alzheimer's disease onset. The significance of feature extraction in highly complex neuroimaging data, the identification of vulnerable brain regions, and the determination of threshold values for plaques, tangles, and neurodegeneration of these regions will be extensively evaluated. Developing automated methods to improve the aforementioned research areas would enable specialists to determine the progression of the disease and find the link between the biomarkers and more accurate detection of Alzheimer's disease onset.
Background: Animals need to adjust their vigilance strategies when foraging between physically contrasting vegetated and non-vegetated habitats. Vegetated habitats may pose a greater risk for some species if vegetation characteristics function as a visual obstruction, but benefit others if they serve as protective shelter. Variation in group size, presence of similar species, along with variation in environmental conditions and anthropogenic disturbance, can also influence vigilance investment. Methods: In this study, we quantified the vigilance behaviour of two large-bodied, sympatric migratory curlew species, Far Eastern Curlew (Numenius madagascariensis) and Eurasian Curlew (N. arquata), in vegetated Suaeda salsa saltmarsh and non-vegetated mudflat habitat in Liaohekou National Nature Reserve, China. We used linear mixed models to examine the effects of habitat type, season, tide time, flock size (conspecific and heterospecific), and human disturbance on curlew vigilance investment. Results: Both species spent a higher percentage of time under visual obstruction in S. salsa habitat compared to mudflat habitat, but in response, only the Far Eastern Curlew increased its percentage of vigilance time, indicating that visual obstruction in this habitat is only a concern for this species. There was no evidence that S. salsa vegetation served as a form of cryptic background colouration, since neither species decreased its vigilance effort in S. salsa habitat in spring compared to the autumn migration season. The effect of the curlew social environment (i.e. flock size) was habitat dependent, since the percentage of vigilance time by curlews in saltmarsh increased with both the number of individual curlews and the number of other birds present, but not in mudflat habitat. Conclusions: We conclude that both migratory curlew species exhibit a flexible vigilance adjustment strategy to cope with the different environmental and social conditions of adjacent and sharply contrasting coastal habitats, and that the trade-off between the risks of foraging and the abundance of prey may be a relatively common phenomenon in these and other shorebird populations.
The main aim of this paper was to calculate the soil organic carbon stock (SOCS) with consideration of the pedogenetic horizons using expert knowledge and GIS-based methods in northeastern China. A novel prediction process was presented and is referred to as model-then-calculate with respect to the variable thicknesses of soil horizons (MCV). The model-then-calculate with fixed thickness (MCF), soil profile statistics (SPS), pedological professional knowledge-based (PKB) and vegetation type-based (Veg) methods were carried out for comparison. With respect to similar pedological information, nine common layers from topsoil to bedrock were grouped in the MCV. Validation results suggested that the MCV method performed better than the other methods considered. For the comparison of polygon-based approaches, the Veg method produced better accuracy than both SPS and PKB, as limited soil data were incorporated. The additional prediction of the pedogenetic horizons within MCV benefited the regional SOCS estimation and provided information for future soil classification and understanding of soil functions. The intermediate products, that is, the horizon thickness maps, showed sufficient variability and reflected many spatial details. The linear mixed model indicated that mean annual air temperature (MAAT) was the most important predictor for the SOCS simulation. The minimal residual of the linear mixed models was achieved in the vegetation type-based model, whereas the maximal residual was obtained in the soil type-based model. About 95% of the SOCS was found in Argosols, Cambosols and Isohumosols. The largest SOCS was found in croplands with vegetation of Triticum aestivum L., Sorghum bicolor (L.) Moench, Glycine max (L.) Merr., Zea mays L. and Setaria italica (L.) P. Beauv.
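A commonly used horizon-wise formula for the stock (given here as background; the abstract does not state the paper's exact variant) is

$$\mathrm{SOCS}=\sum_{i=1}^{n}\frac{\mathrm{BD}_i\times \mathrm{SOC}_i\times T_i\times(1-G_i)}{100}\ \ (\mathrm{kg\,C\,m^{-2}}),$$

where, for horizon $i$, $\mathrm{BD}_i$ is the bulk density (g cm$^{-3}$), $\mathrm{SOC}_i$ the organic carbon content (g kg$^{-1}$), $T_i$ the horizon thickness (cm) and $G_i$ the coarse-fragment fraction, summed over the $n$ pedogenetic horizons; this is why predicting the horizon thicknesses well matters directly for the regional estimate.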
In the experimental field, researchers very often need to select the best subset model as well as obtain the best model estimates simultaneously. Selecting the best subset of variables improves prediction accuracy, as noninformative variables are removed. Having a model with high prediction accuracy allows researchers to use the model for future forecasting. In this paper, we investigate the differences between various variable selection methods. The aim is to compare the frequentist methodology (backward elimination), a penalised shrinkage method (the Adaptive LASSO) and Least Angle Regression (LARS) for selecting the active variables for data produced by a blocked design experiment. The result of the comparative study supports the utilization of the LARS method for the statistical analysis of data from blocked experiments.
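A minimal sketch of the LARS-based selection that the study favours is given below, using scikit-learn on simulated data; the toy data-generating model, the crude block effect and all variable names are assumptions for illustration, not the blocked design analysed in the paper.

```python
import numpy as np
from sklearn.linear_model import Lars, LassoLarsCV

# Simulated data with a few active variables and a crude block effect
rng = np.random.default_rng(1)
n, p = 60, 10
X = rng.standard_normal((n, p))
beta = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0])
block = np.repeat(rng.normal(scale=0.5, size=6), 10)   # 6 blocks of 10 runs
y = X @ beta + block + rng.normal(scale=1.0, size=n)

lars = Lars(n_nonzero_coefs=3).fit(X, y)               # plain LARS, stopped after 3 steps
lasso_lars = LassoLarsCV(cv=5).fit(X, y)               # LASSO via the LARS path, CV-tuned

print("LARS selected variables:      ", np.flatnonzero(lars.coef_))
print("LASSO-LARS selected variables:", np.flatnonzero(lasso_lars.coef_))
```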