Funding: This study was supported by the National Natural Science Foundation of China (42261008, 41971034) and the Natural Science Foundation of Gansu Province, China (22JR5RA074).
Abstract: Stable water isotopes are natural tracers quantifying the contribution of moisture recycling to local precipitation, i.e., the moisture recycling ratio, but different isotope-based models usually lead to different results, which affects the accuracy of the estimated local moisture recycling. In this study, a total of 18 stations from four typical areas in China were selected to compare the performance of isotope-based linear and Bayesian mixing models and to determine the local moisture recycling ratio. Among the three vapor sources, namely advection, transpiration, and surface evaporation, advected vapor usually played the dominant role, and the contribution of surface evaporation was less than that of transpiration. When abnormal values were ignored, the arithmetic averages of the differences between the isotope-based linear and Bayesian mixing models were 0.9% for transpiration, 0.2% for surface evaporation, and –1.1% for advection, respectively, and the medians were 0.5%, 0.2%, and –0.8%, respectively. The importance of transpiration was slightly lower in most cases when the Bayesian mixing model was applied, and the contribution of advection was relatively larger. The Bayesian mixing model was found to perform better in determining a feasible solution, since the linear model sometimes resulted in negative contribution ratios. A sensitivity test with two isotope scenarios indicated that the Bayesian model had a relatively low sensitivity to changes in the isotope input, and that it was important to accurately estimate the isotopic composition of precipitation vapor. Generally, the Bayesian mixing model should be recommended instead of the linear model. The findings are useful for understanding the performance of isotope-based linear and Bayesian mixing models under various climate backgrounds.

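As a hedged illustration of the contrast described above, the sketch below solves a three-source, two-tracer linear mixing system exactly, which can return infeasible negative fractions, and compares it with a minimal Bayesian mixing estimate that keeps the fractions on the simplex. All isotope signatures, the mixture value, and the likelihood standard deviations are invented for illustration; the study's station data and model details are not reproduced here.

```python
import numpy as np

# Hypothetical tracer signatures (d18O, dD, in permil) for the three vapor
# sources and for the observed precipitation vapor; values are illustrative.
sources = np.array([[-18.0, -130.0],   # advection
                    [ -8.0,  -60.0],   # transpiration
                    [-14.0, -110.0]])  # surface evaporation
mix = np.array([-16.0, -110.0])

# Linear three-source mixing model: two isotope balances plus mass balance.
A = np.vstack([sources.T, np.ones(3)])       # 3x3 linear system
b = np.append(mix, 1.0)
f_linear = np.linalg.solve(A, b)             # may contain negative fractions

# Minimal Bayesian mixing model: uniform Dirichlet prior on the simplex,
# Gaussian likelihood around the observed mixture (sds are assumed values).
rng = np.random.default_rng(0)
f = rng.dirichlet(np.ones(3), size=200_000)  # prior draws on the simplex
pred = f @ sources                           # predicted mixture signatures
sd = np.array([1.0, 8.0])                    # assumed per-tracer precision
logw = -0.5 * (((pred - mix) / sd) ** 2).sum(axis=1)
w = np.exp(logw - logw.max())
f_bayes = (f * w[:, None]).sum(axis=0) / w.sum()   # posterior mean

print("linear solution:", np.round(f_linear, 3))   # contains a negative entry
print("Bayesian posterior mean:", np.round(f_bayes, 3))
```
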
Abstract: Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.

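The core mechanics of the approach, trying power transforms of a predictor and scoring candidate models by a cross-validated likelihood, can be sketched as follows. This is a simplified single-predictor stand-in for the adaptive heuristic search described above, not the authors' implementation, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 5.0, size=200)
y = 2.0 / x + rng.normal(0.0, 0.3, size=200)     # true power is -1

POWERS = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # 0 denotes log(x)

def transform(x, p):
    return np.log(x) if p == 0.0 else x ** p

def loglik_cv(x, y, p, k=5):
    """k-fold cross-validated Gaussian log-likelihood of y ~ 1 + x**p."""
    idx = np.arange(len(y)) % k
    total = 0.0
    for fold in range(k):
        tr, te = idx != fold, idx == fold
        X = np.column_stack([np.ones(tr.sum()), transform(x[tr], p)])
        beta, *_ = np.linalg.lstsq(X, y[tr], rcond=None)
        s2 = np.mean((y[tr] - X @ beta) ** 2)
        Xte = np.column_stack([np.ones(te.sum()), transform(x[te], p)])
        r = y[te] - Xte @ beta
        total += np.sum(-0.5 * np.log(2 * np.pi * s2) - r ** 2 / (2 * s2))
    return total

best = max(POWERS, key=lambda p: loglik_cv(x, y, p))
print("selected power:", best)   # should recover -1 on this simulated data
```
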
Funding: Supported by the Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (0506011200702), the National Natural Science Foundation of China, the Tian Yuan Special Foundation (10926059), the Foundation of Zhejiang Educational Committee (Y200803920), and the Scientific Research Foundation of Hangzhou Dianzi University (KYS025608094).
Abstract: In this article, the problem of estimating the covariance matrix in general linear mixed models is considered. Two new classes of estimators obtained by shrinking the eigenvalues towards the origin and the arithmetic mean, respectively, are proposed. It is shown that these new estimators dominate the unbiased estimator under the squared error loss function. Finally, some simulation results to compare the performance of the proposed estimators with that of the unbiased estimator are reported. The simulation results indicate that these new shrinkage estimators provide a substantial improvement in risk under most situations.

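A minimal sketch of the eigenvalue-shrinkage idea: decompose an estimated covariance matrix, pull its eigenvalues toward their arithmetic mean by a factor lam, and reassemble. The shrinkage factor and the sample-covariance starting point are illustrative assumptions; the paper's estimators and their dominance results are more specific.

```python
import numpy as np

def shrink_to_mean(S, lam=0.3):
    """Shrink the eigenvalues of a symmetric PSD matrix S toward their mean.

    lam = 0 returns S unchanged; lam = 1 returns a multiple of the identity.
    """
    vals, vecs = np.linalg.eigh(S)
    shrunk = (1 - lam) * vals + lam * vals.mean()
    return vecs @ np.diag(shrunk) @ vecs.T

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))      # few observations: noisy sample covariance
S = np.cov(X, rowvar=False)
S_shrunk = shrink_to_mean(S)
print(np.linalg.eigvalsh(S).round(2))
print(np.linalg.eigvalsh(S_shrunk).round(2))  # eigenvalue spread reduced
```
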
Funding: Under the auspices of the National Natural Science Foundation of China (No. 50809004).
Abstract: Taking the nonlinear nature of the runoff system into account, and combining the auto-regression and multi-regression methods, a Nonlinear Mixed Regression Model (NMR) was established to analyze the impact of temperature and precipitation changes on the annual river runoff process. The model was calibrated and verified using a BP neural network with observed meteorological and runoff data from Daiying Hydrological Station on the Chaohe River of Hebei Province for 1956–2000. Compared with the auto-regression model, the linear multi-regression model, and the linear mixed regression model, NMR improves forecasting precision remarkably. Therefore, the simulation of climate change scenarios was carried out with NMR. The results show that the nonlinear mixed regression model can simulate annual river runoff well.

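A rough sketch of the NMR design, an autoregressive runoff term plus temperature and precipitation regressors fed to a small feed-forward network, is given below. The data are simulated, and scikit-learn's MLPRegressor stands in for the BP neural network used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 45                                   # years of record (illustrative)
temp = rng.normal(10, 1, n)
precip = rng.gamma(5, 100, n)
runoff = np.empty(n)
runoff[0] = 300.0
for t in range(1, n):                    # synthetic nonlinear runoff process
    runoff[t] = (0.4 * runoff[t - 1] + 0.5 * precip[t]
                 - 8 * temp[t] + rng.normal(0, 20))

# NMR-style design: autoregressive term plus climate regressors.
X = np.column_stack([runoff[:-1], temp[1:], precip[1:]])
y = runoff[1:]

# A small feed-forward network stands in for the paper's BP network.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0))
model.fit(X, y)
print("training R^2:", round(model.score(X, y), 3))
```
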
Funding: Supported by the Humanities and Social Sciences Foundation for Young Scholars of the Ministry of Education of China (11y3jc630197).
Abstract: Impacts of the minimum purchase price policy for grain on the planting area of rice in Hubei Province were analyzed based on a mixed linear model. After an indicator system containing the minimum purchase price policy and other factors influencing the planting area of rice was constructed, principal component analysis of the system was conducted, and then a mixed linear model with the planting area of rice as the dependent variable was established. The results show that, after the exclusion of interference from other factors, the minimum purchase price policy for grain had a positive impact on the planting area of rice in Hubei Province. That is, the minimum purchase price policy significantly stimulated the growth of the rice planting area in Hubei Province.

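The two-step pipeline, principal component analysis of the indicator system followed by a regression of planting area on the component scores, can be sketched as below. This is a fixed-effects simplification (the paper's mixed linear model also carries random components), and all indicator data are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30                                     # years (illustrative)
factors = rng.normal(size=(n, 6))          # policy price + other influences
area = 100 + factors @ np.array([3.0, 1.0, -0.5, 0.8, 0.0, 0.2]) \
       + rng.normal(0, 1, n)

# Step 1: principal components of the standardized indicator system.
Z = (factors - factors.mean(0)) / factors.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T[:, :3]                   # keep three components

# Step 2: regression of planting area on the component scores.
X = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(X, area, rcond=None)
print("coefficients on PCs:", beta[1:].round(3))
```
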
Abstract: Today, Linear Mixed Models (LMMs) are fitted, mostly, by assuming that random effects and errors have Gaussian distributions, therefore using Maximum Likelihood (ML) or REML estimation. However, for many data sets, that double assumption is unlikely to hold, particularly for the random effects, a crucial component in which assessment of magnitude is key in such modeling. Alternative fitting methods not relying on that assumption (such as ANOVA methods and Rao's MINQUE) apply, quite often, only to the very constrained class of variance components models. In this paper, a new computationally feasible estimation methodology is designed, first for the widely used class of 2-level (or longitudinal) LMMs, with the only assumption (beyond the usual basic ones) that residual errors are uncorrelated and homoscedastic, and with no distributional assumption imposed on the random effects. A major asset of this new approach is that it yields nonnegative variance estimates and covariance matrix estimates which are symmetric and, at least, positive semi-definite. Furthermore, it is shown that when the LMM is, indeed, Gaussian, this new methodology differs from ML just through a slight variation in the denominator of the residual variance estimate. The new methodology actually generalizes to LMMs a well-known nonparametric fitting procedure for standard Linear Models. Finally, the methodology is also extended to ANOVA LMMs, generalizing an old method by Henderson for ML estimation in such models under normality.

Abstract: A linear mixed model is used to explain infant mortality rate data for United Nations countries. The HDI (human development index) has a significant negative linear relationship with the infant mortality rate. United Nations data show that the infant mortality rate had a descending trend over the period 1990-2010. This study aims to assess the value of the HDI as a predictor of the infant mortality rate. Findings in the paper suggest that significant percentage reductions in infant mortality might be possible for countries by controlling the HDI.

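A hedged sketch of this kind of analysis with a random-intercept linear mixed model, using statsmodels and simulated country panels in place of the United Nations data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
countries = np.repeat(np.arange(40), 5)          # 40 countries x 5 years
year = np.tile(np.arange(5), 40)
hdi = rng.uniform(0.3, 0.95, 40)[countries] + 0.01 * year
imr = 120 - 110 * hdi + rng.normal(0, 5, 40)[countries] \
      + rng.normal(0, 3, 200)                    # country effect + error

df = pd.DataFrame({"imr": imr, "hdi": hdi, "country": countries})

# Random-intercept linear mixed model with country as the grouping factor.
model = smf.mixedlm("imr ~ hdi", df, groups=df["country"])
result = model.fit()
print(result.params["hdi"])   # expected to be strongly negative
```
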
Abstract: The purpose of this article is to investigate approaches for modeling individual patient count/rate data over time, accounting for temporal correlation and non-constant dispersions while requiring reasonable amounts of time to search over alternative models for those data. This research addresses formulations for two approaches for extending generalized estimating equations (GEE) modeling. These approaches use a likelihood-like function based on the multivariate normal density. The first approach augments standard GEE equations to include equations for estimation of dispersion parameters. The second approach is based on estimating equations determined by partial derivatives of the likelihood-like function with respect to all model parameters and so extends linear mixed modeling. Three correlation structures are considered: independent, exchangeable, and spatial autoregressive of order 1. The likelihood-like function is used to formulate a likelihood-like cross-validation (LCV) score for use in evaluating models. Example analyses are presented using these two modeling approaches applied to three data sets of counts/rates over time for individual cancer patients, including pain flares per day, as-needed pain medications taken per day, and around-the-clock pain medications taken per day per dose. Means and dispersions are modeled as possibly nonlinear functions of time using adaptive regression modeling methods to search through alternative models compared using LCV scores. The results of these analyses demonstrate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time, because in example analyses it either generates better LCV scores or more parsimonious models and requires substantially less time.

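For orientation, a standard GEE fit of count data with exchangeable within-patient correlation is sketched below using statsmodels; the paper's extensions (dispersion estimating equations and the likelihood-like LCV machinery) are not part of this sketch, and the data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
patients = np.repeat(np.arange(25), 8)           # 25 patients x 8 days
day = np.tile(np.arange(8), 25)
rate = np.exp(0.8 + 0.05 * day + rng.normal(0, 0.3, 25)[patients])
counts = rng.poisson(rate)

df = pd.DataFrame({"y": counts, "day": day, "patient": patients})
X = sm.add_constant(df[["day"]])

# Standard GEE with exchangeable within-patient correlation.
gee = sm.GEE(df["y"], X, groups=df["patient"],
             family=sm.families.Poisson(),
             cov_struct=sm.cov_struct.Exchangeable())
print(gee.fit().summary())
```
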
Abstract: Purpose: To formulate and demonstrate methods for regression modeling of probabilities and dispersions for individual-patient longitudinal outcomes taking on discrete numeric values. Methods: Three alternatives for modeling of outcome probabilities are considered. Multinomial probabilities are based on different intercepts and slopes for probabilities of different outcome values. Ordinal probabilities are based on different intercepts and the same slope for probabilities of different outcome values. Censored Poisson probabilities are based on the same intercept and slope for probabilities of different outcome values. Parameters are estimated with extended linear mixed modeling maximizing a likelihood-like function based on the multivariate normal density that accounts for within-patient correlation. Formulas are provided for gradient vectors and Hessian matrices for estimating model parameters. The likelihood-like function is also used to compute cross-validation scores for alternative models and to control an adaptive modeling process for identifying possibly nonlinear functional relationships in predictors for probabilities and dispersions. Example analyses are provided of daily pain ratings for a cancer patient over a period of 97 days. Results: The censored Poisson approach is preferable for modeling these data, and presumably other data sets of this kind, because it generates a competitive model with fewer parameters in less time than the other two approaches. The generated probabilities for this model are distinctly nonlinear in time while the dispersions are distinctly nonconstant over time, demonstrating the need for adaptive modeling of such data. The analyses also address the dependence of these daily pain ratings on time and the daily numbers of pain flares. Probabilities and dispersions change differently over time for different numbers of pain flares. Conclusions: Adaptive modeling of daily pain ratings for individual cancer patients is an effective way to identify nonlinear relationships in time as well as in other predictors such as the number of pain flares.

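The censored Poisson device, using the Poisson pmf below the largest outcome value K and collapsing all mass at or above K onto K, can be sketched as follows; the dependence of the rate on predictors and the estimation machinery are omitted, and the rate value is illustrative.

```python
import numpy as np
from scipy.stats import poisson

def censored_poisson_pmf(k, mu, K):
    """P(Y = k) for a Poisson(mu) censored at K: values >= K collapse to K."""
    k = np.asarray(k)
    pmf = poisson.pmf(k, mu)
    return np.where(k == K, poisson.sf(K - 1, mu), pmf)

K = 10                                   # e.g., pain rated on 0..10
mu = 3.5                                 # illustrative rate for one day
ks = np.arange(K + 1)
probs = censored_poisson_pmf(ks, mu, K)
print(probs.round(4), probs.sum())       # sums to 1 by construction
```
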
Abstract: We focus on the development of model selection criteria in linear mixed models. In particular, we propose model selection criteria following the Mallows' Conceptual Predictive Statistic (Cp) [1] [2] in linear mixed models. When correlation exists between the observations in data, the normal Gauss discrepancy in the univariate case is not appropriate to measure the distance between the true model and a candidate model. Instead, we define a marginal Gauss discrepancy which takes the correlation into account in the mixed models. The model selection criterion, marginal Cp, called MCp, serves as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy. An improvement of MCp, called IMCp, is then derived and proved to be a more accurate estimator of the expected marginal Gauss discrepancy than MCp. The performance of the proposed criteria is investigated in a simulation study. The simulation results show that in small samples, the proposed criteria outperform the Akaike Information Criterion (AIC) [3] [4] and Bayesian Information Criterion (BIC) [5] in selecting the correct model; in large samples, their performance is competitive. Further, the proposed criteria perform significantly better for highly correlated response data than for weakly correlated data.

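A minimal sketch of the classical univariate Mallows Cp, the starting point that the marginal MCp and IMCp generalize to correlated data, is shown below on simulated data; the marginal Gauss discrepancy itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p_full = 100, 6
X = rng.normal(size=(n, p_full))
y = X[:, :2] @ np.array([2.0, -1.0]) + rng.normal(0, 1, n)  # 2 true predictors

def sse(Xc, y):
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return np.sum((y - Xc @ beta) ** 2)

sigma2 = sse(X, y) / (n - p_full)       # error variance from the full model

# Classical Mallows Cp for nested candidate models using the first p columns.
for p in range(1, p_full + 1):
    cp = sse(X[:, :p], y) / sigma2 - n + 2 * p
    print(p, round(cp, 2))              # Cp near p flags a good model
```
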
Abstract: Scientists are dedicated to studying the detection of Alzheimer's disease onset to find a cure, or at the very least, medication that can slow the progression of the disease. This article explores the effectiveness of longitudinal data analysis, artificial intelligence, and machine learning approaches based on magnetic resonance imaging and positron emission tomography neuroimaging modalities for progression estimation and the detection of Alzheimer's disease onset. The significance of feature extraction in highly complex neuroimaging data, the identification of vulnerable brain regions, and the determination of threshold values for plaques, tangles, and neurodegeneration of these regions will be evaluated extensively. Developing automated methods to improve the aforementioned research areas would enable specialists to determine the progression of the disease and find the link between the biomarkers and more accurate detection of Alzheimer's disease onset.

Abstract: Territory risk analysis has played an important role in the decision-making of auto insurance rate regulation. Due to the optimality of insurance loss data groupings, clustering methods become the natural choice for such territory risk classification. In this work, spatially constrained clustering is first applied to insurance loss data to form rating territories. The generalized linear model (GLM) and generalized linear mixed model (GLMM) are then proposed to derive the risk relativities of the obtained clusters. Each basic rating unit within the same cluster, namely a Forward Sortation Area (FSA), takes the same risk relativity value as its cluster. The risk relativities obtained from the GLM or GLMM are used to calculate the performance metrics, including RMSE, MAD, and Gini coefficients. The spatially constrained clustering and the risk relativity estimates help obtain a set of territory risk benchmarks used in rate filings to guide the rate regulation process.

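A compressed sketch of the pipeline, clustering loss data into territories and deriving multiplicative risk relativities from a GLM, is given below. Plain KMeans on coordinates stands in for the spatially constrained clustering, the Gamma GLM with log link is an assumed specification, and the loss data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
n = 300
lat, lon = rng.uniform(43, 45, n), rng.uniform(-80, -78, n)
loss = rng.gamma(2.0, 500 + 200 * (lat - 43))     # loss drifts with latitude

# Plain KMeans on coordinates stands in for spatially constrained clustering.
cluster = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    np.column_stack([lat, lon]))

# Gamma GLM with log link on cluster indicators yields risk relativities.
X = pd.get_dummies(pd.Series(cluster), prefix="c", drop_first=True, dtype=float)
X = sm.add_constant(X)
glm = sm.GLM(loss, X, family=sm.families.Gamma(sm.families.links.Log())).fit()
print(np.exp(glm.params))   # multiplicative relativities vs. base cluster
```
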
Funding: Supported by the National Natural Science Foundation of China (50979065, 51109154 and 51249002), the Natural Science Foundation of Shanxi Province, China (2012021026-2), the Program for Science and Technology Development of Shanxi Province, China (20110311018-1), the Specialized Research Fund for the Doctoral Program of Higher Education, China (20111402120006, 20121402110009), and the Program for Graduate Student Education and Innovation of Shanxi Province, China (2015BY27).
Abstract: The crop root system plays an important role in the water cycle of the soil-plant-atmosphere continuum. In this study, combined isotope techniques, root length density, and root cell activity analysis were used to investigate the root water uptake mechanisms of winter wheat (Triticum aestivum L.) under different irrigation depths in the North China Plain. Both the direct inference approach and a multisource linear mixing model were applied to estimate the distribution of water uptake with depth in six growing stages. Results showed that winter wheat under the land surface irrigation treatment (Ts) mainly absorbed water from the 10-20 cm soil layers in the wintering and green stages (66.9 and 72.0%, respectively); 0-20 cm (57.0%) in the jointing stage; 0-40 cm (15.3%) and 80-180 cm (58.1%) in the heading stage; 60-80 cm (13.2%) and 180-220 cm (35.5%) in the filling stage; and 0-40 cm (46.8%) and 80-100 cm (31.0%) in the ripening stage. Winter wheat under the whole soil layers irrigation treatment (Tw) absorbed more water from deep soil layers than Ts in the heading, filling, and ripening stages. Moreover, the root cell activity and root length density of winter wheat under Tw were significantly greater than those of Ts in these three stages. We concluded that the distribution of water uptake with depth was affected by the availability of water sources, root length density, and root cell activity. Implementation of the whole soil layers irrigation method can affect root system distribution and thereby increase water use from deeper soil and enhance water use efficiency.

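The multisource linear mixing logic can be illustrated with an IsoSource-style grid enumeration: candidate source fractions summing to one are kept when they reproduce the mixture's isotope value within a tolerance. All δ18O values, the grid step, and the tolerance below are invented for illustration.

```python
import itertools
import numpy as np

# Hypothetical d18O signatures (permil) of soil water by depth layer and of
# the plant stem water mixture; values are illustrative only.
source_d18O = np.array([-6.0, -8.5, -10.0, -11.5])   # four depth layers
mix_d18O = -9.2
step, tol = 0.05, 0.15                                # grid and tolerance

feasible = []
grid = np.arange(0, 1 + 1e-9, step)
for f in itertools.product(grid, repeat=len(source_d18O) - 1):
    last = 1.0 - sum(f)                  # remaining fraction for last layer
    if last < -1e-9:
        continue
    frac = np.append(f, last)
    if abs(frac @ source_d18O - mix_d18O) <= tol:
        feasible.append(frac)

feasible = np.array(feasible)
print(len(feasible), "feasible mixtures")
print("mean contribution per layer:", feasible.mean(axis=0).round(3))
```
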
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11401148), the Ministry of Education of China, Humanities and Social Science Projects (Grant Nos. 14YJC910005, 10YJC790184), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY14A010030), the Zhejiang Provincial Philosophy and Social Science Planning Project of China (Grant No. 13NDJC089YB), and the Houji Scholar Fund of Northwest A and F University, China.
Abstract: For the linear mixed model with skew-normal random effects, this paper gives the density function, moment generating function, and independence conditions. The noncentral skew chi-square distribution is defined and its density function is shown. The necessary and sufficient conditions under which a quadratic form is distributed as a noncentral skew chi-square distribution are obtained. Also, a version of Cochran's theorem is given, which modifies the result of Wang et al. (2009) and is used to set up exact tests for fixed effects and variance components of the proposed model. For illustration, our main results are applied to a real data problem.

Funding: Supported by the National Natural Science Foundation of China (Grant No. 30270759) and the Science and Technology Department of Zhejiang Province (Grant No. 2005C32001).
Abstract: Eleven evaluating parameters for rice core collections were assessed based on genotypic values and molecular marker information. Monte Carlo simulation combined with a mixed linear model was used to eliminate the interference from the environment in order to draw more reliable results. The coincidence rate of range (CR) was the optimal parameter. The mean Simpson index (MD), mean Shannon-Weaver index of genetic diversity (MI), and mean polymorphism information content (MPIC) were important evaluating parameters. The variable rate of the coefficient of variation (VR) could act as an important reference parameter for evaluating the variation degree of a core collection. The percentage of polymorphic loci (p) could be used as a determination parameter for the size of a core collection. The mean difference percentage (MD) was a determination parameter for the reliability judgment of a core collection. The effective evaluating parameters for core collections selected in this research could be used as criteria for sampling percentage in different plant germplasm populations.

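The diversity statistics named above can be computed from marker allele frequencies; a hedged sketch with invented frequencies follows (the Monte Carlo and mixed-model machinery of the study is not reproduced).

```python
import numpy as np

def simpson(p):
    """Simpson diversity index for one locus with allele frequencies p."""
    return 1.0 - np.sum(p ** 2)

def shannon_weaver(p):
    """Shannon-Weaver diversity index for one locus."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def pic(p):
    """Polymorphism information content (Botstein's formula) for one locus."""
    s2 = np.sum(p ** 2)
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - s2 - cross

# Illustrative allele frequencies at three marker loci.
loci = [np.array([0.5, 0.3, 0.2]),
        np.array([0.7, 0.3]),
        np.array([0.4, 0.4, 0.1, 0.1])]

print("mean Simpson index:", np.mean([simpson(p) for p in loci]).round(3))
print("mean Shannon-Weaver:", np.mean([shannon_weaver(p) for p in loci]).round(3))
print("mean PIC:", np.mean([pic(p) for p in loci]).round(3))
```
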
Funding: Supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions, the National Natural Science Foundation of China (Nos. 91535103, 31391632, and 31200943), the National High Technology Research and Development Program of China (No. 2014AA10A601-5), the Natural Science Foundation of Jiangsu Province (No. BK2012261), the Natural Science Foundation of Jiangsu Higher Education Institutions (No. 14KJA210005), the Postgraduate Research and Innovation Project in Jiangsu Province (No. KYLX151368), and the Innovative Research Team of University in Jiangsu Province.
Abstract: Dissecting the genetic architecture of complex traits is an ongoing challenge for geneticists. Two complementary approaches for genetic mapping, linkage mapping and association mapping, have led to successful dissection of complex traits in many crop species. Both of these methods detect quantitative trait loci (QTL) by identifying marker-trait associations, and the only fundamental difference between them is that between mapping populations, which directly determine mapping resolution and power. Based on this difference, we first summarize in this review the advances and limitations of family-based mapping and natural population-based mapping instead of linkage mapping and association mapping. We then describe statistical methods used for improving detection power and computational speed and outline emerging areas such as large-scale meta-analysis for genetic mapping in crops. In the era of next-generation sequencing, there has arisen an urgent need for proper population design, advanced statistical strategies, and precision phenotyping to fully exploit high-throughput genotyping.

Funding: Project supported by the National Natural Science Foundation of China (No. 30270759), the Cooperation Project in Science and Technology between the China and Poland Governments (No. 32-38), and the Scientific Research Foundation for Doctors in Shandong Academy of Agricultural Sciences (No. [2007]20), China.
Abstract: One hundred and sixty-eight genotypes of cotton from the same growing region were used as a germplasm group to study the validity of different genetic distances in constructing a cotton core subset. A mixed linear model approach was employed to unbiasedly predict genotypic values of 20 traits, eliminating the environmental effect. Six commonly used genetic distances (Euclidean, standardized Euclidean, Mahalanobis, city block, cosine, and correlation distances), combined with four commonly used hierarchical cluster methods (single linkage, complete linkage, unweighted pair-group average, and Ward's methods), were used in the least distance stepwise sampling (LDSS) method for constructing different core subsets. Analyses of variance (ANOVA) of different evaluating parameters showed that the validities of cosine and correlation distances were inferior to those of Euclidean, standardized Euclidean, Mahalanobis, and city block distances. Standardized Euclidean distance was slightly more effective than Euclidean, Mahalanobis, and city block distances. Principal component analysis validated standardized Euclidean distance in the course of constructing practical core subsets. The covariance matrix of accessions might be ill-conditioned when Mahalanobis distance was used to calculate genetic distance at low sampling percentages, which led to bias in small-sized core subset construction. The standardized Euclidean distance is recommended in core subset construction with the LDSS method.

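The distance and clustering building blocks are directly available in SciPy; the sketch below computes standardized Euclidean distances between accessions and applies unweighted pair-group average (UPGMA) clustering on simulated trait values. The LDSS sampling step itself is omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(9)
traits = rng.normal(size=(30, 20))        # 30 accessions x 20 trait values

# Standardized Euclidean distance between accessions (SciPy's 'seuclidean').
d = pdist(traits, metric="seuclidean")

# Unweighted pair-group average (UPGMA) hierarchical clustering.
tree = linkage(d, method="average")
groups = fcluster(tree, t=5, criterion="maxclust")
print("cluster sizes:", np.bincount(groups)[1:])
```
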
Funding: This research was supported by the National Natural Science Foundation of China, the Tian Yuan Special Foundation under Grant No. 10926059, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. Y6100053.
Abstract: In this paper, the problem of estimating the covariance matrix in general linear mixed models is considered. A new class of estimators is proposed. It is shown that this new estimator dominates the analysis of variance estimate under two squared loss functions. Finally, some simulation results to compare the performance of the proposed estimator with that of the analysis of variance estimate are reported. The simulation results indicate that this new estimator provides a substantial improvement in risk under most situations.

Funding: Supported by the National Natural Science Foundation of China (No. 10901109) and a grant (HKBU2030/07P) from the Research Grants Council of Hong Kong, Hong Kong, China.
Abstract: In this paper, we propose a bias-corrected empirical likelihood (BCEL) ratio to construct a goodness-of-fit test for generalized linear mixed models. The BCEL test maintains the advantage of empirical likelihood of being self-scale invariant, and thus does not involve estimating the limiting variance of the test statistic, avoiding deterioration of the power of the test. Furthermore, the bias correction makes the limit a process in which every variable is standard chi-squared. This simple structure of the process enables us to construct a Monte Carlo test procedure to approximate the null distribution. It thus overcomes a problem encountered when the classical empirical likelihood test is used, as the latter is asymptotically a functional of a Gaussian process plus a normal shift function, whose complicated covariance function makes it difficult to employ any approximation for the null distribution. The test is omnibus, and a power study shows that the test can detect local alternatives approaching the null at the parametric rate. Simulations are carried out for illustration and for comparison with an existing method.

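The Monte Carlo test machinery the paper relies on can be sketched generically: simulate data under the null model, recompute the statistic, and report the exceedance proportion as the p-value. The placeholder statistic below is not the BCEL ratio.

```python
import numpy as np

rng = np.random.default_rng(10)

def test_statistic(sample):
    """Placeholder statistic; the BCEL ratio would be computed here."""
    return np.abs(sample.mean()) * np.sqrt(len(sample))

observed = rng.normal(0.3, 1.0, 100)      # data, possibly off the null
t_obs = test_statistic(observed)

# Monte Carlo approximation of the null distribution: simulate data under
# the null model, recompute the statistic, and count exceedances.
B = 2000
t_null = np.array([test_statistic(rng.normal(0.0, 1.0, 100))
                   for _ in range(B)])
p_value = (1 + np.sum(t_null >= t_obs)) / (B + 1)
print("Monte Carlo p-value:", round(p_value, 4))
```
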
Funding: Supported in part by a grant from the Research Grants Council of Hong Kong and by the National Natural Science Foundation of China (Grant No. 11101157).
Abstract: Linear mixed models are popularly used to fit continuous longitudinal data, and the random effects are commonly assumed to have a normal distribution. However, this assumption needs to be tested so that further analysis can proceed well. In this paper, we consider the Baringhaus-Henze-Epps-Pulley (BHEP) tests, which are based on an empirical characteristic function. Differing from the classical case, we consider normality checking for the random effects, which are unobservable, so the test should be based on their predictors. The test is consistent against global alternatives, and is sensitive to local alternatives converging to the null at a certain rate arbitrarily close to $1/\sqrt{n}$, where n is the sample size. Furthermore, to overcome the problem that the limiting null distribution of the test is not tractable, we suggest a new method: use a conditional Monte Carlo test (CMCT) to approximate the null distribution, and then simulate p-values. The test is compared with existing methods, the power is examined, and several examples are applied to illustrate the usefulness of our test in the analysis of longitudinal data.

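A hedged sketch of the classical univariate BHEP statistic, the closed-form weighted L2 distance between the empirical characteristic function of a standardized sample and the standard normal one, is given below. In the paper the test is applied to predicted random effects and calibrated by the conditional Monte Carlo test; here it is applied directly to raw samples, and beta = 1 is an assumed tuning value.

```python
import numpy as np

def bhep_statistic(y, beta=1.0):
    """BHEP normality-test statistic for a univariate sample (d = 1).

    The sample is standardized first; larger values indicate departure
    from normality.
    """
    y = (y - y.mean()) / y.std(ddof=1)
    n = len(y)
    diff = y[:, None] - y[None, :]
    term1 = np.exp(-0.5 * beta**2 * diff**2).sum() / n
    term2 = 2 * (1 + beta**2) ** -0.5 * np.exp(
        -0.5 * beta**2 * y**2 / (1 + beta**2)).sum()
    term3 = n * (1 + 2 * beta**2) ** -0.5
    return term1 - term2 + term3

rng = np.random.default_rng(11)
print(bhep_statistic(rng.normal(size=200)))      # small under normality
print(bhep_statistic(rng.exponential(size=200))) # larger under skewness
```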