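Using daily precipitation data from 77 stations in the Jiangnan region and NCEP/NCAR reanalysis data, a Multivariable Lagged Regression (MLR) model was built from two inputs: the low-frequency components of Jiangnan precipitation at different time scales, and the principal components of the low-frequency 850 hPa meridional wind over East Asia. Extended-range daily forecast experiments were then carried out for the low-frequency component of Jiangnan precipitation during May–July 2011. For the 50–70-day low-frequency Jiangnan precipitation, the mean forecast skill reaches 0.92, and the model accurately forecasts persistent heavy-rainfall events and sign changes of the low-frequency precipitation phase. Hindcast experiments used MLR models built separately from 2001–2012 data. In years when the 50–70-day oscillation is strong or normal, these models can forecast the early-summer low-frequency Jiangnan precipitation component 30 days in advance. The model results also indicate that the development and evolution of the low-frequency 850 hPa meridional wind is a significant signal for changes in early-summer low-frequency Jiangnan precipitation over the following 30 days. It can therefore serve as a key predictor for extended-range forecasts of heavy precipitation.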
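The abstract does not give the MLR equations. As a rough illustration only, the sketch below makes three assumptions. First, the MLR model is an ordinary least-squares regression of the precipitation component at day t + lead on predictor values (for example, wind principal components) at day t. Second, forecast skill is a correlation coefficient. Third, the function names and the synthetic series are hypothetical and are not the authors' code or data.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lagged(predictors, target, lead):
    """OLS fit of target[t + lead] = b0 + sum_k b_k * predictors[k][t] (assumed model form)."""
    rows = [[1.0] + [p[t] for p in predictors] for t in range(len(target) - lead)]
    ys = [target[t + lead] for t in range(len(target) - lead)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)  # normal equations


def forecast(coefs, predictors, t):
    """Forecast the target at day t + lead from predictor values observed at day t."""
    return coefs[0] + sum(c * p[t] for c, p in zip(coefs[1:], predictors))


def correlation(u, v):
    """Pearson correlation, used here as an assumed measure of forecast skill."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


# Synthetic check: two hypothetical "principal components" with a 60-day period
# and a target that responds to them with a 30-day delay.
pc1 = [math.sin(2 * math.pi * t / 60) for t in range(200)]
pc2 = [math.cos(2 * math.pi * t / 60) for t in range(200)]
rain = [0.0] * 30 + [0.5 + 2 * pc1[t - 30] - pc2[t - 30] for t in range(30, 200)]
coefs = fit_lagged([pc1, pc2], rain, lead=30)
hindcast = [forecast(coefs, [pc1, pc2], t) for t in range(200 - 30)]  # in-sample only
skill = correlation(hindcast, rain[30:])
```

The synthetic target is an exact lagged combination of the predictors, so the fit recovers the coefficients (0.5, 2, -1) and an in-sample skill of 1. Real forecasts would be evaluated out of sample.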
Abstract: The extended t-process regression model is developed to robustly model functional data containing outlying functional curves. This paper applies Bayesian estimation to propose an estimation procedure for the model with independent errors. A Monte Carlo EM method is built to estimate the parameters involved in the model. Simulation studies and real examples show that the proposed method performs well in the presence of outliers.
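The following sketch shows why t-type models resist outlying curves. It is not the paper's extended t-process or its Monte Carlo EM. Instead, it is the ordinary EM algorithm for a multivariate t model with known degrees of freedom and an assumed isotropic covariance σ²I, where each curve is observed on a common grid. The E-step gives each curve a weight (ν + p)/(ν + δᵢ), where δᵢ is its scaled squared distance from the current mean curve, so distant curves are downweighted. All names and data are hypothetical.

```python
import math


def t_em_mean_curve(curves, nu=4.0, iters=50):
    """EM for the mean curve of a multivariate t model with covariance sigma2 * I (assumed)."""
    n, p = len(curves), len(curves[0])
    mu = [sum(c[j] for c in curves) / n for j in range(p)]  # start at the plain mean
    sigma2 = sum((c[j] - mu[j]) ** 2 for c in curves for j in range(p)) / (n * p)
    weights = [1.0] * n
    for _ in range(iters):
        # E-step: expected latent precision weight of each curve.
        dist = [sum((c[j] - mu[j]) ** 2 for j in range(p)) / sigma2 for c in curves]
        weights = [(nu + p) / (nu + d) for d in dist]
        # M-step: weighted mean curve and isotropic variance.
        total = sum(weights)
        mu = [sum(w * c[j] for w, c in zip(weights, curves)) / total for j in range(p)]
        sigma2 = sum(
            w * sum((c[j] - mu[j]) ** 2 for j in range(p)) for w, c in zip(weights, curves)
        ) / (n * p)
    return mu, sigma2, weights


# Hypothetical data: nine curves near sin(t) plus one curve shifted by +5.
grid = [j / 10 for j in range(20)]
curves = [[math.sin(t) + 0.1 * ((i % 3) - 1) for t in grid] for i in range(9)]
curves.append([math.sin(t) + 5.0 for t in grid])  # outlying curve
mu, sigma2, weights = t_em_mean_curve(curves)
```

The plain mean of these curves is shifted by 0.5 everywhere. The EM estimate stays close to sin(t) because the outlying curve receives by far the smallest weight.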
Abstract: Purpose: To formulate and demonstrate methods for regression modeling of probabilities and dispersions for individual-patient longitudinal outcomes taking on discrete numeric values. Methods: Three alternatives for modeling outcome probabilities are considered. Multinomial probabilities are based on different intercepts and slopes for the probabilities of different outcome values. Ordinal probabilities are based on different intercepts and the same slope for the probabilities of different outcome values. Censored Poisson probabilities are based on the same intercept and slope for the probabilities of different outcome values. Parameters are estimated with extended linear mixed modeling, maximizing a likelihood-like function based on the multivariate normal density that accounts for within-patient correlation. Formulas are provided for the gradient vectors and Hessian matrices used to estimate model parameters. The likelihood-like function is also used to compute cross-validation scores for alternative models and to control an adaptive modeling process for identifying possibly nonlinear functional relationships in predictors for probabilities and dispersions. Example analyses are provided of the daily pain ratings of a cancer patient over a period of 97 days. Results: The censored Poisson approach is preferable for modeling these data, and presumably other data sets of this kind, because it generates a competitive model with fewer parameters in less time than the other two approaches. The generated probabilities for this model are distinctly nonlinear in time, while the dispersions are distinctly nonconstant over time, demonstrating the need for adaptive modeling of such data. The analyses also address the dependence of these daily pain ratings on time and on the daily number of pain flares. Probabilities and dispersions change differently over time for different numbers of pain flares.
Conclusions: Adaptive modeling of daily pain ratings for individual cancer patients is an effective way to identify nonlinear relationships in time as well as in other predictors such as the number of pain flares.
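The censored Poisson idea can be sketched as follows: a single intercept and slope drive every outcome value through a log-linear mean. The sketch assumes a 0 to 10 rating scale, a log link, and that values at or above the top of the scale are pooled into that top value. It ignores the dispersions and the within-patient correlation handled in the paper. The function name and coefficients are hypothetical.

```python
import math


def censored_poisson_probs(t, intercept, slope, max_value=10):
    """Probabilities of the outcome values 0..max_value at time t.

    One intercept and one slope drive every outcome value through the
    log-mean; values above max_value are censored (pooled) into max_value.
    """
    mean = math.exp(intercept + slope * t)
    probs = []
    pmf = math.exp(-mean)  # Poisson P(Y = 0)
    for k in range(max_value):
        probs.append(pmf)
        pmf *= mean / (k + 1)  # recurrence: P(Y = k + 1) = P(Y = k) * mean / (k + 1)
    probs.append(max(0.0, 1.0 - sum(probs)))  # censored tail: P(Y >= max_value)
    return probs


# Hypothetical coefficients on a hypothetical 0-10 rating scale.
day5 = censored_poisson_probs(t=5.0, intercept=0.5, slope=0.1)
```

Under the usual parameterizations, an 11-value scale with one time predictor needs 20 coefficients for a multinomial logit (an intercept and a slope for each of 10 non-reference values) and 11 for an ordinal model (10 cutpoints and one slope). The censored Poisson form needs only 2, which is consistent with the parameter savings reported above.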
Abstract: Building high-confidence regression test suites to validate new system versions is a challenging problem. A model-based approach is described for building a regression test suite from a given test suite. The generated suite includes every test that traverses a change made to produce the new version, and only such tests, which reduces testing costs. Systems are modeled as finite state machines extended with typed variables (EFSMs), and system changes are mapped to EFSM transition changes that add, delete, or replace EFSM transitions and states. Each test is a sequence of input and expected output messages with concrete parameter values over the supported data types. An invariant is formulated to characterize tests whose runtime behavior can be accurately predicted by analyzing their descriptions together with the model. Incremental procedures are developed to efficiently evaluate the invariant and to select tests for regression. Overlaps among test descriptions are exploited to extend the approach to select multiple tests simultaneously, reducing test selection costs. The effectiveness of the approach is demonstrated by applying it to several protocols, Web services, and model programs extracted from a popular testing benchmark. The experimental results show that, in all these examples, the proposed approach is economical for regression test selection and identifies all tests exercising changes more efficiently than brute-force symbolic evaluation.
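To make the selection criterion concrete, the sketch below executes each concrete test on a small EFSM and keeps exactly the tests whose trace traverses a changed transition. This trace-based baseline is a simplification, not the paper's invariant-based incremental procedure. It also handles only modified or deleted transitions, not added ones. The login protocol, transition ids, and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Env = Dict[str, int]


@dataclass
class Transition:
    tid: str
    source: str
    event: str
    guard: Callable[[Env, int], bool]   # predicate over variables and the input parameter
    update: Callable[[Env, int], None]  # assignment to the typed variables
    target: str
    output: str


def run(model: List[Transition], start: str, init: Env, test: List[Tuple[str, int]]):
    """Execute a concrete test on the model; return traversed transition ids and outputs."""
    state, env = start, dict(init)
    trace: List[str] = []
    outputs: List[str] = []
    for event, param in test:
        for tr in model:
            if tr.source == state and tr.event == event and tr.guard(env, param):
                tr.update(env, param)
                state = tr.target
                trace.append(tr.tid)
                outputs.append(tr.output)
                break
        else:  # no enabled transition: the input is ignored
            outputs.append("ignored")
    return trace, outputs


def select_regression_tests(model, start, init, tests: Dict[str, List[Tuple[str, int]]],
                            changed: Set[str]) -> List[str]:
    """Keep exactly the tests whose execution traverses a changed transition."""
    return [name for name, test in tests.items()
            if changed.intersection(run(model, start, init, test)[0])]


def reset(env: Env, _param: int) -> None:
    env["attempts"] = 0


def bump(env: Env, _param: int) -> None:
    env["attempts"] += 1


def noop(env: Env, _param: int) -> None:
    pass


# Hypothetical login protocol (old version of the system).
MODEL = [
    Transition("t1", "idle", "login", lambda e, p: p == 1234, reset, "auth", "ok"),
    Transition("t2", "idle", "login", lambda e, p: p != 1234, bump, "idle", "denied"),
    Transition("t3", "auth", "logout", lambda e, p: True, noop, "idle", "bye"),
]
TESTS = {
    "T1": [("login", 1234), ("logout", 0)],
    "T2": [("login", 1)],
    "T3": [("login", 1), ("login", 1234)],
}
```

For example, if the new version changes the logout transition `t3`, only `T1` is selected. A change to `t2` selects `T2` and `T3`. The paper's contribution is predicting these traces from test descriptions more cheaply than by full execution or brute-force symbolic evaluation.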
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11971457) and the Anhui Provincial Natural Science Foundation (Grant No. 1908085MA06).
Abstract: Process regression models, such as the Gaussian process regression (GPR) model, have been widely applied to analyze various kinds of functional data. This paper introduces a composite of two T-processes (CT), where the first captures the smooth global trend and the second models local details. Compared with a general T-process, the CT has an advantage in capturing local variability. Furthermore, a composite T-process regression (CTP) model is developed based on the composite T-process. It inherits many of the desirable properties of GPR while being more robust against outliers. Numerical studies, including simulations and a real-data application, show that CTP performs well in prediction.
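The "global trend plus local detail" idea can be made concrete with a deliberate simplification. The sketch below uses a single zero-mean Student-t process whose kernel is the sum of a long-lengthscale and a short-lengthscale squared-exponential kernel. It does not reproduce the paper's CT construction (a composite of two separate T-processes) or its CTP estimation. The kernel choice, hyperparameters, and noise folded into the kernel are all assumptions. The predictive mean equals the GP mean, while the predictive variance is rescaled by (ν + β - 2)/(ν + n - 2) with β = yᵀK⁻¹y, the standard Student-t process predictive result.

```python
import math


def rbf(x1, x2, variance, lengthscale):
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def composite_kernel(x1, x2):
    # Hypothetical hyperparameters: a long-lengthscale term for the global trend
    # plus a short-lengthscale, low-variance term for local detail.
    return rbf(x1, x2, 1.0, 2.0) + rbf(x1, x2, 0.1, 0.2)


def cholesky(a):
    """Lower-triangular L with L @ L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def chol_solve(l, b):
    """Solve (L @ L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def tp_predict(xs, ys, x_new, nu=5.0, noise=1e-2):
    """Predictive mean and variance of a zero-mean Student-t process at x_new."""
    n = len(xs)
    k = [[composite_kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    l = cholesky(k)
    alpha = chol_solve(l, ys)                        # K^{-1} y
    beta = sum(y * a for y, a in zip(ys, alpha))     # y^T K^{-1} y
    k_star = [composite_kernel(x_new, b) for b in xs]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = chol_solve(l, k_star)
    gp_var = composite_kernel(x_new, x_new) + noise - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, (nu + beta - 2.0) / (nu + n - 2.0) * gp_var


xs = [0.5 * i for i in range(10)]
clean = [math.sin(x) for x in xs]
contaminated = list(clean)
contaminated[7] += 5.0  # one outlying observation
mean_c, var_c = tp_predict(xs, clean, 2.0)
mean_o, var_o = tp_predict(xs, contaminated, 2.0)
```

Because β grows when the data contain an outlying observation, the t-process widens its predictive variance. A Gaussian process's variance, by contrast, depends on the inputs only.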
Funding: Funded by the Isfahan University of Technology, Iran.
Abstract: Understanding the factors that influence the distribution of plant species is crucial for improving the management of endangered ecosystems. This study investigated the response of Hedysarum criniferum Boiss., an endemic and endangered species, to 25 environmental variables within its habitats. These habitats cover an area of 2.95×10⁵ km² in the arid and semi-arid rangelands of Iran. The purpose of this research is to identify the key environmental factors affecting the distribution and habitat preferences of H. criniferum, to support conservation and restoration of the species. To predict the occurrence of H. criniferum and explore its relationship with environmental factors, we employed best subset regression analysis, hierarchical classification, and the extended Huisman-Olff-Fresco (eHOF) model. The results showed that four environmental variables, i.e., gravel content, pH, annual minimum temperature, and mean annual temperature, were significantly correlated with the canopy cover of H. criniferum (P<0.05). The probability of H. criniferum occurrence increased with higher precipitation and elevation, while it decreased with higher mean annual temperature, annual minimum temperature, and gravel content. The eHOF model was used to assess the species' response curves and their optimal values. The response to mean annual temperature, over a range of 12 ℃ to 16 ℃, was optimal at 13 ℃. The response to mean annual precipitation, over a range of 150–650 mm, was optimal at 650 mm. The response to elevation, spanning 1546–2450 m, showed an optimum at 2450 m. Regarding soil characteristics, the response to gravel content, ranging from 13.0% to 48.0%, had an optimal value of 20.0%. pH levels, varying from 7.5 to 8.2, produced a sine-shaped response with an optimal pH of 8.0. These findings provide valuable insights for predicting species occurrence and identifying suitable locations for restoration programs.
Our study underscores the importance of considering multiple environmental variables in habitat suitability assessments. By incorporating these broader considerations, we can further refine predictive models and enhance conservation efforts aimed at restoring habitats conducive to the luxuriance of endangered species like H. criniferum.
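To illustrate how an eHOF response curve yields an "optimal value", the sketch below uses HOF model IV, the symmetric unimodal member of the HOF hierarchy. Its response is m / ((1 + e^(a + bx))(1 + e^(c - bx))). The log-response is concave, and its derivative vanishes where the two exponents are equal, so the optimum is x = (c - a)/(2b). The parameter values are hypothetical, chosen only so that the peak falls inside a 12 to 16 ℃ axis. They are not fitted values from this study, and the sketch does not perform the eHOF model fitting or model-type selection.

```python
import math


def hof_iv(x, a, b, c, m=1.0):
    """HOF model IV: symmetric unimodal response; m is the upper bound (e.g. 1 for presence/absence)."""
    return m / ((1.0 + math.exp(a + b * x)) * (1.0 + math.exp(c - b * x)))


def hof_iv_optimum(a, b, c):
    """Location of the peak: the two logistic exponents are equal, a + b*x = c - b*x."""
    return (c - a) / (2.0 * b)


# Hypothetical parameters on a temperature axis (not fitted values from the study).
A, B, C = -14.0, 1.0, 12.0
grid = [12.0 + 0.01 * i for i in range(401)]  # 12 to 16 in steps of 0.01
grid_peak = max(grid, key=lambda x: hof_iv(x, A, B, C))
```

A grid search over the response curve agrees with the closed-form optimum of 13. For skewed (model V) or bimodal (models VI and VII) responses, the optimum generally has to be located numerically in this way.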
Abstract: Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling, which generalizes both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes, treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes with previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses, but it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared with directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
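The core mechanics of power-transform selection by LCV can be sketched as follows. Each candidate power of a predictor is scored by k-fold likelihood cross-validation, taken here as the geometric mean over observations of the deleted (held-out) likelihoods, and the best-scoring power is kept. This is a heavily reduced sketch. It uses a single positive predictor, independent normal outcomes with ordinary least squares, a fixed set of candidate powers, and folds assigned by index modulo k. It omits the correlation modeling, dispersion modeling, and heuristic search that are the focus of the abstract. All names and data are hypothetical.

```python
import math
import random

FP_POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]  # assumed candidate powers; 0 means log


def fp_transform(x, p):
    return math.log(x) if p == 0 else x ** p


def fit_ols(xs, ys):
    """Simple linear regression; returns intercept, slope, and ML residual variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid_var = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / n
    return intercept, slope, resid_var


def lcv_score(xs, ys, power, folds=5):
    """Geometric mean of held-out normal likelihoods over all observations."""
    n = len(xs)
    total_loglik = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        held_out = [i for i in range(n) if i % folds == f]
        b0, b1, var = fit_ols([fp_transform(xs[i], power) for i in train],
                              [ys[i] for i in train])
        for i in held_out:
            r = ys[i] - b0 - b1 * fp_transform(xs[i], power)
            total_loglik += -0.5 * math.log(2 * math.pi * var) - r * r / (2 * var)
    return math.exp(total_loglik / n)


def select_power(xs, ys):
    return max(FP_POWERS, key=lambda p: lcv_score(xs, ys, p))


# Hypothetical data generated from a log relationship with small noise.
rng = random.Random(0)
xs = [0.5 + 0.1 * i for i in range(200)]
ys = [1.0 + 2.0 * math.log(x) + rng.gauss(0.0, 0.05) for x in xs]
best = select_power(xs, ys)
```

On these data the log transform (power 0) receives the highest LCV score. Because a larger LCV score means better held-out prediction, the same score can also rank competing model structures, as the abstract does when comparing covariance specifications.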