The purpose of this study was to determine a suitable model for investigating the effects of climate factors on the area burned by forest fire in the Tahe forest region, Daxing'an Mountains, in northeast China. The response variables were the area burned by lightning-caused fire, the area burned by human-caused fire, and the total burned area. The predictor variables were nine climate variables collected from the local weather station. Three regression models were used: multiple linear regression, a log-linear model (log-transformation of both response and predictor variables), and a gamma generalized linear model. The goodness-of-fit of the models was compared using model fitting statistics such as R², AIC, and RMSE. The results revealed that the gamma generalized linear model was generally superior to both the multiple linear regression model and the log-linear model for fitting the fire data. Further, the best models were selected on the criterion that the climate variables were statistically significant at α = 0.05. The best gamma models indicated that maximum wind speed, precipitation, and the number of days with rainfall greater than 0.1 mm had significant impacts on the area burned by lightning-caused fire, while mean temperature and minimum relative humidity were the main drivers of the area burned by human-caused fire. Overall, the total burned area was significantly influenced by the number of days with rainfall greater than 0.1 mm and by minimum relative humidity, indicating that the moisture condition of forest stands determines the area burned by forest fire.
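The comparison of models by R², RMSE, and AIC described above can be sketched as follows. This is a minimal illustration, not the study's procedure: the function name `fit_statistics`, the burned-area values, and both sets of fitted values are hypothetical, and the AIC shown is the Gaussian form (up to an additive constant), which is an assumption. A gamma GLM would use the gamma log-likelihood instead.

```python
import math

def fit_statistics(observed, fitted, n_params):
    """Return (R2, RMSE, AIC) for a set of fitted values.

    The AIC uses the Gaussian form n*ln(RSS/n) + 2k, which drops an
    additive constant; this is an assumption made for illustration.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    tss = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - rss / tss
    rmse = math.sqrt(rss / n)
    aic = n * math.log(rss / n) + 2 * n_params
    return r2, rmse, aic

# Hypothetical burned areas and fitted values from two candidate models
# (illustrative numbers only, not data from the study).
observed = [1.0, 2.0, 4.0, 8.0, 16.0]
fit_a = [1.5, 2.5, 4.5, 7.5, 15.0]
fit_b = [3.0, 4.0, 6.0, 7.0, 11.0]

stats_a = fit_statistics(observed, fit_a, n_params=2)
stats_b = fit_statistics(observed, fit_b, n_params=2)
print("model A (R2, RMSE, AIC):", stats_a)
print("model B (R2, RMSE, AIC):", stats_b)
```

A model is preferred when it has the higher R² and the lower RMSE and AIC. Note that AIC values are only comparable between models fitted to the same response scale.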
This paper discusses blind carrier frequency offset (CFO) estimation for orthogonal frequency division multiplexing (OFDM) systems using trilinear decomposition and generalized precoding. First, generalized precoding is employed to obtain the multiple covariance matrices required by the trilinear model, and then a novel CFO estimation algorithm is proposed for the OFDM system. Compared with both the joint diagonalizer and estimation of signal parameters via rotational invariance techniques (ESPRIT), the proposed algorithm achieves better CFO estimation performance. Furthermore, the proposed algorithm works well without virtual carriers. Simulation results illustrate the performance of the algorithm.
We study the quasi-likelihood equation in generalized linear models (GLM) with adaptive design, $\sum_{i=1}^{n} x_i \left( y_i - h(x_i' \beta) \right) = 0$, where $y_i$ is a $q$-vector and $x_i$ is a $p \times q$ random matrix. Under some assumptions, it is shown that the quasi-likelihood equation for the GLM has a solution that is asymptotically normal.
This article is concerned with a semiparametric generalized partial linear model (GPLM) with type II censored data. A sieve maximum likelihood estimator (MLE) is proposed to estimate the parametric component, allowing exploration of the nonlinear relationship between a certain covariate and the response function. Asymptotic properties of the proposed sieve MLEs are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. Moreover, the estimators of the unknown parameters are asymptotically normal and efficient, and the estimator of the nonparametric function has an optimal convergence rate.
In a linear regression model, testing whether the variance of the residuals is constant is an integral part of statistical analysis. Constant variance is a crucial assumption that should be confirmed with statistical tests, usually before carrying out an Analysis of Variance (ANOVA). Many researchers have published papers on tests for detecting variance heterogeneity in multiple linear regression models, and these tests have been compared on criteria such as bias, error rate, and power. Besides comparisons, modifications of some of these tests have been reported in the literature in recent years. In the multiple linear regression setting, however, little work has been done on comparing selected tests of the homoscedasticity assumption when linear, quadratic, square-root, and exponential forms of heteroscedasticity are injected into the residuals. The present study aims to fill this gap. The paper provides a comprehensive comparative analysis of the asymptotic behaviour of selected tests of the homoscedasticity assumption in order to find the best test for detecting heteroscedasticity in a multiple linear regression scenario with varying variances and levels of significance.
Several tests for homoscedasticity are available in the literature, but only nine were considered for this study: the Breusch-Godfrey test, the studentized Breusch-Pagan test, White's test, the Nonconstant Variance Score test, the Park test, the Spearman rank test, the Glejser test, the Goldfeld-Quandt test, and the Harrison-McCabe test. Their asymptotic behaviours were examined by Monte Carlo simulation. Four forms of heteroscedastic structure, exponential and linear (generalizing the square-root and quadratic structures), were injected into the residual part of the multiple linear regression models at sample sizes of 30, 50, 100, 200, 500, and 1000. The evaluations were carried out in the R environment. Among other findings, our investigations revealed that the Glejser and Park tests were the best tests for detecting heteroscedasticity in EHS and LHS, respectively, and that White's and the Harrison-McCabe tests were the best tests for checking homoscedasticity in EHS and LHS, respectively, for sample sizes below 50.
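To make the idea of injecting heteroscedasticity into the residuals concrete, the following is a minimal sketch of one of the nine tests, the Goldfeld-Quandt variance ratio, applied to simulated data. It is written in Python rather than the R environment used in the study; the single-predictor model, the coefficients, the seed, and the helper names `ols_rss`, `goldfeld_quandt_ratio`, and `simulate` are all assumptions made for illustration. Only the ratio is computed, with no p-value.

```python
import random

def ols_rss(x, y):
    """Residual sum of squares from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt_ratio(x, y):
    """Residual variance in the upper half of x divided by that in the lower half."""
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[-half:]
    rss_low = ols_rss([p[0] for p in low], [p[1] for p in low])
    rss_high = ols_rss([p[0] for p in high], [p[1] for p in high])
    # Both halves have equal size and parameter count, so degrees of freedom cancel.
    return rss_high / rss_low

def simulate(n, heteroscedastic, rng):
    """Simulate y = 2 + 0.5*x + e, with sd(e) = x when heteroscedastic, else 1."""
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    y = [2.0 + 0.5 * xi + rng.gauss(0.0, xi if heteroscedastic else 1.0) for xi in x]
    return x, y

rng = random.Random(0)
ratio_hetero = goldfeld_quandt_ratio(*simulate(200, True, rng))
ratio_homo = goldfeld_quandt_ratio(*simulate(200, False, rng))
print("variance ratio, linear heteroscedasticity:", ratio_hetero)
print("variance ratio, constant variance:", ratio_homo)
```

A ratio well above 1 points to variance increasing with the predictor, while a ratio near 1 is consistent with homoscedasticity. A Monte Carlo study of the kind described would repeat this over many replications and sample sizes and record how often each test rejects.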
Changes in climate factors such as temperature, rainfall, humidity, and wind speed are natural processes that can significantly affect the incidence of infectious diseases. Dengue is a widespread disease whose relationship with climate change has often been documented. It has become a significant concern, especially for the Malaysian health authorities, because of its rapid spread and serious effects, which include loss of life. Several statistical models have been used to identify climatic factors associated with infectious diseases. However, because of the complex and nonlinear interactions between climate variables and disease components, modelling their relationships has become the main challenge in climate-health studies. Hence, this study proposes a generalized linear model (GLM) with Poisson and negative binomial distributions to examine the effects of climate factors on dengue incidence while accounting for collinearity between variables. The study focuses on the dengue hot spots in Malaysia for the year 2014. Because the climate factors are collinear, the analysis was done separately using three different models. The study revealed that rainfall, temperature, humidity, and wind speed were significantly associated with dengue incidence, and most of them showed a negative effect. Of all variables, wind speed had the greatest impact on dengue incidence. Given these relationships, policymakers should formulate better plans so that precautionary steps can be taken to reduce the spread of dengue.
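As an illustration of the Poisson GLM mentioned above, the sketch below fits log(mu) = a + b*x for a single climate predictor by Newton-Raphson. It is a simplified stand-in, not the study's model: the function name `fit_poisson_glm`, the predictor values, and the coefficients are hypothetical, and the response is set to the exact expected counts so that the recovered coefficients can be checked. The negative binomial variant is not shown.

```python
import math

def fit_poisson_glm(x, y, iterations=50):
    """Fit log(mu) = a + b*x by Newton-Raphson on the Poisson log-likelihood."""
    a = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b = 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system by the explicit inverse.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b

# Hypothetical predictor (e.g. a wind-speed index) and expected case counts
# generated from a = 3.0, b = -0.2; these are not values from the study.
x = [float(i) for i in range(10)]
y = [math.exp(3.0 - 0.2 * xi) for xi in x]

a_hat, b_hat = fit_poisson_glm(x, y)
print("intercept:", a_hat, "slope:", b_hat)
```

A negative slope corresponds to the negative effect reported for most climate factors: each unit increase in the predictor multiplies the expected count by exp(b).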
To detect whether data conform to a given model, the data must be diagnosed statistically. The diagnostic problem in generalized nonlinear models based on maximum Lq-likelihood estimation is considered. Three diagnostic statistics are used to detect whether outliers exist in the data set. Simulation results show that when the sample size is small, the values of the diagnostic statistics based on maximum Lq-likelihood estimation are greater than the values based on maximum likelihood estimation. As the sample size increases, the difference between the values of the diagnostic statistics under the two estimation methods gradually diminishes. This means that outliers can be distinguished more easily with the maximum Lq-likelihood method than with the maximum likelihood method.
Accurate classification and prediction of future traffic conditions are essential for developing effective congestion mitigation strategies on highway systems. Speed distribution is one of the traffic stream parameters that has been used to quantify traffic conditions. Previous studies have shown that a multi-modal probability distribution of speeds gives excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these analytical studies do not incorporate influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values that separate traffic states into homogeneous groups using the Bayesian change-point detection (BCD) technique. The study used one year of archived 2015 traffic data collected on Florida's Interstate 295 freeway corridor. Information criteria results revealed three traffic states, identified as free flow, transitional flow (congestion onset/offset), and congested flow. The DPM-GLM findings indicated that in all estimated states, traffic speed decreases as traffic occupancy increases. Comparison across traffic states showed that traffic occupancy has more impact on the free-flow and congested states than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model were promising for characterizing levels of traffic congestion.
Objective: To analyze longitudinal binary data using generalized linear models, taking the correlation between repeated measures into account, and to give a general method for analyzing such data. Methods: The generalized estimating equations (GEE) proposed by Zeger and Liang were used. For seven covariance structures, a method was given for estimating the regression and correlation parameters. Results: Regression and correlation parameters were estimated simultaneously. A set of programs was written and an example was presented. Conclusion: Longitudinal data often occur in medical research and clinical trials. To handle the correlation between repeated measures, special methods are needed for this kind of data.
In this paper, the relationship between earthquake frequency and magnitude is modeled with generalized linear models for a set of earthquake data from Nepal. Goodness of fit was assessed for the generalized linear models and, based on the Akaike information criterion and the Bayesian information criterion, the generalized Poisson regression model was selected as a suitable model for the study. The objective of this study is to determine the parameters (a and b values) and to estimate the probability of earthquake occurrence and its return period using a Poisson regression model, and to compare the results with those of the Gutenberg-Richter model. The study suggests that the probabilities of earthquake occurrence and the return periods estimated by the two models are relatively close to each other. The return periods from the generalized Poisson regression model are somewhat smaller than those from the Gutenberg-Richter model.
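For readers unfamiliar with the Gutenberg-Richter side of this comparison, the sketch below fits log10(N) = a - b*M by least squares and derives a return period and an occurrence probability from the fitted annual rate. The Poisson occurrence formula 1 - exp(-N*t) is a standard assumption. The function names and the magnitude and rate values are hypothetical and are not taken from the Nepal data set.

```python
import math

def fit_gutenberg_richter(magnitudes, annual_counts):
    """Least-squares fit of log10(N) = a - b*M; returns (a, b)."""
    logs = [math.log10(c) for c in annual_counts]
    n = len(magnitudes)
    mean_m = sum(magnitudes) / n
    mean_l = sum(logs) / n
    sxx = sum((m - mean_m) ** 2 for m in magnitudes)
    sxy = sum((m - mean_m) * (l - mean_l) for m, l in zip(magnitudes, logs))
    slope = sxy / sxx
    return mean_l - slope * mean_m, -slope

def annual_rate(a, b, magnitude):
    """Expected number of events per year with magnitude >= the given value."""
    return 10 ** (a - b * magnitude)

def return_period(a, b, magnitude):
    """Mean time in years between events of at least the given magnitude."""
    return 1.0 / annual_rate(a, b, magnitude)

def prob_at_least_one(a, b, magnitude, years):
    """Probability of one or more events in the window, assuming Poisson arrivals."""
    return 1.0 - math.exp(-annual_rate(a, b, magnitude) * years)

# Hypothetical cumulative annual rates consistent with a = 4, b = 1.
magnitudes = [4.0, 5.0, 6.0, 7.0]
annual_counts = [1.0, 0.1, 0.01, 0.001]

a_gr, b_gr = fit_gutenberg_richter(magnitudes, annual_counts)
print("a:", a_gr, "b:", b_gr)
print("return period for M >= 6 (years):", return_period(a_gr, b_gr, 6.0))
print("P(at least one M >= 6 in 50 years):", prob_at_least_one(a_gr, b_gr, 6.0, 50.0))
```

A Poisson regression approach would instead model the event counts directly with a log link, which is one way the a and b values from the two models can be compared.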
In this article, we propose a generalized empirical likelihood inference for the parametric component in semiparametric generalized partially linear models with longitudinal data. Based on the extended score vector, a generalized empirical likelihood ratio function is defined, which incorporates the within-cluster correlation while avoiding direct estimation of the nuisance parameters in the correlation matrix. We show that the proposed statistic is asymptotically chi-squared under some suitable conditions, and hence it can be used to construct confidence regions for the parameters. In addition, the maximum empirical likelihood estimates of the parameters and their asymptotic normality are obtained. Simulation studies demonstrate the performance of the proposed method.
BACKGROUND Patients with chronic obstructive pulmonary disease (COPD) frequently experience exacerbations requiring multiple hospitalizations over prolonged disease courses, which predispose them to generalized anxiety disorder (GAD). This comorbidity exacerbates breathing difficulties, activity limitations, and social isolation. While previous studies predominantly employed the GAD 7-item scale for screening, this approach is somewhat subjective. The current literature on predictive models for GAD risk in patients with COPD is limited. AIM To construct and validate a GAD risk prediction model to aid healthcare professionals in preventing the onset of GAD. METHODS This retrospective analysis encompassed patients with COPD treated at our institution from July 2021 to February 2024. The patients were categorized into a modeling (MO) group and a validation (VA) group in a 7:3 ratio on the basis of the occurrence of GAD. Univariate and multivariate logistic regression analyses were used to construct the risk prediction model, which was visualized using forest plots. The model's performance was evaluated using the Hosmer-Lemeshow (H-L) goodness-of-fit test and receiver operating characteristic (ROC) curve analysis. RESULTS A total of 271 subjects were included, with 190 in the MO group and 81 in the VA group. GAD was identified in 67 patients with COPD, a prevalence of 24.72% (67/271), with 49 cases in the MO group (18.08% of all 271 subjects) and 18 cases in the VA group (22.22% of its 81 subjects). Significant differences were observed between patients with and without GAD in educational level, average household income, smoking history, smoking index, number of exacerbations in the past year, cardiovascular comorbidities, disease knowledge, and personality traits (P < 0.05). Multivariate logistic regression analysis revealed that lower education level, household income < 3000 Chinese yuan, smoking history, smoking index ≥ 400 cigarettes/year, two or more exacerbations in the past year, cardiovascular comorbidities, complete lack of disease information, and introverted personality were significant risk factors for GAD in the MO group (P < 0.05). ROC analysis indicated that the area under the curve for predicting GAD was 0.978 in the MO group and 0.960 in the VA group. The H-L test yielded χ² values of 6.511 and 5.179, with P = 0.275 and 0.274, respectively. Calibration curves demonstrated good agreement between predicted and actual risks of GAD occurrence. CONCLUSION The developed predictive model includes eight independent risk factors: educational level, household income, smoking history, smoking index, number of exacerbations in the past year, presence of cardiovascular comorbidities, level of disease knowledge, and personality traits. This model effectively predicts the onset of GAD in patients with COPD, enabling early identification of high-risk individuals and providing a basis for early preventive interventions by nursing staff.
This article introduces a novel variant of the generalized linear exponential (GLE) distribution, known as the sine generalized linear exponential (SGLE) distribution. The SGLE distribution uses the sine transformation to enhance its capabilities. The new distribution is very flexible and may be used efficiently in modeling survival data and reliability problems. The proposed model has a hazard rate function (HRF) that may be increasing, J-shaped, or bathtub-shaped, depending on its parameters. The model includes many well-known lifetime distributions as sub-models. A range of statistical properties of the proposed model is presented. The model parameters are estimated by maximum likelihood and Bayesian methods using progressively censored data. To evaluate the effectiveness of these methods, a simulation study is provided. The relevance of the new model is shown through applications to two real-world datasets, highlighting its superiority over other well-regarded comparable models.
Generalized additive partial linear models (GAPLM) have been widely used for flexible modeling of various types of response. In practice, missing data often occur in studies of economics, medicine, and public health. We address the problem of identifying and estimating GAPLM when the response variable is nonignorably missing. Three types of monotone missing data mechanism are assumed: the logistic model, the probit model, and the complementary log-log model. In this situation, the likelihood based on observed data may not be identifiable. In this article, we show that the parameters of interest are identifiable under very mild conditions, and then construct estimators of the unknown parameters and unknown functions with a likelihood-based approach by expanding the unknown functions as linear combinations of polynomial spline functions. We establish asymptotic normality for the estimators of the parametric components. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to the household income dataset from the Chinese Household Income Project Survey 2013.
Penalized variable selection methods are often used to select the relevant covariates and estimate the unknown regression coefficients simultaneously, but existing methods may fail to be consistent when the covariates are highly correlated. In this paper, the semi-standard partial covariance (SPAC) method with a Lasso penalty is proposed for the generalized linear model with highly correlated covariates, and the consistency of estimation and variable selection is shown in high-dimensional settings under some regularity conditions. Simulation studies and an analysis of a colon tumor dataset show that the proposed method handles highly correlated covariates better than traditional penalized variable selection methods.
The analytical thermal traveling-wave distribution in biological tissues is discussed in this paper through a bio-heat transfer (BHT) model with linear/quadratic temperature-dependent blood perfusion. Using the extended generalized Riccati equation mapping method, we find analytical traveling-wave solutions of the BHT equation considered. All the traveling-wave solutions obtained are used to investigate explicitly the effect of the linear and quadratic coefficients of temperature dependence on the temperature distribution in tissues. We found that the parameter of the nonlinear superposition formula for the Riccati equation can be used to control the temperature of living tissues. Our results show that the extended generalized Riccati equation mapping method is a powerful tool for investigating the thermal traveling-wave distribution in biological tissues.
On the basis of the nonlinear stability theorem in the context of Arnol'd's second theorem for the generalized Phillips model, nonlinear saturation of baroclinic instability in the generalized Phillips model is investigated. A lower bound on the disturbance energy and potential enstrophy for the nonlinearly unstable basic flow in the generalized Phillips model is presented, which indicates that there may exist an allocation between a nonlinearly unstable basic flow and a growing disturbance.
On the basis of the nonlinear stability theorem in the context of Arnol'd's second theorem for the generalized Phillips model, nonlinear saturation of baroclinic instability in the generalized Phillips model is investigated. By choosing appropriate artificial stable basic flows, upper bounds on the disturbance energy and potential enstrophy for the nonlinearly unstable basic flow in the generalized Phillips model are obtained. These bounds are completely analytic and are not limited to infinitesimal initial disturbances.
Background Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture. Plaque progression prediction is of fundamental significance to cardiovascular research and to disease diagnosis, prevention, and treatment. Generalized linear mixed models (GLMM) are an extension of linear models for categorical responses that accounts for the correlation among observations. Methods Magnetic resonance imaging (MRI) data of carotid atherosclerotic plaques were acquired from 20 patients with consent obtained, and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction. Data for ten morphological and biomechanical risk factors, namely wall thickness (WT), lipid percent (LP), minimum cap thickness (MinCT), plaque area (PA), plaque burden (PB), lumen area (LA), maximum plaque wall stress (MPWS), maximum plaque wall strain (MPWSn), average plaque wall stress (APWS), and average plaque wall strain (APWSn), were extracted from all slices for analysis. Wall thickness increase (WTI), plaque burden increase (PBI), and plaque area increase (PAI) were chosen as three measures of plaque progression. GLMM with a 5-fold cross-validation strategy were used to calculate the prediction accuracy of each predictor and to identify the optimal predictor, with prediction accuracy defined as the sum of sensitivity and specificity. All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup. The training subgroups were used for model fitting, and the verification subgroup was used to evaluate the model. All combinations (1023 in total) of the 10 risk factors were fed to the GLMM, and the prediction accuracy of each predictor was taken from the point on the receiver operating characteristic (ROC) curve with the highest sum of specificity and sensitivity. Results LA was the best single predictor for PBI, with the highest prediction accuracy (1.3601) and an area under the ROC curve (AUC) of 0.6540, followed by APWSn (1.3363) with AUC = 0.6342. The optimal predictor among all possible combinations for PBI was the combination of LA, PA, LP, WT, MPWS, and MPWSn, with prediction accuracy = 1.4146 (AUC = 0.7158). LA was again the best single predictor for PAI, with the highest prediction accuracy (1.1846) and AUC = 0.6064, followed by MPWSn (1.1832) with AUC = 0.6084. The combination of PA, PB, WT, MPWS, MPWSn, and APWSn gave the best prediction accuracy (1.3025) for PAI, with AUC = 0.6657. PA was the best single predictor for WTI, with the highest prediction accuracy (1.2887) and AUC = 0.6415, followed by WT (1.2540) with AUC = 0.6097. The combination of PA, PB, WT, LP, MinCT, MPWS and MPWS was the best predictor for WTI, with prediction accuracy = 1.3140 and AUC = 0.6552. This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%, 4.01%, and 1.96% over the best single predictors for PAI, PBI, and WTI, respectively (AUC values improved by 9.78%, 9.45%, and 2.14%). Conclusions The use of GLMM with a 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction. This study suggests that a linear combination of multiple predictors can potentially improve existing plaque assessment schemes.
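The "prediction accuracy" used above, the highest sum of sensitivity and specificity along the ROC curve, can be computed as in the following sketch. The scores stand in for model-predicted probabilities of progression, and both the scores and the labels are hypothetical. The function name `best_sens_spec_sum` is also an assumption and is not taken from the study.

```python
def best_sens_spec_sum(scores, labels):
    """Scan ROC operating points; return (max sensitivity + specificity, threshold)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_sum, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        # Predict progression when the score is at or above the threshold.
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 0)
        total = tp / positives + tn / negatives
        if total > best_sum:
            best_sum, best_threshold = total, threshold
    return best_sum, best_threshold

# Hypothetical predicted probabilities and observed progression labels (1 = progressed).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

accuracy, threshold = best_sens_spec_sum(scores, labels)
print("best sensitivity + specificity:", accuracy, "at threshold", threshold)
```

The measure ranges from 1 (no better than chance) to 2 (perfect separation), which is why the reported values fall between 1.18 and 1.42.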
Generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor. The random effects are usually assumed to be normally distributed with mean zero and an identity variance-covariance matrix. In this paper, we propose relaxing the random effects to non-normal distributions and discuss how to model the mean and covariance structures in GLMMs simultaneously. Parameters are estimated with the quasi-Monte Carlo (QMC) method within an iterative Newton-Raphson (NR) algorithm, which performs very well in terms of accuracy and stability, as demonstrated by an analysis of real binary salamander mating data and by simulation studies.
Funding (Tahe forest fire study): Asia-Pacific Forests Net (APFNET/2010/FPF/001); National Natural Science Foundation of China (Grant No. 31400552); Forestry Industry Research Special Funds for Public Welfare Projects (201404402).
Funding (OFDM blind CFO estimation study): National Natural Science Foundation of China (60801052); Aeronautical Science Foundation of China (2009ZC52036); Nanjing University of Aeronautics and Astronautics Research Funding (NS2012010, NP2011036).
Funding (GPLM with type II censored data study): talent research launch fund (3004-893325) of Dalian University of Technology; NNSF of China (10271049).
Abstract: In a linear regression model, testing for uniformity of the residual variance is an integral part of statistical analysis. This crucial assumption should be confirmed by a statistical test, usually before carrying out an Analysis of Variance (ANOVA). Many researchers have published papers on tests for detecting variance heterogeneity in multiple linear regression models, and these tests have been compared using criteria such as bias, error rates, and power. Besides comparisons, modifications of some of these tests have been reported in the literature in recent years. However, little work has been done on comparing selected tests of the homoscedasticity assumption in multiple linear regression when linear, quadratic, square-root, and exponential forms of heteroscedasticity are injected into the residuals. The present study aims to fill this gap. The paper provides a comprehensive comparative analysis of the asymptotic behaviour of selected tests of homoscedasticity, in order to find the best test for detecting heteroscedasticity in multiple linear regression with varying variances and significance levels. Several tests for homoscedasticity are available in the literature, but only nine were considered in this study: the Breusch-Godfrey test, studentized Breusch-Pagan test, White's test, Nonconstant Variance Score test, Park test, Spearman Rank test, Glejser test, Goldfeld-Quandt test, and Harrison-McCabe test. Their asymptotic behaviours were examined by Monte Carlo simulation. Four different forms of heteroscedastic structure, exponential and linear (the latter a generalization of the square-root and quadratic structures), were injected into the residual part of the multiple linear regression models at sample sizes of 30, 50, 100, 200, 500, and 1000. Performance was evaluated in the R environment. Among other findings, the Glejser and Park tests were the best tests for detecting heteroscedasticity under EHS and LHS respectively, while the White and Harrison-McCabe tests were the best tests for checking homoscedasticity under EHS and LHS respectively for sample sizes below 50.
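A Monte Carlo comparison of this kind can be sketched in a few lines. The example below is only an illustration in Python (the study itself used R). It estimates the rejection rate of one test, the studentized (Koenker) Breusch-Pagan statistic, with and without heteroscedasticity. The single regressor, the error standard deviation proportional to x, the coefficients, and the function names are assumptions chosen for the sketch.

```python
import numpy as np

CHI2_CRIT_DF1_05 = 3.841  # 95th percentile of the chi-square distribution with 1 df

def koenker_bp_statistic(x, y):
    """Studentized Breusch-Pagan statistic: n * R^2 from regressing
    the squared OLS residuals on the regressors."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ gamma
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return len(y) * r2

def rejection_rate(n, heteroscedastic, n_rep=500, seed=0):
    """Share of simulated samples in which the test rejects at the 5% level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x = rng.uniform(1.0, 5.0, size=n)
        # Assumed structure: error s.d. proportional to x, or constant.
        sd = x if heteroscedastic else np.ones(n)
        y = 1.0 + 2.0 * x + rng.normal(size=n) * sd
        rejections += int(koenker_bp_statistic(x, y) > CHI2_CRIT_DF1_05)
    return rejections / n_rep

size_at_200 = rejection_rate(200, heteroscedastic=False)   # empirical size
power_at_200 = rejection_rate(200, heteroscedastic=True)   # empirical power
```

Repeating the two calls over the sample sizes listed in the abstract, and over further tests, would give the kind of size and power table such a study reports.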
Abstract: Changes in climate factors such as temperature, rainfall, humidity, and wind speed are natural processes that could significantly impact the incidence of infectious diseases. Dengue is a widespread disease whose link to climate change has often been documented. It has become a significant concern, especially for the Malaysian health authorities, due to its rapid spread and serious effects, leading to loss of life. Several statistical models have been applied to identify climatic factors associated with infectious diseases. However, because of the complex and nonlinear interactions between climate variables and disease components, modelling their relationships has become the main challenge in climate-health studies. Hence, this study proposed a Generalized Linear Model (GLM) with Poisson and Negative Binomial distributions to examine the effects of climate factors on dengue incidence while accounting for collinearity between variables. The study focuses on the dengue hot spots in Malaysia for the year 2014. Since collinearity exists between the climate factors, the analysis was done separately using three different models. The study revealed that rainfall, temperature, humidity, and wind speed were statistically significant for dengue incidence, and most of them showed a negative effect. Of all variables, wind speed had the most significant impact on dengue incidence. Given these relationships, policymakers should formulate better plans so that precautionary steps can be taken to reduce the spread of dengue.
Funding: The National Natural Science Foundation of China (No. 11171065); the Natural Science Foundation of Jiangsu Province (No. BK2011058).
Abstract: To detect whether data conform to a given model, the data must be diagnosed statistically. The diagnostic problem in generalized nonlinear models based on maximum Lq-likelihood estimation is considered. Three diagnostic statistics are used to detect whether outliers exist in the data set. Simulation results show that when the sample size is small, the values of the diagnostic statistics based on maximum Lq-likelihood estimation are greater than the values based on maximum likelihood estimation. As the sample size increases, the difference between the values of the diagnostic statistics under the two estimation methods diminishes gradually. This means that outliers can be distinguished more easily with the maximum Lq-likelihood method than with the maximum likelihood method.
Abstract: Accurate classification and prediction of future traffic conditions are essential for developing effective strategies for congestion mitigation on highway systems. Speed distribution is one of the traffic stream parameters that has been used to quantify traffic conditions. Previous studies have shown that a multi-modal probability distribution of speeds gives excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these analytical studies do not incorporate the influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values of the traffic states, which separate them into homogeneous groups, using the Bayesian change-point detection (BCD) technique. The study used one year of archived 2015 traffic data collected on Florida’s Interstate 295 freeway corridor. Information criteria results revealed three traffic states, identified as free flow, transitional flow (congestion onset/offset), and congested flow. The findings of the DPM-GLM indicated that in all estimated states, traffic speed decreases when traffic occupancy increases. Comparison across traffic states showed that traffic occupancy has more impact on the free-flow and congested states than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model were promising for characterizing levels of traffic congestion.
Abstract: Objective: To analyze longitudinal binary data using generalized linear models, taking the correlation between repeated measures into account, and to give a general method for analyzing such data. Methods: The generalized estimating equations (GEE) proposed by Zeger and Liang were used. For seven covariance structures, a method was given for estimating the regression and correlation parameters. Results: The regression and correlation parameters were estimated simultaneously. A set of programs was completed and an example was illustrated. Conclusion: Longitudinal data often occur in medical research and clinical trials. To handle the correlation between repeated measures, special methods are needed for this kind of data.
Abstract: In this paper, the relationship between earthquake frequency and magnitude is modeled with generalized linear models for a set of earthquake data from Nepal. Goodness of fit is assessed for the generalized linear models and, based on the Akaike information criterion and the Bayesian information criterion, the generalized Poisson regression model is selected as a suitable model for the study. The objective of this study is to determine the parameters (a and b values) and to estimate the probability of earthquake occurrence and its return period using a Poisson regression model, and to compare the results with the Gutenberg-Richter model. The study suggests that the probabilities of earthquake occurrence and the return periods estimated by the two models are relatively close to each other. The return periods from the generalized Poisson regression model are somewhat smaller than those from the Gutenberg-Richter model.
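For orientation, the Gutenberg-Richter relation referred to here is conventionally written log10 N = a - b M, with N the rate of events of magnitude at least M. The sketch below fits a and b by ordinary least squares on the log scale and converts the fitted rate into a return period. This is the classical fit, not the generalized Poisson regression selected in the paper, and the magnitudes and annual rates are hypothetical values generated from a = 5, b = 1 rather than Nepal data.

```python
import numpy as np

# Hypothetical annual rates of events with magnitude >= M (not real data),
# generated exactly from log10 N = 5 - 1 * M.
magnitudes = np.arange(4.0, 7.5, 0.5)
annual_rates = 10.0 ** (5.0 - 1.0 * magnitudes)

# Least-squares line through (M, log10 N): the slope is -b, the intercept is a.
slope, intercept = np.polyfit(magnitudes, np.log10(annual_rates), 1)
a_hat, b_hat = intercept, -slope

def return_period_years(magnitude, a=a_hat, b=b_hat):
    """Mean time in years between events of at least the given magnitude."""
    return 1.0 / (10.0 ** (a - b * magnitude))

period_m6 = return_period_years(6.0)
```

Because the return period is simply the reciprocal of the fitted annual rate, any alternative model for that rate, such as a Poisson regression with log link, can be plugged into the same conversion.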
Abstract: In this article, we propose a generalized empirical likelihood inference for the parametric component in semiparametric generalized partially linear models with longitudinal data. Based on the extended score vector, a generalized empirical likelihood ratio function is defined, which integrates the within-cluster correlation while avoiding direct estimation of the nuisance parameters in the correlation matrix. We show that the proposed statistics are asymptotically chi-squared under some suitable conditions, and hence can be used to construct confidence regions for the parameters. In addition, the maximum empirical likelihood estimates of the parameters and their asymptotic normality are obtained. Simulation studies demonstrate the performance of the proposed method.
Funding: Supported by the Henan Provincial Health Commission, No. 232102310145.
Abstract: BACKGROUND: Patients with chronic obstructive pulmonary disease (COPD) frequently experience exacerbations requiring multiple hospitalizations over prolonged disease courses, which predispose them to generalized anxiety disorder (GAD). This comorbidity exacerbates breathing difficulties, activity limitations, and social isolation. While previous studies predominantly employed the GAD 7-item scale for screening, this approach is somewhat subjective. The current literature on predictive models for GAD risk in patients with COPD is limited. AIM: To construct and validate a GAD risk prediction model to aid healthcare professionals in preventing the onset of GAD. METHODS: This retrospective analysis encompassed patients with COPD treated at our institution from July 2021 to February 2024. The patients were categorized into a modeling (MO) group and a validation (VA) group in a 7:3 ratio on the basis of the occurrence of GAD. Univariate and multivariate logistic regression analyses were utilized to construct the risk prediction model, which was visualized using forest plots. The model’s performance was evaluated using the Hosmer-Lemeshow (H-L) goodness-of-fit test and receiver operating characteristic (ROC) curve analysis. RESULTS: A total of 271 subjects were included, with 190 in the MO group and 81 in the VA group. GAD was identified in 67 patients with COPD, resulting in a prevalence rate of 24.72% (67/271), with 49 cases (18.08%) in the MO group and 18 cases (22.22%) in the VA group. Significant differences were observed between patients with and without GAD in terms of educational level, average household income, smoking history, smoking index, number of exacerbations in the past year, cardiovascular comorbidities, disease knowledge, and personality traits (P < 0.05). Multivariate logistic regression analysis revealed that lower education levels, household income < 3000 China yuan, smoking history, smoking index ≥ 400 cigarettes/year, ≥ two exacerbations in the past year, cardiovascular comorbidities, complete lack of disease information, and introverted personality were significant risk factors for GAD in the MO group (P < 0.05). ROC analysis indicated that the area under the curve for predicting GAD in the MO and VA groups was 0.978 and 0.960, respectively. The H-L test yielded χ² values of 6.511 and 5.179, with P = 0.275 and 0.274. Calibration curves demonstrated good agreement between predicted and actual GAD occurrence risks. CONCLUSION: The developed predictive model includes eight independent risk factors: educational level, household income, smoking history, smoking index, number of exacerbations in the past year, presence of cardiovascular comorbidities, level of disease knowledge, and personality traits. This model effectively predicts the onset of GAD in patients with COPD, enabling early identification of high-risk individuals and providing a basis for early preventive interventions by nursing staff.
Funding: This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (Grant Number IMSIU-RG23142).
Abstract: This article introduces a novel variant of the generalized linear exponential (GLE) distribution, known as the sine generalized linear exponential (SGLE) distribution. The SGLE distribution utilizes the sine transformation to enhance its capabilities. The updated distribution is very adaptable and may be efficiently used in the modeling of survival data and dependability issues. The suggested model incorporates a hazard rate function (HRF) that may display a rising, J-shaped, or bathtub form, depending on its unique characteristics. This model includes many well-known lifespan distributions as separate sub-models. The suggested model is accompanied by a range of statistical features. The model parameters are examined using the techniques of maximum likelihood and Bayesian estimation using progressively censored data. In order to evaluate the effectiveness of these techniques, we provide a set of simulated data for testing purposes. The relevance of the newly presented model is shown via two real-world dataset applications, highlighting its superiority over other respected similar models.
Abstract: Generalized additive partial linear models (GAPLM) have been widely used for flexible modeling of various types of response. In practice, missing data usually occur in studies of economics, medicine, and public health. We address the problem of identifying and estimating GAPLM when the response variable is nonignorably missing. Three types of monotone missing data mechanism are assumed, including the logistic model, the probit model, and the complementary log-log model. In this situation, the likelihood based on observed data may not be identifiable. In this article, we show that the parameters of interest are identifiable under very mild conditions, and then construct estimators of the unknown parameters and unknown functions using a likelihood-based approach, expanding the unknown functions as linear combinations of polynomial spline functions. We establish asymptotic normality for the estimators of the parametric components. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to the household income dataset from the Chinese Household Income Project Survey 2013.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12001277, 12271046 and 12131006).
Abstract: Penalized variable selection methods are often used to select the relevant covariates and estimate the unknown regression coefficients simultaneously, but these existing methods may fail to be consistent in settings with highly correlated covariates. In this paper, the semi-standard partial covariance (SPAC) method with Lasso penalty is proposed to study the generalized linear model with highly correlated covariates, and the consistency of the estimation and of the variable selection is shown in high-dimensional settings under some regularity conditions. Simulation studies and an analysis of a colon tumor dataset show that the proposed method handles highly correlated covariates better than the traditional penalized variable selection methods.
Abstract: The analytical thermal traveling-wave distribution in biological tissues is discussed in this paper through a bio-heat transfer (BHT) model with linear/quadratic temperature-dependent blood perfusion. Using the extended generalized Riccati equation mapping method, we find analytical traveling-wave solutions of the considered BHT equation. All the traveling-wave solutions obtained have been used to explicitly investigate the effect of the linear and quadratic coefficients of temperature dependence on the temperature distribution in tissues. We found that the parameter of the nonlinear superposition formula for the Riccati equation can be used to control the temperature of living tissues. Our results show that the extended generalized Riccati equation mapping method is a powerful tool for investigating thermal traveling-wave distribution in biological tissues.
Abstract: On the basis of the nonlinear stability theorem in the context of Arnol'd's second theorem for the generalized Phillips model, nonlinear saturation of baroclinic instability in the generalized Phillips model is investigated. The lower bound on the disturbance energy and potential enstrophy to the nonlinearly unstable basic flow in the generalized Phillips model is presented, which indicates that there may exist an allocation between a nonlinearly unstable basic flow and a growing disturbance.
Abstract: On the basis of the nonlinear stability theorem in the context of Arnol'd's second theorem for the generalized Phillips model, nonlinear saturation of baroclinic instability in the generalized Phillips model is investigated. By choosing appropriate artificial stable basic flows, upper bounds on the disturbance energy and potential enstrophy to the nonlinearly unstable basic flow in the generalized Phillips model are obtained. These bounds are fully analytic and are not limited to infinitesimal initial disturbances.
Funding: Supported in part by National Sciences Foundation of China grant (11672001), Jiangsu Province Science and Technology Agency grant (BE2016785), and Postgraduate Research & Practice Innovation Program of Jiangsu Province grant (KYCX18_0156).
Abstract: Background: Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture. Plaque progression prediction is of fundamental significance to cardiovascular research and to disease diagnosis, prevention, and treatment. The generalized linear mixed model (GLMM) is an extension of the linear model to categorical responses that accounts for the correlation among observations. Methods: Magnetic resonance image (MRI) data of carotid atherosclerotic plaques were acquired from 20 patients with consent obtained, and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction. Data for ten morphological and biomechanical risk factors, namely wall thickness (WT), lipid percent (LP), minimum cap thickness (MinCT), plaque area (PA), plaque burden (PB), lumen area (LA), maximum plaque wall stress (MPWS), maximum plaque wall strain (MPWSn), average plaque wall stress (APWS), and average plaque wall strain (APWSn), were extracted from all slices for analysis. Wall thickness increase (WTI), plaque burden increase (PBI), and plaque area increase (PAI) were chosen as three measures of plaque progression. GLMMs with a 5-fold cross-validation strategy were used to calculate the prediction accuracy of each predictor and to identify the optimal predictor, with prediction accuracy defined as the sum of sensitivity and specificity. All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup. The training subgroups were used for model fitting, and the verification subgroup was used to evaluate the model. All combinations (1023 in total) of the 10 risk factors were fed to the GLMM, and the prediction accuracy of each predictor was taken from the point on the receiver operating characteristic (ROC) curve with the highest sum of specificity and sensitivity. Results: LA was the best single predictor for PBI with the highest prediction accuracy (1.3601) and an area under the ROC curve (AUC) of 0.6540, followed by APWSn (1.3363) with AUC = 0.6342. The optimal predictor among all possible combinations for PBI was the combination of LA, PA, LP, WT, MPWS and MPWSn, with prediction accuracy = 1.4146 (AUC = 0.7158). LA was once again the best single predictor for PAI with the highest prediction accuracy (1.1846) and AUC = 0.6064, followed by MPWSn (1.1832) with AUC = 0.6084. The combination of PA, PB, WT, MPWS, MPWSn and APWSn gave the best prediction accuracy (1.3025) for PAI, with AUC = 0.6657. PA was the best single predictor for WTI with the highest prediction accuracy (1.2887) and AUC = 0.6415, followed by WT (1.2540) with AUC = 0.6097. The combination of PA, PB, WT, LP, MinCT, MPWS and MPWS was the best predictor for WTI, with prediction accuracy 1.3140 and AUC = 0.6552. This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%, 4.01% and 1.96% over the best single predictors for PAI, PBI and WTI, respectively (AUC values improved by 9.78%, 9.45%, and 2.14%). Conclusions: The use of GLMM with a 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction. This study suggests that a linear combination of multiple predictors can provide potential improvement to existing plaque assessment schemes.
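Two mechanical pieces of this procedure are easy to make concrete: enumerating every non-empty combination of the ten risk factors (2^10 - 1 = 1023), and scoring a predictor by the largest sum of sensitivity and specificity over all ROC thresholds. The sketch below shows only those two pieces. No GLMM is fitted, the `scores` and `labels` arrays are hypothetical stand-ins for model output and observed progression, and the function names are illustrative.

```python
from itertools import combinations

import numpy as np

FACTORS = ["WT", "LP", "MinCT", "PA", "PB", "LA", "MPWS", "MPWSn", "APWS", "APWSn"]

def all_nonempty_subsets(items):
    """Every combination of 1..len(items) items, in increasing size."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def best_sens_plus_spec(scores, labels):
    """Largest sensitivity + specificity over all thresholds,
    classifying score >= threshold as positive."""
    best = 0.0
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        sensitivity = np.mean(predicted[labels == 1])
        specificity = np.mean(~predicted[labels == 0])
        best = max(best, float(sensitivity + specificity))
    return best

subsets = all_nonempty_subsets(FACTORS)

# Hypothetical predicted probabilities and observed outcomes for six slices.
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.20])
labels = np.array([0, 0, 1, 1, 1, 0])
accuracy = best_sens_plus_spec(scores, labels)
```

On this scale a predictor no better than chance scores about 1 and a perfect one scores 2, which is the range in which the reported values such as 1.4146 should be read.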
Abstract: Generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor. The random effects are usually assumed to be normally distributed with mean zero and an identity variance-covariance matrix. In this paper, we propose to relax this assumption to allow non-normal random effects, and we discuss how to model the mean and covariance structures in GLMMs simultaneously. Parameter estimation is carried out with the Quasi-Monte Carlo (QMC) method within an iterative Newton-Raphson (NR) algorithm, which performs well in terms of accuracy and stability, as demonstrated by an analysis of real binary salamander mating data and by simulation studies.