Classical survival analysis assumes that all subjects will eventually experience the event of interest, but in some cases a portion of the population may never encounter the event. These methods further assume independent survival times, which is not valid for honey bees, which live in nests. The study introduces a semi-parametric marginal proportional hazards mixture cure (PHMC) model with an exchangeable correlation structure, using generalized estimating equations for survival data analysis. The model was tested on clustered right-censored bee survival data with a cured fraction, in which two bee species were exposed to different entomopathogens to test the effect of the entomopathogens on their survival. The Expectation-Solution algorithm is used to estimate the parameters. The study notes weak positive associations among cure statuses (ρ1 = 0.0007) and among survival times of uncured bees (ρ2 = 0.0890), emphasizing their importance. The odds of being uncured are higher for A. mellifera than for M. ferruginea. A. mellifera is more susceptible to the entomopathogens icipe 7, icipe 20, and icipe 69. The Cox-Snell residuals show that the proposed semi-parametric PH model generally fits the data well compared with a model that assumes an independence correlation structure. Thus, the semi-parametric marginal proportional hazards mixture cure model is a parsimonious model for correlated bee survival data.
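The mixture cure structure described above can be sketched numerically. The following is a minimal illustration, not the study's estimator: it evaluates the population survival function S_pop(t) = (1 - π) + π·S_u(t), with a logistic model for the uncured probability π and a proportional hazards form for the uncured survival S_u. The exponential baseline, the function names, and all parameter values are assumptions made for illustration only; the study leaves the baseline unspecified and additionally models within-cluster correlation through GEE, which this sketch omits.

```python
import math


def uncured_probability(gamma, z):
    """Incidence part: logistic model for the probability of being uncured."""
    eta = sum(g * v for g, v in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))


def population_survival(t, gamma, z, beta, x, baseline_rate=0.1):
    """Return S_pop(t) = (1 - pi) + pi * S_u(t).

    Latency part: S_u(t) = S_0(t) ** exp(beta'x) (proportional hazards).
    Assumption: S_0 is exponential with a hypothetical rate, purely for
    illustration; a semi-parametric model would not fix its shape.
    """
    pi = uncured_probability(gamma, z)
    s0 = math.exp(-baseline_rate * t)
    su = s0 ** math.exp(sum(b * v for b, v in zip(beta, x)))
    return (1.0 - pi) + pi * su
```

As t grows, S_pop(t) levels off at the cured fraction 1 - π instead of dropping to zero, which is the feature that separates a cure model from a classical survival model.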
Quantitative headspace analysis of volatiles emitted by plants or other living organisms in chemical ecology studies generates large multidimensional data that require extensive mining and refining to extract useful information. Often the number of variables (the quantified volatile compounds) exceeds the number of observations or samples, and hence many traditional statistical methods become inefficient. Here, we employed a machine learning algorithm, random forest (RF), in combination with a distance-based procedure, similarity percentage (SIMPER), as preprocessing steps to reduce the dimensionality of the chemical profiles of volatiles from three African nightshade plant species before subjecting the data to non-metric multidimensional scaling (NMDS). In addition, two non-parametric methods, permutational multivariate analysis of variance (PERMANOVA) and analysis of similarities (ANOSIM), were applied to test the hypothesis of differences among the African nightshade species based on their volatile profiles and to confirm the patterns revealed by the NMDS plots. Our results revealed significant differences among the African nightshade species when the dimension of the data was reduced using RF variable importance and SIMPER. This was supported by NMDS plots showing S. scabrum separated from S. villosum and S. sarrachoides based on the reduced set of variables. The novelty of our work lies in showing the merits of data reduction techniques for revealing group differences that might not have been detected had the analysis been performed on the entire original data matrix, which is characterized by small samples. The R code used in the analysis has been shared herein for interested researchers to customise for their own data of a similar nature.
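The SIMPER step mentioned above can be illustrated with a short sketch. This is not the authors' R code. It is a standard-library Python version of the usual SIMPER idea: average each variable's contribution to the Bray-Curtis dissimilarity over all between-group sample pairs, then rank the variables. The function names and the choice of how many variables to keep are assumptions; the random forest and NMDS steps are not shown.

```python
from itertools import product


def simper_contributions(group_a, group_b):
    """Average per-variable contribution to between-group Bray-Curtis dissimilarity.

    group_a, group_b: lists of samples, each a list of non-negative abundances
    with the same number of variables.
    """
    n_vars = len(group_a[0])
    totals = [0.0] * n_vars
    n_pairs = 0
    for a, b in product(group_a, group_b):
        denom = sum(a) + sum(b)
        if denom == 0:
            continue  # two all-zero samples carry no information
        for i in range(n_vars):
            totals[i] += abs(a[i] - b[i]) / denom
        n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no usable between-group sample pairs")
    return [t / n_pairs for t in totals]


def top_variables(contributions, k):
    """Indices of the k variables with the largest average contribution."""
    order = sorted(range(len(contributions)),
                   key=lambda i: contributions[i], reverse=True)
    return order[:k]
```

The contributions sum to the mean between-group Bray-Curtis dissimilarity, so keeping only the top-ranked variables retains the compounds that drive most of the separation between two groups.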
Survival analysis is a fundamental tool in medical science for time-to-event data. However, its application to colony organisms such as bees poses challenges because of their social nature. Traditional survival models may not accurately capture the interdependence among individuals within a colony. Frailty models, which account for shared risks within groups, offer a promising alternative. This study evaluates the performance of semi-parametric shared frailty models (gamma, inverse normal, and positive stable) in comparison with the traditional Cox model using bee survival data. We examined the effect of misspecifying the frailty distribution on the regression and heterogeneity parameters using simulation and concluded that the heterogeneity parameter was more sensitive than the regression parameter to misspecification of the frailty distribution and to the choice of initial parameters (cluster size and true heterogeneity parameter). For the bee data, parameter estimates for the covariates were close across the four models but slightly higher for the Cox model. The shared gamma frailty model provided a better fit to the data than the other models. Therefore, when the focus is on regression parameters, the gamma frailty model is recommended. This research underscores the importance of tailored survival methodologies for accurately analyzing time-to-event data in social organisms.
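To make the shared frailty idea concrete, the sketch below simulates clustered survival times in which all members of a cluster share one gamma-distributed frailty. It is an illustration under stated assumptions, not the study's simulation design: the exponential baseline hazard, the single binary covariate, the function name, and the default values are all hypothetical, and censoring is omitted.

```python
import math
import random


def simulate_shared_gamma_frailty(n_clusters, cluster_size, theta, beta,
                                  base_rate=0.05, seed=0):
    """Simulate clustered survival times with a shared gamma frailty.

    Conditional hazard for subject j in cluster i:
        h_ij(t) = z_i * base_rate * exp(beta * x_ij)
    where z_i ~ Gamma(shape=1/theta, scale=theta), so that E[z_i] = 1 and
    Var[z_i] = theta (theta plays the role of the heterogeneity parameter).
    Returns a list of (cluster, covariate, time) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_clusters):
        z = rng.gammavariate(1.0 / theta, theta)  # one frailty per cluster
        for _ in range(cluster_size):
            x = rng.choice([0, 1])  # hypothetical binary covariate
            rate = z * base_rate * math.exp(beta * x)
            rows.append((i, x, rng.expovariate(rate)))
    return rows
```

Larger values of theta make clusters differ more in their overall risk, which is the within-colony dependence that an ordinary Cox model ignores.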
Dose-response studies in arthropod research usually involve observing and collecting successive information at different times on the same group of insects exposed to different concentrations of a stimulus. When the same measure is collected repeatedly over time, the data become correlated, and Probit Analysis, the standard method for analyzing bioassay data, cannot be used. Lethal time is estimated when the speed of kill is of interest, since mortality varies over time. We evaluate a complementary approach, repeated measures logistic regression using Generalized Estimating Equations (GEE), for lethal time determination in mosquito dose response. Mortality data from Anopheles larvae exposed to 3 botanical extracts (B, C, E) at 4 concentration levels (500 mg/ml, 250 mg/ml, 50 mg/ml and 12.5 mg/ml) were used. The estimated LT50 values show that 500 mg/ml was the most virulent concentration for extracts B (LT50 = 10.3 hrs), C (LT50 = 7.2 hrs) and E (LT50 = 10.3 hrs). The least virulent concentration was 12.5 mg/ml for extracts B (LT50 = 52.1 hrs), C (LT50 = 70.7 hrs) and E (LT50 = 55.0 hrs). We conclude that repeated measures logistic regression via GEE can be used as a tool to estimate LT50 more effectively in repeated measures arthropod data.
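Once a logistic model for mortality over time has been fitted (by GEE or otherwise), a lethal time follows directly from the coefficients. The sketch below assumes a linear predictor of the form logit(p) = beta0 + beta1 × time, or optionally log(time), for a fixed extract and concentration. That model form, the function name, and the coefficient values in the usage note are hypothetical and are not taken from the study.

```python
import math


def lethal_time(beta0, beta1, p=0.5, log_time=False):
    """Time at which predicted mortality equals p.

    Solves logit(p) = beta0 + beta1 * time for time. With log_time=True the
    predictor is assumed to use log(time), so the solution is exponentiated.
    """
    if beta1 == 0:
        raise ValueError("beta1 must be non-zero to solve for time")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    value = (math.log(p / (1.0 - p)) - beta0) / beta1
    return math.exp(value) if log_time else value
```

For example, with hypothetical coefficients beta0 = -2.0 and beta1 = 0.2 per hour, `lethal_time(-2.0, 0.2)` solves 0 = -2.0 + 0.2 × t, giving an LT50 of about 10 hours.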