Abstract: This paper proposes a new distribution, the Marshall-Olkin Exponentiated Fréchet (MOEFr) distribution. The goal is to increase the flexibility of the existing Exponentiated Fréchet distribution by adding an extra shape parameter, so that the new distribution can fit a wider range of data sets than the baseline distribution. The new distribution is developed with the generator method introduced by Marshall and Olkin. Several of its properties are derived, including the hazard rate function, survival function, reversed hazard rate function, cumulative hazard function, odds function, quantile function, moments and order statistics. The model parameters are estimated by maximum likelihood, and a Monte Carlo simulation evaluates the behavior of the estimators through their average bias and root mean squared error. The new distribution is fitted to three data sets (the Bladder cancer, Carbone and Wheaton River data sets) and compared with the Exponentiated Fréchet (EFr), Marshall-Olkin Fréchet (MOFr), Beta Exponential Fréchet (BEFr), Beta Fréchet (BFr) and Fréchet (Fr) distributions. Based on the goodness-of-fit statistics and information criteria, the new distribution fits all three data sets better than the other distributions considered in the study.
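The abstract gives no formulas, so the following is only a minimal sketch of how a Marshall-Olkin extension and its quantile function could look. It assumes the Nadarajah-Kotz form of the Exponentiated Fréchet survival function, S(x) = [1 - exp(-(sigma/x)^lam)]^alpha, and the standard Marshall-Olkin transform G(x) = 1 - theta*S(x) / (1 - (1 - theta)*S(x)). The parameter names `theta`, `alpha`, `lam` and `sigma` and all function names are hypothetical, and the paper's own parameterization may differ.

```python
import math
import random


def efr_survival(x, alpha, lam, sigma):
    """Survival function of the baseline EFr distribution (assumed form)."""
    return (1.0 - math.exp(-((sigma / x) ** lam))) ** alpha


def moefr_cdf(x, theta, alpha, lam, sigma):
    """CDF after applying the Marshall-Olkin transform with tilt parameter theta."""
    s = efr_survival(x, alpha, lam, sigma)
    return 1.0 - theta * s / (1.0 - (1.0 - theta) * s)


def moefr_quantile(u, theta, alpha, lam, sigma):
    """Closed-form inverse of moefr_cdf for 0 < u < 1."""
    # Invert the Marshall-Olkin transform to recover the baseline survival value.
    s = (1.0 - u) / (1.0 - (1.0 - theta) * u)
    # Invert the baseline survival function.
    return sigma * (-math.log(1.0 - s ** (1.0 / alpha))) ** (-1.0 / lam)


def moefr_sample(n, theta, alpha, lam, sigma, seed=0):
    """Inverse-transform sampling, as one might use in a Monte Carlo study."""
    rng = random.Random(seed)
    # Keep u strictly inside (0, 1) so the logarithm stays finite.
    return [
        moefr_quantile(rng.uniform(1e-12, 1.0 - 1e-12), theta, alpha, lam, sigma)
        for _ in range(n)
    ]


# Illustrative parameter values only; they are not taken from the paper.
params = dict(theta=0.5, alpha=2.0, lam=1.5, sigma=1.0)
median = moefr_quantile(0.5, **params)
draws = moefr_sample(1000, **params)
```

With `theta = 1` the transform leaves the baseline unchanged, so `moefr_cdf` reduces to the EFr CDF under the assumed form. That reflects the abstract's description of the extra parameter as what adds flexibility over the baseline.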
Abstract: Tree ensemble models have become important tools for classification and prediction problems, and boosting ensemble techniques are commonly used to forecast Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost uses balanced base predictors known as decision tables, which mitigate the risk of overfitting and significantly improve testing time efficiency. CatBoost's algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector that stops training early. This study develops a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. Bayesian hyperparameter optimization is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). A comparative evaluation against the LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms shows that the hybrid model achieves the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and may be applicable to other datasets similar to the diabetes data.
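The abstract does not say how the two learners are combined, so the sketch below is only one plausible reading. It assumes soft voting (a weighted average of predicted class-1 probabilities) and shows how the reported metrics (accuracy, recall, F1-score) are computed from the resulting predictions. The probability lists are made-up stand-ins for the outputs of two trained models, and the function names are hypothetical. No LightGBM or CatBoost API is used.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' class-1 probabilities (assumed combiner)."""
    return [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(prob_a, prob_b)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical stand-in data: true labels and per-model class-1 probabilities.
y_true = [1, 0, 1, 1, 0, 0]
probs_model_a = [0.9, 0.4, 0.3, 0.8, 0.2, 0.6]  # e.g. from a LightGBM-style model
probs_model_b = [0.7, 0.2, 0.6, 0.9, 0.1, 0.3]  # e.g. from a CatBoost-style model

combined = soft_vote(probs_model_a, probs_model_b)
y_pred = [1 if p >= 0.5 else 0 for p in combined]
metrics = classification_metrics(y_true, y_pred)
```

Averaging probabilities is a common way to reduce the variance of individual learners, which matches the stated aim of the hybrid model. Other combiners, such as stacking, would fit the abstract's description equally well.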