BACKGROUND Development of distant metastasis (DM) is a major concern during treatment of nasopharyngeal carcinoma (NPC). However, studies have demonstrated improved distant control and survival in patients with advanced NPC with the addition of chemotherapy to concomitant chemoradiotherapy. Therefore, precise prediction of metastasis in patients with NPC is crucial. AIM To develop a predictive model for metastasis in NPC using detailed magnetic resonance imaging (MRI) reports. METHODS This retrospective study included 792 patients with non-distant metastatic NPC. A total of 469 imaging variables were obtained from detailed MRI reports. Data were stratified and randomly split into training (50%) and testing sets. Gradient boosting tree (GBT) models were built and used to select variables for predicting DM. A full model comprising all variables and a reduced model with the top-five variables were built. Model performance was assessed by the area under the curve (AUC). RESULTS Among the 792 patients, 94 developed DM during follow-up. The number of metastatic cervical nodes (30.9%), tumor invasion in the posterior half of the nasal cavity (9.7%), two sides of the pharyngeal recess (6.2%), tubal torus (3.3%), and single side of the parapharyngeal space (2.7%) were the top-five contributors for predicting DM, based on their relative importance in GBT models. The testing AUC of the full model was 0.75 (95% confidence interval [CI]: 0.69-0.82). The testing AUC of the reduced model was 0.75 (95% CI: 0.68-0.82). For the whole dataset, the full (AUC = 0.76, 95% CI: 0.72-0.82) and reduced models (AUC = 0.76, 95% CI: 0.71-0.81) outperformed the tumor-node staging system (AUC = 0.67, 95% CI: 0.61-0.73). CONCLUSION The GBT model outperformed the tumor-node staging system in predicting metastasis in NPC. The number of metastatic cervical nodes was identified as the principal contributing variable.
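The NPC abstract reports AUCs with 95% CIs but does not say how the intervals were obtained. The sketch below is a minimal illustration under our own assumptions: the AUC is computed as the Mann-Whitney probability that a randomly chosen DM case scores higher than a non-DM case, and the CI comes from a percentile bootstrap over patients. The names `auc` and `bootstrap_auc_ci` are hypothetical, not the authors' code.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical example: 1 = developed DM, 0 = did not; scores are model outputs.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```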
During the COVID-19 pandemic, the treatment of aortic dissection has faced additional challenges. Necessary medical resources are in serious shortage, and the preoperative waiting time has been significantly prolonged by the requirement to test for COVID-19 infection. In this work, we focus on risk prediction for aortic dissection surgery under the influence of the COVID-19 pandemic. A general scheme for medical data processing is proposed, comprising five modules: problem definition, data preprocessing, data mining, result analysis, and knowledge application. Based on effective data preprocessing, feature analysis, and boosting trees, our proposed fusion decision model obtains 100% accuracy for early postoperative mortality prediction, outperforming single-model machine learning methods such as LightGBM, XGBoost, and CatBoost. The results reveal the critical factors related to postoperative mortality in aortic dissection, which can provide a theoretical basis for formulating clinical operation plans and help avoid risks in advance.
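The aortic dissection abstract does not specify how its fusion decision model combines the boosting trees. One common option, shown here purely as an assumed sketch, is weighted soft voting: average the positive-class probabilities predicted by several models (for example LightGBM, XGBoost and CatBoost) and threshold the result. The function `soft_vote`, the equal default weights, and the 0.5 threshold are illustrative choices, not the authors'.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Fuse per-model positive-class probabilities with a weighted average.

    prob_lists: one list of probabilities per model, all the same length.
    Returns (fused probabilities, 0/1 decisions at `threshold`).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal trust in every model
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_samples = len(prob_lists[0])
    fused = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    decisions = [1 if p >= threshold else 0 for p in fused]
    return fused, decisions


# Hypothetical outputs of three boosting models for two patients.
fused, decisions = soft_vote([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]])
print(decisions)  # [1, 0]
```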
This paper aims to design an optimizer followed by a Kawahara filter for optimal classification and prediction of employees' performance. The algorithm starts by processing data with a modified K-means technique, used as a hierarchical clustering method, to quickly obtain the best features of employees for reaching their best performance. The work consists of two parts. The first part collects employee data to calculate and illustrate the performance of each employee. The second part applies classification and prediction techniques to employee performance. The model is designed to support companies' decisions about employee performance. The classification and prediction algorithms use the Gradient Boosting Tree classifier to classify and predict the features. The results give the percentage of employees who are expected to leave the company after their performance for the coming years is predicted. The results also show that Grasshopper Optimization, followed by "KF" with the Gradient Boosting Tree as classifier and predictor, achieves high accuracy. The proposed algorithm is compared with other known techniques, and our results are found to be superior.
Traditional 3Ni weathering steel cannot fully meet the requirements of offshore engineering development, so designing novel 3Ni steels with added microalloying elements such as Mn or Nb for strength enhancement has become a trend. In the current study, the stress-assisted corrosion behavior of a newly designed high-strength 3Ni steel was investigated using the corrosion big data method. Information on the corrosion process was recorded using the galvanic corrosion current monitoring method. The gradient boosting decision tree (GBDT) machine learning method was used to mine the corrosion mechanism, and the importance of the structure factor was investigated. Field exposure tests were conducted to verify the results calculated with the GBDT method. The results indicated that the GBDT method can be effectively used to study the influence of structural factors on the corrosion process of 3Ni steel. Mn and Cu were found to influence the stress-assisted corrosion of 3Ni steel through different mechanisms: neither had an obvious effect on the corrosion rate of non-stressed 3Ni steel during the early stage of corrosion. When corrosion reached a stable state, increasing the Mn content increased the corrosion rate of 3Ni steel, while Cu reduced it. In the presence of stress, both increasing the Mn content and adding Cu inhibited the corrosion process. The corrosion behavior of outdoor-exposed 3Ni steel is consistent with that derived from the corrosion big data technology, verifying the reliability of the big data evaluation method and the selected prediction model.
Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and achieve sustainable development. Traditional methods for predicting oil well production trends rely on years of oilfield production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Using the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts initial single-layer production from geological data, fluid PVT data, and well data. The results show that the GBDT prediction model is highly accurate, markedly improves efficiency, and is broadly applicable. The GBDT model trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization, and engineering parameter optimization, and provides guidance for oilfield development.
This study introduces and evaluates a novel artificial hummingbird algorithm-optimised boosted tree (AHA-boosted) model for predicting the dynamic modulus (E*) of hot mix asphalt concrete. Using a substantial dataset from NCHRP Report 547, the model was trained and rigorously tested. Performance metrics, specifically RMSE, MAE, and R², were employed to assess the model's predictive accuracy, robustness, and generalisability. When benchmarked against well-established models such as support vector machines (SVM) and Gaussian process regression (GPR), the AHA-boosted model performed better. Using the traditional Witczak NCHRP 1-40D model inputs, it achieved R² values of 0.997 in training and 0.974 in testing. Incorporating features such as test temperature, frequency, and asphalt content increased the test R² by 1.23%, signifying an improvement in the model's accuracy. The study also explored feature importance and sensitivity through SHAP and permutation importance plots, highlighting the binder complex modulus |G*| as a key predictor. Although the AHA-boosted model shows promise, the slight decrease in R² from training to testing indicates a need for further validation. Overall, this study confirms the AHA-boosted model as a highly accurate and robust tool for predicting the dynamic modulus of hot mix asphalt concrete, making it a valuable asset for pavement engineering.
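Permutation importance, mentioned in the asphalt study, is model-agnostic: shuffle one input column, re-score the fitted model, and record how much the score drops. A minimal sketch follows, assuming R² as the score and a fitted model exposed as a hypothetical `predict(row)` callable; it is not the study's implementation.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 when each feature column is randomly shuffled.

    X is a list of rows (lists); predict maps one row to a prediction.
    """
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only uses feature 0, so feature 1 should score exactly 0.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(permutation_importance(lambda row: 2.0 * row[0], X, y))
```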
To investigate travel time prediction for freeways, a model based on the gradient boosting decision tree (GBDT) is proposed. Eleven variables (namely, travel time in the current period T_i, traffic flow in the current period Q_i, speed in the current period V_i, density in the current period K_i, the number of vehicles in the current period N_i, occupancy in the current period R_i, the traffic state parameter in the current period X_i, travel time in the previous period T_(i-1), etc.) are selected to predict the travel time 10 min ahead. Data obtained from VISSIM simulation are used to train and test the model. The results demonstrate that the prediction error of the GBDT model is smaller than those of the back propagation (BP) neural network model and the support vector machine (SVM) model. Travel time in the current period T_i is the most important variable in the GBDT model. The GBDT model produces more accurate predictions and uncovers hidden nonlinear relationships between the variables and the predicted travel time.
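Several abstracts in this list rely on the gradient boosting decision tree. A minimal sketch of the idea under simplifying assumptions (squared-error loss, depth-1 trees or "stumps", a fixed learning rate): each new tree is fitted to the current residuals, which are the negative gradient of the squared loss, and added with a shrinkage factor. In the travel-time setting, each row of `X` would hold inputs such as T_i, Q_i and V_i. The class `StumpBoostingRegressor` and its parameters are hypothetical and much simpler than any production GBDT library.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2  # candidate split halfway between adjacent values
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


class StumpBoostingRegressor:
    """Least-squares gradient boosting with depth-1 trees (illustrative only)."""

    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.init_ = sum(y) / len(y)  # start from the mean target
        pred = [self.init_] * len(y)
        self.stumps_ = []
        for _ in range(self.n_estimators):
            # For squared loss, the negative gradient is the ordinary residual.
            residuals = [t - p for t, p in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps_.append(stump)
            pred = [p + self.learning_rate * predict_stump(stump, row)
                    for p, row in zip(pred, X)]
        return self

    def predict(self, X):
        return [self.init_ + sum(self.learning_rate * predict_stump(s, row)
                                 for s in self.stumps_)
                for row in X]
```

The small learning rate trades more boosting rounds for smoother fits; real GBDT libraries add deeper trees, subsampling, and regularisation on top of this loop.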
Sustainable intensification of cultivated land use (SICLU) and large-scale operations (LSO) are widely acknowledged strategies for enhancing agricultural performance. However, the existing literature has struggled to define SICLU precisely and to construct comprehensive indicators, which has hindered the exploration of factors influencing LSO within the SICLU framework. To address this gap, we integrated self-efficacy theory into the design of an index framework for evaluating SICLU. We then employed econometric models to analyze the significant factors that affect LSO. Our findings reveal that SICLU can be divided into four key dimensions: intensive management, efficient output, resource conservation, and ecological environment optimization. Furthermore, it is crucial to incorporate belief-based cognitive factors into the index system, as farmers' understanding of fertilizer and pesticide application significantly influences their willingness to engage in LSO. Moreover, we identify grain market turnover as the most influential factor in promoting LSO, with single-factor contribution rates reaching 70.9% for cultivated land transfer willingness and 62.5% for total planting area. Interestingly, unlike irrigation and agricultural machinery inputs, increased labor inputs correspond to larger planting areas for farmers. This trend may be attributed to reduced labor availability caused by rural labor migration, whereas the reduction in irrigation and agricultural inputs depends on innovations in production practices and the transfer of cultivated land management rights. Importantly, SICLU dynamically influences LSO, and each SICLU-related index has an optimal range that fosters LSO. These insights offer valuable guidance for policymakers: farmers should be the central focus, and input and output factors should be adjusted as the means of achieving LSO, the ultimate goal. Finally, we propose research avenues for further enriching the SICLU framework so that it aligns with the specific characteristics of regional agricultural development.
The uniaxial compressive strength (UCS) of rock is an essential property of rock material in many applications, such as rock slopes, tunnel construction, and foundations. Obtaining UCS values directly in the laboratory takes enormous time and effort. Accordingly, determining UCS indirectly through several rock index tests that are easy and fast to carry out is of interest and importance. This study presents a boosting-tree evaluation framework, comprising the adaptive boosting machine, the extreme gradient boosting machine (XGBoost), and the category gradient boosting machine, for estimating the UCS of sandstone. The Schmidt hammer rebound number, P-wave velocity, and point load index were chosen as input factors to forecast the UCS values of sandstone samples. Taylor diagrams and five regression metrics, namely the coefficient of determination (R²), root mean square error, mean absolute error, variance accounted for, and the A-20 index, were used to evaluate and compare the performance of these boosting trees. The results showed that the proposed boosting trees provide a high level of prediction capacity for the prepared database. In particular, XGBoost was the best model for predicting sandstone strength, achieving a training R² of 0.999 and a testing R² of 0.958. The proposed model outperformed neural networks combined with optimization techniques in both the training and testing phases. The variable importance analysis reveals that the point load index has a significant influence on predicting the UCS of sandstone.
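The sandstone study evaluates its models with five regression metrics. The sketch below uses definitions commonly found in the rock-mechanics literature, which we assume rather than take from the paper: VAF = (1 − var(y − ŷ) / var(y)) × 100, and the A-20 index as the fraction of samples whose measured-to-predicted ratio lies between 0.8 and 1.2. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, VAF (%) and the A-20 index for paired lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true must not be constant")
    mean_e = sum(errors) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n  # variance of the errors
    var_t = ss_tot / n  # variance of the measurements
    # A-20: share of samples with measured/predicted ratio in [0.8, 1.2].
    a20 = sum(1 for t, p in zip(y_true, y_pred)
              if p != 0 and 0.8 <= t / p <= 1.2) / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "VAF": (1 - var_e / var_t) * 100,
        "A20": a20,
    }


# Hypothetical measured vs. predicted UCS values (MPa).
print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 40]))
```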
It is easy for teenagers to view pornographic pictures on social networks. Many researchers have studied the detection of real pornographic pictures, but there are few studies on artificial ones. In this work, we studied how to detect artificial pornographic pictures, especially on social networks. The detection process can be divided into two stages: feature selection and picture detection. In the feature selection stage, seven types of features that favour picture detection were selected. The picture detection stage comprised three steps. 1) To alleviate the imbalance between the numbers of artificial pornographic pictures and normal ones, the training dataset of artificial pornographic pictures was expanded, so the features extracted from the training dataset were expanded as well. 2) To reduce feature extraction time, a fast method was proposed that extracts features from a proportionally scaled picture rather than the original one. 3) Three tree models were compared, and a gradient boosting decision tree (GBDT) was selected for the final picture detection. Three sets of experimental results show that the proposed method achieves better recognition precision and drastically reduces the time cost.
To improve the accuracy of target intent recognition, a recognition method based on the XGBoost (eXtreme Gradient Boosting) decision tree is proposed. This paper uses relevant data and a Python program to calculate the probability of each tactical intention. The sequence intention probability is then obtained by applying the Dempster-Shafer rule of combination. To verify the accuracy of the recognition results, we compare our experimental results with those reported in the literature. The experiments show that this method improves the probability of tactical intention recognition, indicating that the method is feasible.
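The intent-recognition abstract fuses per-step probabilities with the Dempster-Shafer rule of combination. A minimal sketch of Dempster's rule follows, assuming each source is a mass function over subsets of a hypothetical frame of intentions such as {"attack", "retreat"}: masses of intersecting focal sets are multiplied and accumulated, and the mass that falls on empty intersections (the conflict K) is removed by renormalising with 1 / (1 − K). The function name is ours.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # evidence that the two sources contradict
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical intention masses from two consecutive observations.
A, R = frozenset({"attack"}), frozenset({"retreat"})
m = dempster_combine({A: 0.6, R: 0.3, A | R: 0.1}, {A: 0.5, R: 0.4, A | R: 0.1})
print(round(m[A], 4))  # 0.6721
```

Because the rule is commutative and associative, a whole sequence of observations can be fused with `functools.reduce(dempster_combine, masses)`.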
Epilepsy is a very common neurological disorder worldwide that can affect a person's quality of life at any age. People with epilepsy typically have recurrent seizures that can lead to injury or, in some cases, even death. Curing epilepsy requires risky surgery; otherwise, the patient may be subjected to long-term drug treatment combined with lifestyle advice, without a guarantee of full recovery. Regardless of the type of treatment, late treatment inevitably causes psychological instability in the patient. It is therefore important to diagnose the disease as early as possible so that the patient's mental health does not suffer from its consequences. This study therefore proposes a model for detecting epilepsy as early as possible, especially in newborns, using electroencephalogram signal data from 10 newborns. The model, developed with the extra trees classifier technique, predicts epilepsy in infants with an accuracy of around 99.4%.
The stability of underground entry-type excavations directly affects the working environment and the safety of staff. Empirical critical span graphs and traditional statistical learning methods cannot meet the high-accuracy requirements of stability assessment for entry-type excavations. Therefore, this study proposes a new prediction method based on machine learning to adjust the critical span graph scientifically. The particle swarm optimization (PSO) algorithm is used to optimize the core parameters of the gradient boosting decision tree (GBDT), abbreviated as PSO-GBDT. Moreover, eight other classifiers, including GBDT, k-nearest neighbors (KNN), two kinds of support vector machines (SVM), Gaussian naive Bayes (GNB), logistic regression (LR), and linear discriminant analysis (LDA), are compared with the proposed model. The findings revealed that, compared with the other eight models, PSO-GBDT gives the most reliable predictions, with a classification accuracy of up to 0.93. The model therefore has great potential to provide a more scientific and accurate choice for the stability prediction of underground excavations. In addition, each classification model is used to predict the stability category of the grid points into which the critical span graph is divided, and the updated critical span graph of each model is discussed in light of previous studies. The results show that the PSO-GBDT model is scientific, accurate, and efficient in updating the critical span graph, and its output decision boundary has strict theoretical support, which can help mine operators make favorable economic decisions.
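In PSO-GBDT, particle swarm optimization searches the GBDT hyperparameter space. A minimal sketch of the standard global-best PSO update follows; the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the toy objective are our assumptions. In a real tuning run the objective would be, for example, the cross-validated error of a GBDT trained with the candidate hyperparameters, and integer-valued ones such as the number of trees would need rounding.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy objective standing in for a cross-validated GBDT error surface.
best, val = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                         [(-5.0, 5.0), (-5.0, 5.0)])
```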
The zero-degree calorimeter (ZDC) plays a crucial role in determining the centrality in the Cooling-Storage-Ring External-target Experiment (CEE) at the Heavy Ion Research Facility in Lanzhou. A boosted decision tree (BDT) multi-classification algorithm was employed to classify the centrality of collision events based on raw features from the ZDC, such as the number of fired channels and the deposited energy. Simulated ²³⁸U+²³⁸U collisions at 500 MeV/u, generated by the IQMD event generator and subsequently modeled using the GEANT4 package, were used to train and test the BDT model. The results showed that the multi-classification model adopted in the ZDC determines centrality with high accuracy and is robust against variations in detector geometry and response. This study demonstrates the good performance of the CEE-ZDC in determining centrality in nucleus-nucleus collisions.
The dead fuel moisture content (DFMC) is the key driver of fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables, including air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity, were measured simultaneously in a grassland of Ergun City, Inner Mongolia Autonomous Region of China, in 2021. We chose three regression models, i.e., the random forest (RF) model, the extreme gradient boosting (XGB) model, and the boosted regression tree (BRT) model, to model the seasonal DFMC from the collected data. To ensure accuracy, we added time-lag variables of 3 d to the models. Among the three models, the RF model had the best fit, with an R² value of 0.847, and the best prediction accuracy, with a mean absolute error of 4.764%. The accuracies of the models in spring and autumn were higher than those in the other two seasons. In addition, different seasons had different key influencing factors, and the degree of influence of these factors on the DFMC changed with the time lag. Moreover, time-lag variables within 44 h clearly improved the fitting effect and prediction accuracy, indicating that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels and when mapping grassland fire risks based on the DFMC, to help locate high-priority areas for grassland fire monitoring and prevention.
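The DFMC study adds time-lagged environmental variables to its models. Assuming evenly spaced observations stored oldest first, lag features can be built as follows; the function name `add_lag_features`, the `_lagN` column naming, and dropping rows that lack a full history are our illustrative choices, and one lag step equals one sampling interval.

```python
def add_lag_features(records, variables, max_lag):
    """Append each variable's values from 1..max_lag steps earlier.

    records: list of dicts ordered in time (oldest first).
    Rows without a complete history (the first max_lag rows) are dropped.
    """
    out = []
    for t in range(max_lag, len(records)):
        row = dict(records[t])  # copy so the input records are not modified
        for var in variables:
            for lag in range(1, max_lag + 1):
                row[f"{var}_lag{lag}"] = records[t - lag][var]
        out.append(row)
    return out


# Hypothetical daily air temperature and relative humidity readings.
daily = [{"temp": 10 + i, "rh": 50 - i} for i in range(5)]
rows = add_lag_features(daily, ["temp", "rh"], max_lag=3)
print(rows[0]["temp_lag3"])  # 10
```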
The ongoing effort to create methods for detecting and quantifying fatigue damage is motivated by the high levels of uncertainty in present fatigue-life prediction approaches and the frequently catastrophic nature of fatigue failure. In this study, the fatigue life of the high-strength aluminum alloy 2090-T83 is predicted with a variety of artificial intelligence and machine learning techniques for constant amplitude and negative stress ratios (R?1). Artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), support vector machines (SVM), a random forest model (RF), and an extreme gradient tree-boosting model (XGB) are trained using numerical and experimental input data obtained from fatigue tests based on a relatively small number of stress measurements. In particular, the coefficients of the traditional force law formula are found using relevant numerical methods. It is shown that, compared with traditional approaches, the neural network and neuro-fuzzy models produce better results, with the neural network models trained using the boosting iterations technique providing the best performance. By building strong models from weak ones, XGB helps predict fatigue life by reducing model bias and variance in supervised learning. Fuzzy neural models can predict the fatigue life of alloys more accurately than neural networks and traditional methods.
This paper presents a hybrid ensemble classifier that combines the synthetic minority oversampling technique (SMOTE), the random search (RS) hyperparameter optimization algorithm, and the gradient boosting tree (GBT) to achieve efficient and accurate rock trace identification. A thirteen-dimensional database consisting of basic, vector, and discontinuity features is established from image samples. All data points are classified as either "trace" or "non-trace" to divide the final results into candidate trace samples. SMOTE is found to improve classification performance effectively, with an optimized imbalance ratio of 1:5 to 1:4 recommended. Then, sixteen classifiers generated from four basic machine learning (ML) models are compared. The results reveal that the proposed RS-SMOTE-GBT classifier outperforms the other fifteen hybrid ML algorithms for both trace and non-trace classifications. Finally, feature importance, generalization ability, and classification error are discussed for the proposed classifier. The experimental results indicate that the features most critical to trace classification are primarily discontinuity features. Moreover, cleaning up the sedimentary pumice and reducing the area of fractured rock help improve the overall classification performance. The proposed method provides a new alternative approach for the identification of 3D rock traces.
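SMOTE, used in the rock-trace classifier, creates synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbours. A minimal sketch follows, assuming numeric feature vectors and Euclidean distance; k = 5 is the conventional default, not a value from the paper, and the function name `smote` is ours.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Pre-compute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # position along the segment from sample i to sample j
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic
```

To reach a target minority-to-majority ratio such as 1:5, request roughly `n_majority / 5 - n_minority` synthetic samples.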
In loose and fractured coal seams with particularly low uniaxial compressive strength (UCS), driving a roadway is extremely difficult because roof falls and wall spalling occur frequently. To address this issue, the jet grouting (JG) technique (high-pressure grout mixed with coal particles) was introduced for the first time in this study to improve the self-supporting ability of the coal mass. To evaluate the strength of the jet-grouted coal-grout composite (JG composite), the UCS evolution patterns were analyzed using 405 specimens prepared by combining the influential variables of grout type, curing time, and coal-to-grout (C/G) ratio. Furthermore, the relationships between UCS and these influencing variables were modeled using ensemble learning methods, i.e., the gradient boosted regression tree (GBRT) and random forest (RF), with hyperparameters tuned by particle swarm optimization (PSO). The results showed that the chemical grout composite has higher short-term strength, while the cement grout composite achieves more stable long-term strength. The PSO-GBRT and PSO-RF models both achieve high prediction accuracy. The variable importance analysis also demonstrated that the grout type and curing time should be considered carefully. This study provides a robust intelligent model for predicting the UCS of JG composites, which supports JG design in the field.
Habitat suitability index (HSI) models have been widely used to analyze the relationship between species abundance and environmental factors and, ultimately, to inform management of marine species. The response of species abundance differs among environmental variables, and habitat requirements may change across life history stages and seasons. Therefore, it is necessary to determine the optimal combination of environmental variables in HSI modelling. In this study, generalized additive models (GAMs) were used to determine which environmental variables should be included in the HSI models. Significant variables were retained and weighted in the HSI model according to their relative contribution (%) to the total deviance explained by the boosted regression tree (BRT). The HSI models were applied to evaluate the habitat suitability of the mantis shrimp Oratosquilla oratoria in Haizhou Bay and adjacent areas in 2011 and 2013–2017. Ontogenetic and seasonal variations in the HSI models of mantis shrimp were also examined. Among the four models (the non-optimized model, the BRT-informed HSI model, the GAM-informed HSI model, and the HSI model informed by both BRT and GAM), the model informed by both BRT and GAM performed best. Four environmental variables (bottom temperature, depth, distance offshore, and sediment type) were selected in the HSI models for four groups of mantis shrimp (spring-juvenile, spring-adult, fall-juvenile, and fall-adult). The distribution of habitat suitability showed similar patterns for juveniles and adults, but obvious seasonal variations were observed. This study suggests that optimizing the environmental variables in HSI models improves their performance, and this optimization strategy could be extended to other marine organisms to enhance understanding of the habitat suitability of target species.
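The mantis shrimp study weights the retained variables by their BRT relative contribution. Assuming the HSI is the weighted arithmetic mean of the per-variable suitability indices (one common HSI form; the paper may use another), the computation is short. The function name, variable names, and contribution values below are hypothetical.

```python
def weighted_hsi(si_values, contributions):
    """Weighted arithmetic-mean HSI.

    si_values: dict variable -> suitability index (SI) in [0, 1]
    contributions: dict variable -> relative contribution (%) from a BRT;
    weights are renormalised over the retained variables only.
    """
    for var, si in si_values.items():
        if not 0.0 <= si <= 1.0:
            raise ValueError(f"SI for {var} must lie in [0, 1]")
    total = sum(contributions[v] for v in si_values)
    return sum(si_values[v] * contributions[v] for v in si_values) / total


# Hypothetical SI values for one grid cell and hypothetical BRT contributions.
si = {"bottom_temp": 0.8, "depth": 0.6, "distance_offshore": 0.4, "sediment": 1.0}
contrib = {"bottom_temp": 40.0, "depth": 30.0, "distance_offshore": 20.0, "sediment": 10.0}
print(round(weighted_hsi(si, contrib), 2))  # 0.68
```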
Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the increase in PPI data and the development of machine learning technologies, prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
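GTB-PPI reports five-fold cross-validation accuracy. A minimal sketch of shuffled k-fold splitting and averaged accuracy follows, assuming a hypothetical `fit_predict(X_train, y_train, X_test)` callable that wraps whatever model is evaluated; plain (unstratified) folds are our simplification.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and return k (train, test) index-list pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # fold sizes differ by at most one
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, test))
    return splits


def cross_val_accuracy(fit_predict, X, y, k=5, seed=0):
    """Mean test-fold accuracy; each fold trains on the remaining k - 1 folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```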
Abstract: BACKGROUND Development of distant metastasis (DM) is a major concern during treatment of nasopharyngeal carcinoma (NPC). However, studies have demonstrated improved distant control and survival in patients with advanced NPC when chemotherapy is added to concomitant chemoradiotherapy. Therefore, precise prediction of metastasis in patients with NPC is crucial. AIM To develop a predictive model for metastasis in NPC using detailed magnetic resonance imaging (MRI) reports. METHODS This retrospective study included 792 patients with NPC without distant metastasis. A total of 469 imaging variables were obtained from detailed MRI reports. Data were stratified and randomly split into training (50%) and testing sets. Gradient boosting tree (GBT) models were built and used to select variables for predicting DM. A full model comprising all variables and a reduced model with the top-five variables were built. Model performance was assessed by the area under the curve (AUC). RESULTS Among the 792 patients, 94 developed DM during follow-up. Based on their relative importance in the GBT models, the top-five contributors to predicting DM were:
- the number of metastatic cervical nodes (30.9%);
- tumor invasion in the posterior half of the nasal cavity (9.7%);
- both sides of the pharyngeal recess (6.2%);
- the tubal torus (3.3%);
- a single side of the parapharyngeal space (2.7%).
The testing AUC of the full model was 0.75 (95% confidence interval [CI]: 0.69-0.82). The testing AUC of the reduced model was 0.75 (95% CI: 0.68-0.82). For the whole dataset, the full (AUC = 0.76, 95% CI: 0.72-0.82) and reduced models (AUC = 0.76, 95% CI: 0.71-0.81) outperformed the tumor node-staging system (AUC = 0.67, 95% CI: 0.61-0.73). CONCLUSION The GBT model outperformed the tumor node-staging system in predicting metastasis in NPC. The number of metastatic cervical nodes was identified as the principal contributing variable.
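The models above are compared by AUC, which can be computed directly from predicted risk scores. The sketch below assumes binary labels (1 = developed DM) and one continuous score per patient. The function name and toy data are illustrative, not taken from the study. It uses the rank (Mann-Whitney) reading of AUC: the probability that a randomly chosen patient with DM scores higher than one without, with ties counting one half.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    labels: sequence of 0/1 outcomes (1 = event, e.g. distant metastasis)
    scores: predicted risk scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc(labels, scores))
```

The pairwise loop costs O(n_pos × n_neg), which is trivial at the scale of 94 events among 792 patients. A confidence interval like the reported 95% CI could be added by bootstrap resampling of patients. The abstract does not say which CI method the authors used.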
Funding: This work was supported in part by the Key Research and Development Plan of Hunan Province under Grant 2019SK2022 (author H.T, http://kjt.hunan.gov.cn/), and in part by the National Natural Science Foundation of Hunan under Grant 2019JJ50866 (author L.T) and Grant 2020JJ4140 (author Y.T, http://kjt.hunan.gov.cn/).
Abstract: During the COVID-19 pandemic, the treatment of aortic dissection has faced additional challenges. Necessary medical resources are in serious shortage. Preoperative waiting times have also been significantly prolonged by the requirement to test for COVID-19 infection. In this work, we focus on risk prediction for aortic dissection surgery under the influence of the COVID-19 pandemic. We propose a general scheme for medical data processing with five modules: problem definition, data preprocessing, data mining, result analysis, and knowledge application. Our fusion decision model builds on effective data preprocessing, feature analysis, and boosting trees. It obtains 100% accuracy for early postoperative mortality prediction, outperforming single-model machine learning methods such as LightGBM, XGBoost, and CatBoost. The results reveal the critical factors related to postoperative mortality in aortic dissection. These findings can provide a theoretical basis for formulating clinical operation plans and help avoid risks in advance.
Abstract: This paper designs an optimizer followed by a Kawahara filter (KF) for optimal classification and prediction of employees' performance. The algorithm first processes the data with a modified K-means technique, used as a hierarchical clustering method, to quickly identify the employee features associated with best performance. The work consists of two parts:
1) collecting employee data to calculate and illustrate the performance of each employee;
2) applying classification and prediction techniques to employee performance.
The model is designed to help companies make decisions about employee performance. The classification and prediction algorithms use a Gradient Boosting Tree classifier to classify and predict the features. The results give the percentage of employees expected to leave the company, based on their predicted performance for the coming years. The results also show that Grasshopper Optimization followed by the KF, with the Gradient Boosting Tree as classifier and predictor, achieves high accuracy. The proposed algorithm was compared with other known techniques, and our results were found to be superior.
Funding: Supported by the National Natural Science Foundation of China (No. 52203376) and the National Key Research and Development Program of China (No. 2023YFB3813200).
Abstract: Traditional 3Ni weathering steel cannot fully meet the requirements of offshore engineering development. As a result, designing novel 3Ni steels with added microalloying elements such as Mn or Nb for strength enhancement has become a trend. This study investigated the stress-assisted corrosion behavior of a newly designed high-strength 3Ni steel using the corrosion big data method. Information on the corrosion process was recorded using galvanic corrosion current monitoring. The gradient boosting decision tree (GBDT) machine learning method was used to mine the corrosion mechanism, and the importance of the structure factor was investigated. Field exposure tests were conducted to verify the results calculated by the GBDT method. The results indicated that the GBDT method can be effectively used to study how structural factors influence the corrosion process of 3Ni steel. Mn and Cu additions were found to act through different mechanisms in the stress-assisted corrosion of 3Ni steel:
- In non-stressed 3Ni steel during the early stage of corrosion, Mn and Cu have no obvious effect on the corrosion rate.
- Once corrosion reached a stable state, increasing the Mn content increased the corrosion rate, whereas Cu reduced it.
- In the presence of stress, increasing the Mn content and adding Cu can both inhibit the corrosion process.
The corrosion behavior of outdoor-exposed 3Ni steel is consistent with that derived from corrosion big data technology. This agreement verifies the reliability of the big data evaluation method and of the selected prediction model.
Abstract: Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and achieve sustainable development. Traditional methods for predicting oil well production trends rely on years of oilfield production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are increasingly applied in various sub-fields of oil and gas reservoir development. This paper uses the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT) to predict initial single-layer production from geological data, fluid PVT data, and well data. The results show that the GBDT prediction model is highly accurate, significantly improves efficiency, and has strong general applicability. The GBDT model trained in this paper can predict production, which helps with well site, perforation layer, and engineering parameter optimization and provides guidance for oilfield development.
Abstract: This study introduces and evaluates a novel artificial hummingbird algorithm-optimised boosted tree (AHA-boosted) model for predicting the dynamic modulus (E*) of hot mix asphalt concrete. The model was trained and rigorously tested on a substantial dataset from NCHRP Report 547. Performance metrics, specifically RMSE, MAE, and R², were used to assess the model's predictive accuracy, robustness, and generalisability. When benchmarked against well-established models such as support vector machines (SVM) and Gaussian process regression (GPR), the AHA-boosted model performed better. Using the traditional Witczak NCHRP 1-40D model inputs, it achieved R² values of 0.997 in training and 0.974 in testing. Incorporating features such as test temperature, frequency, and asphalt content increased the test R² by 1.23%, improving the model's accuracy. The study also explored feature importance and sensitivity through SHAP and permutation importance plots, highlighting the binder complex modulus |G*| as a key predictor. Although the AHA-boosted model shows promise, the slight decrease in R² from training to testing indicates a need for further validation. Overall, this study confirms the AHA-boosted model as a highly accurate and robust tool for predicting the dynamic modulus of hot mix asphalt concrete, making it a valuable asset for pavement engineering.
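Permutation importance, one of the two attribution tools named above, is model-agnostic. It shuffles one input column, re-scores the fitted model, and records how much the metric drops. The sketch below uses R² as the metric. `predict`, the toy linear model, and the data are hypothetical stand-ins for the AHA-boosted model and the NCHRP dataset, and SHAP is not reproduced here.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled.

    predict: callable mapping a list of feature rows to a list of predictions
    X: list of rows (lists of floats); y: list of targets
    """
    rng = random.Random(seed)
    baseline = r2_score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical model: depends on feature 0 only and ignores feature 1.
def toy_model(rows):
    return [3.0 * r[0] + 0.0 * r[1] for r in rows]


X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [3.0 * r[0] for r in X]
print(permutation_importance(toy_model, X, y))
```

The toy model ignores feature 1, so shuffling it leaves predictions unchanged and its importance is exactly zero. Shuffling feature 0, by contrast, drives R² well below its baseline of 1.0.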
Funding: The National Natural Science Foundation of China (No. 51478114, 51778136).
Abstract: To investigate travel time prediction for freeways, a model based on the gradient boosting decision tree (GBDT) is proposed. The model predicts the travel time 10 min ahead from eleven variables, including:
- travel time in the current period, T_i;
- traffic flow in the current period, Q_i;
- speed in the current period, V_i;
- density in the current period, K_i;
- the number of vehicles in the current period, N_i;
- occupancy in the current period, R_i;
- the traffic state parameter in the current period, X_i;
- travel time in the previous period, T_{i-1}.
Data obtained from VISSIM simulation are used to train and test the model. The results demonstrate that the prediction error of the GBDT model is smaller than those of the back propagation (BP) neural network model and the support vector machine (SVM) model. Travel time in the current period, T_i, is the most important variable in the GBDT model. The GBDT model produces more accurate predictions and captures the hidden nonlinear relationships between the variables and the predicted travel time.
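A GBDT regressor such as the one above starts from a constant prediction. It then repeatedly fits a small tree to the current residuals (the negative gradient of squared-error loss) and adds that tree's output scaled by a learning rate. The sketch below is a minimal from-scratch version. It assumes squared-error loss and depth-1 trees (stumps) for brevity. The feature layout and toy data are hypothetical stand-ins for the eleven traffic variables and the VISSIM data. Production GBDT libraries add deeper trees, subsampling, and regularisation.

```python
def fit_stump(X, r):
    """Least-squares single-split regression stump fitted to residuals r."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((v - lmean) ** 2 for v in left)
                   + sum((v - rmean) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # no feature can be split: fall back to a constant stump
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    _, j, thr, lmean, rmean = best
    return (j, thr, lmean, rmean)


def stump_predict(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean


def gradient_boost(X, y, n_rounds=100, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump fits the current residuals."""
    f0 = sum(y) / len(y)  # initial constant model
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - f)^2.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, row)
                for pi, row in zip(pred, X)]

    def predict(row):
        return f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)

    return predict


# Hypothetical one-feature toy data with a step at t = 5.
X = [[float(t)] for t in range(10)]
y = [0.0 if t < 5 else 10.0 for t in range(10)]
model = gradient_boost(X, y)
print(model([0.0]), model([9.0]))
```

With a learning rate of 0.1, the residual on each side of the step shrinks by a factor of 0.9 per round. After 100 rounds the predictions are therefore within about 5 × 0.9^100 ≈ 1.3 × 10⁻⁴ of the targets.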
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42071226, 41671176) and the Taishan Scholars Youth Expert Support Plan of Shandong Province (No. TSQN202306183).
Abstract: Sustainable intensification of cultivated land use (SICLU) and large-scale operations (LSO) are widely acknowledged strategies for enhancing agricultural performance. However, the existing literature has struggled to define SICLU precisely and to construct comprehensive indicators. This has hindered exploration of the factors influencing LSO within the SICLU framework. To address this gap, we integrated self-efficacy theory into the design of an index framework for evaluating SICLU. We then employed econometric models to analyze the significant factors that affect LSO. Our findings reveal that SICLU can be divided into four key dimensions: intensive management, efficient output, resource conservation, and ecological environment optimization. Furthermore, belief-based cognitive factors must be incorporated into the index system, because farmers' understanding of fertilizer and pesticide application significantly influences their willingness to engage in LSO. Moreover, we identify grain market turnover as the most influential factor in promoting LSO. Its single-factor contribution rates reach 70.9% for cultivated land transfer willingness and 62.5% for total planting area. Interestingly, unlike irrigation and agricultural machinery inputs, increased labor inputs correspond to larger planting areas. This trend may be attributed to reduced labor availability caused by rural labor migration. By contrast, reductions in irrigation and agricultural inputs depend on innovations in production practices and on the transfer of cultivated land management rights. Importantly, SICLU dynamically influences LSO, and each SICLU-related index has an optimal range that fosters LSO. These insights offer valuable guidance for policymakers. They suggest placing farmers at the center and adjusting input and output factors as the means to reach the ultimate goal of LSO. In conclusion, we propose research avenues for further enriching the SICLU framework so that it aligns with the specific characteristics of regional agricultural development.
Funding: Funded by Act 211 of the Government of the Russian Federation, Contract No. 02.A03.21.0011.
Abstract: The uniaxial compressive strength (UCS) of rock is an essential property in many applications, such as rock slopes, tunnel construction, and foundations. Obtaining UCS values directly in the laboratory takes enormous time and effort. An indirect determination of UCS through rock index tests that are easy and fast to carry out is therefore of interest and importance. This study presents a boosting-tree evaluation framework for estimating the UCS of sandstone. The framework comprises the adaptive boosting machine, the extreme gradient boosting machine (XGBoost), and the category gradient boosting machine. The Schmidt hammer rebound number, P-wave velocity, and point load index were chosen as input factors to forecast the UCS of sandstone samples. The performance of these boosting trees was evaluated and compared using Taylor diagrams and five regression metrics:
- the coefficient of determination (R²);
- root mean square error;
- mean absolute error;
- variance accounted for (VAF);
- the A-20 index.
The results showed that the proposed boosting trees provide a high level of prediction capacity on the prepared database. In particular, XGBoost was the best model for predicting sandstone strength, achieving a training R² of 0.999 and a testing R² of 0.958. The proposed model outperformed neural networks combined with optimization techniques in both the training and testing phases. The variable importance analysis reveals that the point load index has a significant influence on predicting the UCS of sandstone.
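The five regression metrics listed above can be written out explicitly. The sketch below assumes common textbook definitions:
- VAF as (1 - var(measured - predicted) / var(measured)) × 100;
- the A-20 index as the share of samples whose measured/predicted ratio lies between 0.8 and 1.2.

Papers differ slightly in these conventions, so the study's exact formulas may differ. The sample values are hypothetical UCS values in MPa.

```python
import math


def regression_metrics(measured, predicted):
    """R^2, RMSE, MAE, VAF (%) and A-20 index for paired observations."""
    n = len(measured)
    mean_m = sum(measured) / n
    errors = [m - p for m, p in zip(measured, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)

    def variance(values):
        mu = sum(values) / len(values)
        return sum((v - mu) ** 2 for v in values) / len(values)

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # Variance accounted for, in percent.
        "VAF": (1.0 - variance(errors) / variance(measured)) * 100.0,
        # Fraction of samples with measured/predicted in [0.8, 1.2].
        "A20": sum(1 for m, p in zip(measured, predicted)
                   if 0.8 <= m / p <= 1.2) / n,
    }


# Hypothetical UCS values (MPa), not from the study.
print(regression_metrics([50.0, 60.0, 70.0, 80.0], [52.0, 58.0, 71.0, 90.0]))
```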
Funding: Projects (61573380, 61303185) supported by the National Natural Science Foundation of China; Projects (2016M592450, 2017M612585) supported by the China Postdoctoral Science Foundation; Projects (2016JJ4119, 2017JJ3416) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: It is easy for teenagers to view pornographic pictures on social networks. Many researchers have studied the detection of real pornographic pictures, but few have studied artificial ones. In this work, we studied how to detect artificial pornographic pictures, especially on social networks. The detection process is divided into two stages: feature selection and picture detection. In the feature selection stage, seven types of features that favour picture detection were selected. The picture detection stage comprises three steps. 1) The training dataset of artificial pornographic pictures was expanded to alleviate the imbalance between the numbers of artificial pornographic pictures and normal ones. The features extracted from that dataset were thereby expanded as well. 2) A fast method was proposed that extracts features from a proportionally scaled picture rather than the original, reducing feature extraction time. 3) Three tree models were compared, and a gradient boosting decision tree (GBDT) was selected for the final picture detection. Three sets of experimental results show that the proposed method achieves better recognition precision and drastically reduces time cost.
Abstract: To improve the accuracy of target intent recognition, a recognition method based on the XGBoost (eXtreme Gradient Boosting) decision tree is proposed. This paper uses relevant data and a Python program to calculate the probability of each tactical intention. The sequence intention probability is then obtained by applying the Dempster-Shafer rule of combination. To verify the accuracy of the recognition results, we compare our experimental results with those reported in the literature. The experiment shows that this method improves the probability of tactical intention recognition, indicating that the method is feasible.
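The paper above fuses per-step intention probabilities with Dempster's rule of combination. The rule works in four steps:
1) multiply the masses of every pair of focal elements;
2) assign each product to the pair's intersection;
3) set aside products whose intersection is empty, whose total is the conflict K;
4) renormalise the remaining masses by 1 - K.

The sketch below represents focal elements as frozensets. The three intention labels and the mass values are hypothetical, not from the paper. The XGBoost stage that would produce such masses is omitted.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical intentions and masses from two time steps.
ATTACK, RECON, RETREAT = "attack", "recon", "retreat"
THETA = frozenset({ATTACK, RECON, RETREAT})  # total ignorance
m1 = {frozenset({ATTACK}): 0.6, frozenset({RECON}): 0.3, THETA: 0.1}
m2 = {frozenset({ATTACK}): 0.5, frozenset({RETREAT}): 0.3, THETA: 0.2}
combined = dempster_combine(m1, m2)
print(combined)
```

Here the conflict is K = 0.42. The combined belief in a lone "attack" intention is therefore 0.47 / 0.58 ≈ 0.81, higher than either source assigned on its own.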
Abstract: Epilepsy is a very common neurological disorder worldwide that can affect a person's quality of life at any age. People with epilepsy typically have recurrent seizures that can lead to injury or, in some cases, even death. Curing epilepsy requires risky surgery. Otherwise, the patient may undergo long-term drug treatment combined with lifestyle advice, without any guarantee of full recovery. Regardless of the type of treatment, late treatment inevitably causes psychological instability in the patient. It is therefore important to diagnose the disease as early as possible so that the patient's mental health does not suffer its consequences. This study therefore proposes a model for detecting epilepsy as early as possible, especially in newborns. The model uses electroencephalogram (EEG) signal data from 10 newborns. Developed with the extra trees classifier technique, it predicts epilepsy in infants with an accuracy of around 99.4%.
Funding: Supported by the National Science Foundation of China (Grant No. 42177164), the Distinguished Youth Science Foundation of Hunan Province of China (Grant No. 2022JJ10073), and the Innovation-Driven Project of Central South University (Grant No. 2020CX040).
Abstract: The stability of underground entry-type excavations directly affects the working environment and the safety of staff. Empirical critical span graphs and traditional statistical learning methods cannot meet the high accuracy required for stability assessment of entry-type excavations. This study therefore proposes a new machine-learning-based prediction method to adjust the critical span graph scientifically. Specifically, the particle swarm optimization (PSO) algorithm is used to optimize the core parameters of the gradient boosting decision tree (GBDT); the combined model is abbreviated as PSO-GBDT. In addition, eight other classifiers are compared with the proposed model, including GBDT, k-nearest neighbors (KNN), two kinds of support vector machines (SVM), Gaussian naive Bayes (GNB), logistic regression (LR), and linear discriminant analysis (LDA). Compared with the other eight models, PSO-GBDT gave the most reliable predictions, with a classification accuracy of up to 0.93. This model therefore has great potential to provide a more scientific and accurate choice for stability prediction of underground excavations. In addition, each classification model is used to predict the stability category of grid points defined on the critical span graph. The resulting updated critical span graph of each model is discussed in light of previous studies. The results show that the PSO-GBDT model updates the critical span graph scientifically, accurately, and efficiently. Its output decision boundary has strict theoretical support, which can help mine operators make sound economic decisions.
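A PSO-GBDT setup treats the GBDT's core hyperparameters as particle coordinates and minimises a validation error. The sketch below is a minimal global-best PSO. The inertia and acceleration constants (w = 0.7, c1 = c2 = 1.5) are common textbook choices, not values reported by the study. `toy_cv_error` is a hypothetical smooth stand-in for the cross-validated error of a GBDT. In practice that error would require retraining the model for every particle evaluation.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation inside a box.

    objective: callable taking a list of floats and returning a float to minimise
    bounds: list of (low, high) pairs, one per dimension
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for "validation error of a GBDT as a function of
# (learning_rate, max_depth)"; its minimum is at (0.1, 4.0).
def toy_cv_error(params):
    lr, depth = params
    return (lr - 0.1) ** 2 + 0.01 * (depth - 4.0) ** 2


best, err = pso_minimize(toy_cv_error, bounds=[(0.01, 0.5), (1.0, 10.0)])
print(best, err)
```

Integer hyperparameters such as tree depth would be rounded before training a real model. The sketch keeps them continuous because the toy objective does not need rounding.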
Funding: This work was supported in part by the National Natural Science Foundation of China (NSFC) (Nos. 11927901 and 12175084), the National Key Research and Development Program of China (Nos. 2020YFE0202002 and 2022YFA1604900), and the Fundamental Research Funds for the Central Universities (No. CCNU22QN005).
Abstract: The zero-degree calorimeter (ZDC) plays a crucial role in determining centrality in the Cooling-Storage-Ring External-target Experiment (CEE) at the Heavy Ion Research Facility in Lanzhou. A boosted decision tree (BDT) multi-classification algorithm was employed to classify the centrality of collision events. It relies on raw ZDC features such as the number of fired channels and the deposited energy. The BDT model was trained and tested on simulated ²³⁸U+²³⁸U collisions at 500 MeV/u, generated by the IQMD event generator and subsequently modeled with the GEANT4 package. The results showed that the multi-classification model determines centrality in the ZDC with high accuracy. It is also robust against variations in detector geometry and response. This study demonstrates the good performance of CEE-ZDC in determining centrality in nucleus-nucleus collisions.
Funding: Funded by the National Key Research and Development Program of China Strategic International Cooperation in Science and Technology Innovation Program (2018YFE0207800) and the National Natural Science Foundation of China (31971483).
Abstract: The dead fuel moisture content (DFMC) is the key driver of fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables were measured simultaneously in a grassland of Ergun City, Inner Mongolia Autonomous Region of China, in 2021. The environmental variables were air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity. Three regression models were used to model the seasonal DFMC from the collected data: random forest (RF), extreme gradient boosting (XGB), and boosted regression tree (BRT). To ensure accuracy, time-lag variables of 3 d were added to the models. Among the three models, RF had the best fit (R² = 0.847) and the best prediction accuracy (mean absolute error = 4.764%). The models were more accurate in spring and autumn than in the other two seasons. In addition, different seasons had different key influencing factors, and the degree of their influence on the DFMC changed with time lag. Moreover, time-lag variables within 44 h clearly improved the fit and prediction accuracy. This indicates that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels. It likewise highlights their importance when mapping DFMC-based grassland fire risks to locate high-priority areas for fire monitoring and prevention.
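The time-lag variables described above amount to adding, for each environmental series, copies shifted back by one or more time steps. The sketch below assumes series that are already aligned on a common, regular time step. The variable names and values are hypothetical. With an hourly record, a 48 h window would correspond to lags of up to 48.

```python
def add_lag_features(series, lags):
    """Build rows of current and lagged predictors from aligned time series.

    series: dict mapping variable name -> list of observations at a fixed step
    lags: iterable of positive integer lags, in time steps
    The first max(lags) steps are dropped, because their lagged values would
    fall before the start of the record.
    """
    lags = sorted(lags)
    max_lag = lags[-1]
    length = len(next(iter(series.values())))
    rows = []
    for t in range(max_lag, length):
        row = {}
        for name, values in series.items():
            row[name] = values[t]
            for k in lags:
                row[f"{name}_lag{k}"] = values[t - k]
        rows.append(row)
    return rows


# Hypothetical relative humidity (%) and air temperature series.
rows = add_lag_features({"rh": [80, 75, 70, 65, 60], "temp": [10, 12, 14, 16, 18]}, [1, 2])
print(rows[0])
```

Rows at the start of the record are dropped rather than padded. As a result, no predictor silently uses data from before the measurements began.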
Abstract: The ongoing effort to develop methods for detecting and quantifying fatigue damage has two motivations: the high uncertainty of present fatigue-life prediction approaches and the frequently catastrophic nature of fatigue failure. In this study, the fatigue life of the high-strength aluminum alloy 2090-T83 under constant amplitude and negative stress ratios (R = -1) is predicted with a variety of artificial intelligence and machine learning techniques. Five models are trained on numerical and experimental input data from fatigue tests based on a relatively small number of stress measurements:
- artificial neural networks (ANN);
- adaptive neuro-fuzzy inference systems (ANFIS);
- support vector machines (SVM);
- a random forest model (RF);
- an extreme gradient tree-boosting model (XGB).
In particular, the coefficients of the traditional force law formula are determined using relevant numerical methods. Compared with traditional approaches, the neural network and neuro-fuzzy models produce better results. The neural network models trained with the boosting iterations technique perform best. By building strong models from weak ones, XGB helps predict fatigue life by reducing model bias and variance in supervised learning. Fuzzy neural models can predict the fatigue life of alloys more accurately than neural networks and traditional methods.
Funding: Supported by the Key Innovation Team Program of the Innovation Talents Promotion Plan by MOST of China (No. 2016RA4059), the Natural Science Foundation Committee Program of China (No. 51778474), and the Science and Technology Project of Yunnan Provincial Transportation Department (No. 25 of 2018).
Abstract: This paper presents a hybrid ensemble classifier for efficient and accurate rock trace identification. It combines the synthetic minority oversampling technique (SMOTE), the random search (RS) hyperparameter optimization algorithm, and the gradient boosting tree (GBT). A thirteen-dimensional database of basic, vector, and discontinuity features is established from image samples. All data points are classified as either “trace” or “non-trace”, so that the final results yield candidate trace samples. SMOTE is found to effectively improve classification performance, with a recommended imbalance ratio of 1:5 to 1:4. Sixteen classifiers generated from four basic machine learning (ML) models are then compared. The proposed RS-SMOTE-GBT classifier outperforms the other fifteen hybrid ML algorithms for both trace and non-trace classification. Finally, the feature importance, generalization ability, and classification error of the proposed classifier are discussed. The experimental results indicate that the features most critical to trace classification come primarily from the discontinuity features. In addition, cleaning up the sedimentary pumice and reducing the area of fractured rock help improve overall classification performance. The proposed method provides a new alternative approach for identifying 3D rock traces.
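SMOTE balances classes by interpolating between a minority sample and one of its nearest minority-class neighbours, rather than duplicating samples. The sketch below shows that interpolation step, assuming plain numeric feature vectors and Euclidean distance. The function name and data are hypothetical. The random search and GBT stages of RS-SMOTE-GBT are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point, pick a random minority sample, pick one of its
    k nearest minority neighbours (Euclidean distance), and place the new
    point at a random position on the segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical two-feature minority class.
pts = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=10, k=2)
print(pts[:3])
```

Consider a hypothetical training set with 100 trace and 500 non-trace samples (1:5). Moving it to 1:4 needs 500 / 4 - 100 = 25 synthetic trace samples, i.e. `n_new=25`.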
Funding: Financially supported by the Fundamental Research Funds for the Central Universities (2020ZDPY0221).
Abstract: In loose and fractured coal seams with particularly low uniaxial compressive strength (UCS), driving a roadway is extremely difficult because roof falls and wall spalling occur frequently. To address this issue, this study introduced the jet grouting (JG) technique (high-pressure grout mixed with coal particles) for the first time to improve the self-supporting ability of the coal mass. To evaluate the strength of the jet-grouted coal-grout composite (JG composite), UCS evolution patterns were analyzed on 405 specimens. The specimens combined the influencing variables of grout type, curing time, and coal-to-grout (C/G) ratio. The relationships between UCS and these variables were then modeled with two ensemble learning methods: the gradient boosted regression tree (GBRT) and random forest (RF). Their hyperparameters were tuned by particle swarm optimization (PSO). The chemical grout composite showed higher short-term strength, while the cement grout composite achieved more stable strength in the long term. Both the PSO-GBRT and PSO-RF models achieved high prediction accuracy. The variable importance analysis also showed that grout type and curing time should be considered carefully. This study provides a robust intelligent model for predicting the UCS of JG composites, which supports JG design in the field.
Funding: The National Key R&D Program of China under contract No. 2017YFE0104400; the National Natural Science Foundation of China under contract No. 31772852; the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) under contract No. 2018SDKJ0501-2.
Abstract: Habitat suitability index (HSI) models have been widely used to analyze the relationship between species abundance and environmental factors and, ultimately, to inform the management of marine species. Species abundance responds differently to each environmental variable, and habitat requirements may change across life history stages and seasons. It is therefore necessary to determine the optimal combination of environmental variables in HSI modelling. In this study, generalized additive models (GAMs) were used to determine which environmental variables to include in the HSI models. Significant variables were retained and weighted in the HSI model according to their relative contribution (%) to the total deviance explained by the boosted regression tree (BRT). The HSI models were applied to evaluate the habitat suitability of the mantis shrimp Oratosquilla oratoria in Haizhou Bay and adjacent areas in 2011 and 2013–2017. Ontogenetic and seasonal variations in the HSI models of mantis shrimp were also examined. Four models were compared: the non-optimized model, the BRT-informed HSI model, the GAM-informed HSI model, and the model informed by both BRT and GAM. The model informed by both BRT and GAM performed best. Four environmental variables (bottom temperature, depth, distance offshore, and sediment type) were selected in the HSI models for four groups of mantis shrimp: spring-juvenile, spring-adult, fall-juvenile, and fall-adult. The distribution of habitat suitability showed similar patterns for juveniles and adults, but obvious seasonal variations were observed. This study suggests that optimizing the environmental variables in HSI models improves their performance. This optimization strategy could also be extended to other marine organisms to improve understanding of the habitat suitability of target species.
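One common way to turn per-variable suitability indices (SI, each in [0, 1]) into an HSI is a weighted arithmetic mean. Each weight is proportional to the variable's relative contribution from the BRT. The sketch below works under that assumption. The abstract does not state the exact aggregation formula, and the SI values and contribution percentages are hypothetical.

```python
def weighted_hsi(si, contribution):
    """Weighted arithmetic-mean HSI at one location.

    si: dict mapping variable -> suitability index in [0, 1]
    contribution: dict mapping variable -> relative contribution (%) from a BRT;
                  only variables present in `si` are used, and their weights
                  are renormalised to sum to 1.
    """
    total = sum(contribution[v] for v in si)
    return sum(si[v] * contribution[v] / total for v in si)


# Hypothetical values for one grid cell, not from the study.
si = {"bottom_temperature": 0.8, "depth": 0.6, "distance_offshore": 0.4, "sediment": 1.0}
contribution = {"bottom_temperature": 40.0, "depth": 30.0,
                "distance_offshore": 20.0, "sediment": 10.0}
print(weighted_hsi(si, contribution))
```

For these inputs the weighted HSI is 0.32 + 0.18 + 0.08 + 0.10 = 0.68. The unweighted mean would be 0.70, because the weights shift emphasis among the variables.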
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61863010), the Key Research and Development Program of Shandong Province of China (Grant No. 2019GGX101001), and the Natural Science Foundation of Shandong Province of China (Grant No. ZR2018MC007).
Abstract: Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). It proceeds in three steps:
1) The initial feature vector is extracted by fusing four descriptors: pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD).
2) To remove redundancy and noise, L1-regularized logistic regression (L1-RLR) selects an optimal feature subset.
3) The GTB-PPI model is constructed.
Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. GTB-PPI was also applied to the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus. It was further applied to the one-core PPI network for CD9 and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.