Traditional 3Ni weathering steel cannot completely meet the requirements of offshore engineering development, so designing novel 3Ni steels with microalloying elements such as Mn or Nb for strength enhancement has become a trend. The stress-assisted corrosion behavior of a newly designed high-strength 3Ni steel was investigated in the current study using the corrosion big data method. Information on the corrosion process was recorded using the galvanic corrosion current monitoring method. The gradient boosting decision tree (GBDT) machine learning method was used to mine the corrosion mechanism, and the importance of the structure factor was investigated. Field exposure tests were conducted to verify the results calculated using the GBDT method. Results indicated that the GBDT method can be effectively used to study the influence of structural factors on the corrosion process of 3Ni steel. The different mechanisms by which Mn and Cu additions affect the stress-assisted corrosion of 3Ni steel suggested that Mn and Cu have no obvious effect on the corrosion rate of non-stressed 3Ni steel during the early stage of corrosion. When the corrosion reached a stable state, an increase in Mn content increased the corrosion rate of 3Ni steel, while Cu reduced this rate. In the presence of stress, an increase in Mn content and Cu addition can inhibit the corrosion process. The corrosion law of outdoor-exposed 3Ni steel is consistent with the law based on corrosion big data technology, verifying the reliability of the big data evaluation method and data prediction model selection.
Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and realize sustainable development. Traditional methods for predicting oil well production trends are based on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts the initial single-layer production by considering geological data, fluid PVT data, and well data. The results show that the GBDT prediction model has high accuracy, significantly improves efficiency, and has strong universal applicability. The GBDT model trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization, and engineering parameter optimization, and has guiding significance for oilfield development.
This paper aims to design an optimizer followed by a Kawahara filter for optimal classification and prediction of employees’ performance. The algorithm starts by processing the data with a modified K-means technique, used as a hierarchical clustering method, to quickly obtain the best features of employees for reaching their best performance. The work consists of two parts. The first part collects employee data to calculate and illustrate the performance of each employee. The second part covers the classification and prediction techniques for employee performance. The model is designed to help companies in their decisions about employees’ performance. The classification and prediction algorithms use the Gradient Boosting Tree classifier to classify and predict the features. The results give the percentage of employees who are expected to leave the company after predicting their performance for the coming years. The results also show that Grasshopper Optimization, followed by “KF” with the Gradient Boosting Tree as classifier and predictor, achieves high accuracy. The proposed algorithm is compared with other known techniques, and its results are found to be superior.
BACKGROUND: Development of distant metastasis (DM) is a major concern during treatment of nasopharyngeal carcinoma (NPC). However, studies have demonstrated improved distant control and survival in patients with advanced NPC with the addition of chemotherapy to concomitant chemoradiotherapy. Therefore, precise prediction of metastasis in patients with NPC is crucial. AIM: To develop a predictive model for metastasis in NPC using detailed magnetic resonance imaging (MRI) reports. METHODS: This retrospective study included 792 patients with non-distant metastatic NPC. A total of 469 imaging variables were obtained from detailed MRI reports. Data were stratified and randomly split into training (50%) and testing sets. Gradient boosting tree (GBT) models were built and used to select variables for predicting DM. A full model comprising all variables and a reduced model with the top-five variables were built. Model performance was assessed by area under the curve (AUC). RESULTS: Among the 792 patients, 94 developed DM during follow-up. The number of metastatic cervical nodes (30.9%), tumor invasion in the posterior half of the nasal cavity (9.7%), two sides of the pharyngeal recess (6.2%), tubal torus (3.3%), and single side of the parapharyngeal space (2.7%) were the top-five contributors for predicting DM, based on their relative importance in GBT models. The testing AUC of the full model was 0.75 (95% confidence interval [CI]: 0.69-0.82). The testing AUC of the reduced model was 0.75 (95% CI: 0.68-0.82). For the whole dataset, the full (AUC = 0.76, 95% CI: 0.72-0.82) and reduced models (AUC = 0.76, 95% CI: 0.71-0.81) outperformed the tumor node-staging system (AUC = 0.67, 95% CI: 0.61-0.73). CONCLUSION: The GBT model outperformed the tumor node-staging system in predicting metastasis in NPC. The number of metastatic cervical nodes was identified as the principal contributing variable.
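The AUC used as the performance measure in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that definition on toy data; the function name `roc_auc` and the labels and scores are illustrative assumptions, not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = event occurred, 0 = no event.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation would be preferable for large datasets.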
The stability of underground entry-type excavations directly affects the working environment and the safety of staff. Empirical critical span graphs and traditional statistical learning methods cannot meet the high accuracy required for stability assessment of entry-type excavations. Therefore, this study proposes a new prediction method based on machine learning to scientifically adjust the critical span graph. Accordingly, the particle swarm optimization (PSO) algorithm is used to optimize the core parameters of the gradient boosting decision tree (GBDT); the resulting model is abbreviated as PSO-GBDT. Moreover, eight other classifiers, including GBDT, k-nearest neighbors (KNN), two kinds of support vector machines (SVM), Gaussian naive Bayes (GNB), logistic regression (LR), and linear discriminant analysis (LDA), are applied for comparison with the proposed model. The findings revealed that, compared with the other eight models, the prediction performance of PSO-GBDT is the most reliable, with a classification accuracy of up to 0.93. Therefore, this model has great potential to provide a more scientific and accurate choice for the stability prediction of underground excavations. In addition, each classification model is used to predict the stability category of several grid points divided by the critical span graph, and the updated critical span graph of each model is discussed in combination with previous studies. The results show that the PSO-GBDT model is scientific, accurate, and efficient in updating the critical span graph, and its output decision boundary has strict theoretical support, which can help mine operators make favorable economic decisions.
Epilepsy is a very common neurological disorder worldwide that can affect a person’s quality of life at any age. People with epilepsy typically have recurrent seizures that can lead to injury or, in some cases, even death. Curing epilepsy requires risky surgery. Otherwise, the patient may be subjected to long-term drug treatment combined with lifestyle advice, without any guarantee of total recovery. However, regardless of the type of treatment, late treatment necessarily creates psychological instability in the patient. It is therefore important to diagnose the disease as early as possible so that the patient does not suffer from its consequences on their mental health. That is why the study aims to propose a model for detecting epilepsy as early as possible, especially in newborns. The objective of the article is to propose a model for detecting epilepsy using electroencephalogram signal data from 10 newborns. This model, developed using the extra trees classifier technique, can predict epilepsy in infants with an accuracy of around 99.4%.
This paper presents a hybrid ensemble classifier combining the synthetic minority oversampling technique (SMOTE), a random search (RS) hyper-parameter optimization algorithm, and the gradient boosting tree (GBT) to achieve efficient and accurate rock trace identification. A thirteen-dimensional database consisting of basic, vector, and discontinuity features is established from image samples. All data points are classified as either “trace” or “non-trace” to divide the ultimate results into candidate trace samples. It is found that the SMOTE technique can effectively improve classification performance, with a recommended optimized imbalance ratio of 1:5 to 1:4. Then, sixteen classifiers generated from four basic machine learning (ML) models are applied for performance comparison. The results reveal that the proposed RS-SMOTE-GBT classifier outperforms the other fifteen hybrid ML algorithms for both trace and non-trace classifications. Finally, feature importance, generalization ability, and classification error are discussed for the proposed classifier. The experimental results indicate that the more critical features affecting trace classification come primarily from the discontinuity features. In addition, cleaning up the sedimentary pumice and reducing the area of fractured rock help improve the overall classification performance. The proposed method provides a new alternative approach for the identification of 3D rock traces.
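The oversampling step mentioned in the abstract above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified standard-library sketch under stated assumptions, not the paper's implementation; the function names, `k`, the seed, the toy points, and the class counts are all hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation.

    Assumes at least two minority samples are available.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def samples_needed(n_minority, n_majority, target_ratio):
    """Synthetic samples required for minority:majority to reach target_ratio."""
    return max(0, math.ceil(target_ratio * n_majority) - n_minority)


# Toy data (hypothetical): 4 minority points, 40 majority points, target 1:4.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
n_new = samples_needed(len(minority), 40, 1 / 4)
new_points = smote(minority, n_new)
print(n_new, new_points[:2])
```

Only the minority class in the training split should be oversampled in this way; synthetic points in a test set would distort the evaluation.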
In loose and fractured coal seams with particularly low uniaxial compressive strength (UCS), driving a roadway is extremely difficult because roof falling and wall spalling occur frequently. To address this issue, the jet grouting (JG) technique (high-pressure grout mixed with coal particles) was first introduced in this study to improve the self-supporting ability of the coal mass. To evaluate the strength of the jet-grouted coal-grout composite (JG composite), the UCS evolution patterns were analyzed by preparing 405 specimens combining the influential variables of grout type, curing time, and coal-to-grout (C/G) ratio. Furthermore, the relationships between UCS and these influencing variables were modeled using ensemble learning methods, i.e., gradient boosted regression tree (GBRT) and random forest (RF), with their hyperparameters tuned by particle swarm optimization (PSO). The results showed that the chemical grout composite has higher short-term strength, while the cement grout composite can achieve more stable strength in the long term. The PSO-GBRT and PSO-RF models can both achieve high prediction accuracy. Also, the variable importance analysis demonstrated that the grout type and curing time should be considered carefully. This study provides a robust intelligent model for predicting the UCS of JG composites, which supports JG design in the field.
Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the increase of PPI data and the development of machine learning technologies, prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
A recommender system is a tool for suggesting items to users based on the extensive history of user feedback. Although it is an emerging research area in both academia and industry, it suffers from sparsity, scalability, and cold start problems. This paper addresses the sparsity and scalability problems of a model-based collaborative recommender system using an ensemble learning approach and an enhanced clustering algorithm for movie recommendations. An effective movie recommendation system is proposed based on the Classification and Regression Tree (CART) algorithm, an enhanced Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, and a truncation method. A new hyper-parameter tuning step is added to the BIRCH algorithm to enhance the cluster formation process; the resulting algorithm is named enhanced BIRCH. The proposed model yields quality movie recommendations for new users using gradient boost classification with broad coverage. The proposed model is tested on the MovieLens dataset, and its performance is evaluated by means of Mean Absolute Error (MAE), precision, recall, and f-measure. The experimental results showed the superiority of the proposed model in movie recommendation compared to existing models. The proposed model obtained MAE values of 0.52 and 0.57 on the MovieLens 100k and 1M datasets. Further, the proposed model obtained a precision of 0.83, a recall of 0.86, and an f-measure of 0.86 on the MovieLens 100k dataset, which compare favorably with existing models in movie recommendation.
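The evaluation measures named in the abstract above (MAE, precision, recall, and f-measure) can be sketched as follows. This is a minimal illustration of the standard definitions on toy data; the function names, ratings, and item identifiers are hypothetical and are not taken from the paper or from MovieLens.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(relevant, recommended):
    """Set-based precision, recall, and F-measure for one recommendation list."""
    relevant, recommended = set(relevant), set(recommended)
    hits = len(relevant & recommended)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy ratings on a 1-5 scale (hypothetical values).
actual_ratings = [4, 3, 5, 2]
predicted_ratings = [3.5, 3, 4, 2.5]
mae = mean_absolute_error(actual_ratings, predicted_ratings)

# Toy item sets (hypothetical): items the user liked vs. items recommended.
precision, recall, f1 = precision_recall_f1({"A", "B", "C"}, ["A", "B", "D", "E"])
print(mae, precision, recall, f1)
```

In practice these values are averaged over all users in the test set; the sketch scores a single user for clarity.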
...y consumption efficiency and to increase the crop yield. With the increase of agricultural data generated by the Internet of Things (IoT), more feasible models are necessary to make full use of such information. In this research, a Gradient Boost Decision Tree (GBDT) model based on the newly developed Light Gradient Boosting Machine algorithm (LightGBM or LGBM) was proposed to model the internal temperature of a greenhouse. Features including climate variables, control variables, and additional temporal information collected over five years were used to construct a suitable dataset to train and validate the LGBM model. An adaptive cross-validation method was developed as a novelty to improve the LGBM model's performance and self-adaptive ability. For comparison of predictive accuracy, a Back-Propagation (BP) Neural Network model and a Recurrent Neural Network (RNN) model were built under the same process. Two other GBDT algorithms, Extreme Gradient Boosting (XGBoost) and Stochastic Gradient Boosting (SGB), were also introduced to compare their predictive accuracy with that of the LGBM model. Results suggest that the LGBM has the best fitting ability for the temperature curves, with an RMSE of 0.645 °C, as well as the fastest training speed among all algorithms, being 60 times faster than the two neural network algorithms. The LGBM has strong application potential in both greenhouse environment prediction and real-time predictive control.
The agricultural sector’s day-to-day operations, such as irrigation and sowing, are impacted by the weather. Weather therefore plays a key role in all regular human activities. Weather forecasting must be accurate and precise so that we can plan our activities and safeguard ourselves and our property from disasters. Rainfall, wind speed, humidity, wind direction, cloud, temperature, and other weather forecasting variables are used in this work for weather prediction. Many research works have been conducted on weather forecasting. The drawbacks of existing approaches are that they are less effective, inaccurate, and time-consuming. To overcome these issues, this paper proposes an enhanced and reliable weather forecasting technique and develops weather forecasting for remote areas. Weather data analysis and machine learning techniques, such as Gradient Boosting Decision Tree, Random Forest, Naive Bayes Bernoulli, and the KNN algorithm, are deployed to anticipate weather conditions. A comparative analysis of the results aids in determining the number of ensemble methods that may be utilized to improve the accuracy of prediction in weather forecasting. The aim of this study is to demonstrate the technique's ability to produce weather forecasts as quickly as possible. Experimental evaluation shows that our ensemble technique achieves 95% prediction accuracy. Prediction takes less than 10 s for 1000 nodes and less than 40 s for 5000 nodes.
Aiming at the personalized movie recommendation problem, a recommendation algorithm integrating manifold learning and ensemble learning is studied. In this work, manifold learning is used to reduce the dimension of the data so that both the time and space complexities of the model are reduced. Meanwhile, gradient boosting decision tree (GBDT) is used to train the target user profile prediction model. Based on the recommendation results, a Bayesian optimization algorithm is applied to optimize the recommendation model, which can effectively improve the prediction accuracy. The experimental results show that the proposed algorithm can improve the accuracy of movie recommendation.
Churn prediction is a common task for machine learning applications in business. In this paper, this task is adapted to address the low completion rate of massive open online courses (only 5% of all students finish their course). The approach is demonstrated on the course “Methods and algorithms of the graph theory” held on the national platform of online education in Russia. This paper includes all the steps needed to build an intelligent system that predicts which students are active during the course but not likely to finish it. The first part consists of constructing the right sample for prediction, exploratory data analysis (EDA), and choosing the most appropriate week of the course on which to make predictions. The second part is about choosing the right metric and building models. In addition, an approach using ensembles such as stacking is proposed to increase the accuracy of predictions. As a result, a general approach to building a churn prediction model for an online course is reviewed. This approach can be used to make the process of online education adaptive and intelligent for each individual student.
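Traffic light locations are the points where pedestrians and vehicles meet on the road; they strongly affect road structure and traffic operation and serve as important hubs in the urban road network. To address the insufficient accuracy and incomplete area coverage of current traffic light location detection methods, a traffic light location detection method based on trajectory data is proposed. The method uses a four-stage clustering-merging-classification-merging model. First, implicit vehicle driving features are extracted from cleaned trajectory data, and density-based spatial clustering of applications with noise (DBSCAN) is used to obtain two types of cluster centers, turning and stopping; these two types of cluster centers are merged to obtain candidate traffic light locations. Traffic flow features of each area are then extracted from the trajectory points within a certain range of each candidate location and classified with the gradient boosting decision tree (GBDT) algorithm. Finally, the positive candidate samples are fused to detect the traffic light locations. Experiments on floating-car GPS trajectory data from Chengdu yield an F1 score of 0.947, outperforming conventional machine learning methods. The results show that, based on GPS trajectory data, the proposed four-stage model can effectively detect traffic light locations and can be used to rapidly acquire and update traffic light location information over large urban areas.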
Sepsis poses a serious threat to the health of children in the pediatric intensive care unit. The mortality from pediatric sepsis can be effectively reduced through timely diagnosis and therapeutic intervention. The bacterial culture detection method is too time-consuming to allow timely treatment. In this research, we propose a new framework, a deep encoding network with cross features (CF-DEN), that enables accurate early detection of sepsis. Cross features are automatically constructed via the gradient boosting decision tree and distilled into the deep encoding network (DEN) we designed. The DEN is aimed at learning a sufficiently effective representation from clinical test data. Each layer of the DEN filters the features involved in the computation at the current layer via an attention mechanism and outputs the current prediction, which is added layer by layer to obtain the embedding feature at the last layer. The framework takes advantage of both tree-based and neural network methods to extract an effective representation from a small clinical dataset and obtain accurate predictions, so that patients can receive timely treatment. We evaluate the performance of the framework on a dataset collected from Shanghai Children's Medical Center. Compared with common machine learning methods, our method increases the F1-score by 16.06% on the test set.
Pedestrian well-being reflects emotional experience during walking. Analyzing which built environment factors influence pedestrian well-being not only helps to improve residents’ physical and mental health but also encourages more walking. Based on data obtained via a questionnaire survey in Harbin, China, a gradient boosting decision tree (GBDT) model is developed to analyze how the perception of the built environment influences pedestrian well-being and to explain the differences across types of neighborhoods (old, new, and mixed). The results show that pedestrian well-being is most influenced by the diversity of daily service facilities, followed by the number of commercial facilities along a street, the accessibility of daily service facilities, and green spaces. Moreover, pedestrian well-being is also influenced by the type of neighborhood. In new neighborhoods, it is dominated by the accessibility of public transport stations, while in old and mixed neighborhoods, pedestrian well-being is primarily determined by the accessibility of green spaces and the number of green spaces, respectively. Depending on the characteristics of the built environment, different intervention measures are proposed to improve pedestrian well-being and promote walking.
When travelling, people are accustomed to taking and uploading photos on social media websites, which has led to the accumulation of huge numbers of geotagged photos. Combined with multisource information (e.g., weather, transportation, or textual information), these geotagged photos could help us in constructing user preference profiles at a high level of detail. Therefore, using these geotagged photos, we built a personalised recommendation system to provide attraction recommendations that match a user’s preferences. Specifically, we retrieved a geotagged photo collection from the public API for Flickr (Flickr.com) and fetched a large amount of other contextual information to rebuild a user’s travel history. We then created a model-based recommendation method with a two-stage architecture that consists of candidate generation (the matching process) and candidate ranking. In the matching process, we used a support vector machine model that was modified for multiclass classification to generate the candidate list. In addition, we used a gradient boosting regression tree to score each candidate and rerank the list. Finally, we evaluated our recommendation results with respect to accuracy and ranking ability. Compared with widely used memory-based methods, our proposed method performs significantly better in the cold-start situation and when mining ‘long-tail’ data.
Integrated management of municipal solid waste (MSW) is a major environmental challenge encountered by many countries. To support waste treatment/management and national macroeconomic policy development, it is essential to develop a prediction model. With this motivation, a database of MSW generation and feature variables covering 130 cities across China is constructed. Based on the database, an advanced machine learning algorithm (gradient boost regression tree) is adopted to build the waste generation prediction model, i.e., WGMod. In the model development process, the main influencing factors on MSW generation are identified by weight analysis. The selected key influencing factors are annual precipitation, population density, and annual mean temperature, with weights of 13%, 11%, and 10%, respectively. The WGMod shows good performance with R² = 0.939. Model prediction of MSW generation in Beijing and Shenzhen indicates that waste generation in Beijing would increase gradually in the next 3–5 years, while that in Shenzhen would grow rapidly in the next 3 years. The difference between the two is predominantly driven by the different trends of population growth.
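As a rough illustration of how a gradient boosted regression tree model of the kind named in the abstract above is built and scored with R², the sketch below fits one-split trees (stumps) to the residuals of a squared-error loss and adds them with a learning rate. It is a deliberately simplified, single-feature sketch under stated assumptions and is not the WGMod implementation; the function names, hyperparameters, and toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature by squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def fit_gbrt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): a step-like relationship with small noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = fit_gbrt(xs, ys)
r2 = r_squared(ys, [predict(model, x) for x in xs])
print(round(r2, 3))
```

The R² printed here is computed on the training data only; a held-out set would be needed to judge generalization, and real models use deeper trees over many features.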
As a typical screening apparatus, the elliptically vibrating screen is extensively employed for the size classification of granular materials. Unremitting efforts have been devoted to improving sieving performance, but the optimization problem still perplexes researchers because of the complexity of the sieving process. In the present paper, the sieving process of an elliptically vibrating screen was numerically simulated based on the Discrete Element Method (DEM). The production quality and the processing capacity of the vibrating screen were measured by the screening efficiency and the screening time, respectively. The sieving parameters investigated included the length of the semi-major axis, the length ratio of the two semi-axes, the vibration frequency, the inclination angle, the vibration direction angle, and the motion direction of the screen deck. First, the Gradient Boosting Decision Trees (GBDT) algorithm was adopted to model the screening data. Trained prediction models with sufficient generalization performance were obtained, and the relative importance of the six parameters for both screening indexes was revealed. After that, a hybrid MACO-GBDT algorithm based on Ant Colony Optimization (ACO) was proposed for optimizing the sieving performance of the vibrating screen. Both the single-objective optimization of screening efficiency and the stepwise optimization of screening results were conducted. Ultimately, the reliability of the MACO-GBDT algorithm was examined by numerical experiments. The optimization strategy provided in this work would be helpful for the parameter design and performance improvement of vibrating screens.
Funding: supported by the National Natural Science Foundation of China (No. 52203376) and the National Key Research and Development Program of China (No. 2023YFB3813200).
文摘Traditional 3Ni weathering steel cannot completely meet the requirements for offshore engineering development,resulting in the design of novel 3Ni steel with the addition of microalloy elements such as Mn or Nb for strength enhancement becoming a trend.The stress-assisted corrosion behavior of a novel designed high-strength 3Ni steel was investigated in the current study using the corrosion big data method.The information on the corrosion process was recorded using the galvanic corrosion current monitoring method.The gradi-ent boosting decision tree(GBDT)machine learning method was used to mine the corrosion mechanism,and the importance of the struc-ture factor was investigated.Field exposure tests were conducted to verify the calculated results using the GBDT method.Results indic-ated that the GBDT method can be effectively used to study the influence of structural factors on the corrosion process of 3Ni steel.Dif-ferent mechanisms for the addition of Mn and Cu to the stress-assisted corrosion of 3Ni steel suggested that Mn and Cu have no obvious effect on the corrosion rate of non-stressed 3Ni steel during the early stage of corrosion.When the corrosion reached a stable state,the in-crease in Mn element content increased the corrosion rate of 3Ni steel,while Cu reduced this rate.In the presence of stress,the increase in Mn element content and Cu addition can inhibit the corrosion process.The corrosion law of outdoor-exposed 3Ni steel is consistent with the law based on corrosion big data technology,verifying the reliability of the big data evaluation method and data prediction model selection.
Abstract: Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment and realize sustainable development. Traditional methods for predicting oil well production trends are based on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts the initial single-layer production by considering geological data, fluid PVT data and well data. The results show that the GBDT prediction model has great accuracy, significantly improved efficiency and strong universal applicability. The GBDT model trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization and engineering parameter optimization, and has guiding significance for oilfield development.
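As an illustration of the GBDT idea that the abstracts in this listing rely on, the following is a minimal sketch of gradient boosting for regression with squared-error loss. It is not the model from any of the listed papers: the weak learners are one-split decision stumps, and the function names (`fit_stump`, `fit_gbdt`, `predict_gbdt`) and the toy data are assumptions made for this example.

```python
from typing import List, Tuple

# (feature index, threshold, value if x <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]
# (base prediction, learning rate, fitted stumps)
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Find the single split that minimises squared error on the residuals r."""
    mean = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: constant prediction
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    return best


def stump_predict(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbdt(X: List[List[float]], y: List[float],
             n_rounds: int = 20, learning_rate: float = 0.5) -> Model:
    """Each round fits a stump to the current residuals (the negative
    gradient of squared loss) and adds a shrunken copy to the ensemble."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, row)
                for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbdt(model: Model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


if __name__ == "__main__":
    # Toy data (hypothetical): one feature, step-shaped target.
    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [1.0, 1.0, 5.0, 5.0]
    model = fit_gbdt(X, y)
    print(predict_gbdt(model, [1.5]), predict_gbdt(model, [3.5]))
```

Production implementations use deeper trees, subsampling and regularisation; the sketch only shows the residual-fitting loop that gives the method its name.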
Abstract: This paper aims to design an optimizer followed by a Kawahara filter for optimal classification and prediction of employees' performance. The algorithm starts by processing the data with a modified K-means technique, used as a hierarchical clustering method, to quickly obtain the features that best characterize employees reaching their best performance. The work consists of two parts. The first part collects employee data to calculate and illustrate the performance of each employee. The second part covers the classification and prediction of employee performance. The model is designed to help companies in their decisions about employees' performance. The classification and prediction algorithms use the Gradient Boosting Tree classifier to classify and predict the features. The results give the percentage of employees who are expected to leave the company, after predicting their performance for the coming years. The results also show that Grasshopper Optimization, followed by "KF" with the Gradient Boosting Tree as classifier and predictor, achieves high accuracy. The proposed algorithm is compared with other known techniques, and its results are found to be superior.
Abstract: BACKGROUND: Development of distant metastasis (DM) is a major concern during treatment of nasopharyngeal carcinoma (NPC). However, studies have demonstrated improved distant control and survival in patients with advanced NPC with the addition of chemotherapy to concomitant chemoradiotherapy. Therefore, precise prediction of metastasis in patients with NPC is crucial. AIM: To develop a predictive model for metastasis in NPC using detailed magnetic resonance imaging (MRI) reports. METHODS: This retrospective study included 792 patients with non-distant metastatic NPC. A total of 469 imaging variables were obtained from detailed MRI reports. Data were stratified and randomly split into training (50%) and testing sets. Gradient boosting tree (GBT) models were built and used to select variables for predicting DM. A full model comprising all variables and a reduced model with the top-five variables were built. Model performance was assessed by area under the curve (AUC). RESULTS: Among the 792 patients, 94 developed DM during follow-up. The number of metastatic cervical nodes (30.9%), tumor invasion in the posterior half of the nasal cavity (9.7%), two sides of the pharyngeal recess (6.2%), tubal torus (3.3%), and single side of the parapharyngeal space (2.7%) were the top-five contributors for predicting DM, based on their relative importance in GBT models. The testing AUC of the full model was 0.75 (95% confidence interval [CI]: 0.69-0.82). The testing AUC of the reduced model was 0.75 (95% CI: 0.68-0.82). For the whole dataset, the full (AUC = 0.76, 95% CI: 0.72-0.82) and reduced models (AUC = 0.76, 95% CI: 0.71-0.81) outperformed the tumor node-staging system (AUC = 0.67, 95% CI: 0.61-0.73). CONCLUSION: The GBT model outperformed the tumor node-staging system in predicting metastasis in NPC. The number of metastatic cervical nodes was identified as the principal contributing variable.
Funding: The National Science Foundation of China (Grant No. 42177164), the Distinguished Youth Science Foundation of Hunan Province of China (Grant No. 2022JJ10073), and the Innovation-Driven Project of Central South University (Grant No. 2020CX040).
Abstract: The stability of underground entry-type excavations directly affects the working environment and the safety of staff. Empirical critical span graphs and traditional statistical learning methods cannot meet the high accuracy required for stability assessment of entry-type excavations. Therefore, this study proposes a new prediction method based on machine learning to scientifically adjust the critical span graph. The particle swarm optimization (PSO) algorithm is used to optimize the core parameters of the gradient boosting decision tree (GBDT); the resulting model is abbreviated as PSO-GBDT. Moreover, eight other classifiers, including GBDT, k-nearest neighbors (KNN), two kinds of support vector machines (SVM), Gaussian naive Bayes (GNB), logistic regression (LR) and linear discriminant analysis (LDA), are applied for comparison with the proposed model. The findings revealed that, compared with the other eight models, the prediction performance of PSO-GBDT is the most reliable, with a classification accuracy of up to 0.93. Therefore, this model has great potential to provide a more scientific and accurate choice for the stability prediction of underground excavations. In addition, each classification model is used to predict the stability category of several grid points divided by the critical span graph, and the updated critical span graph of each model is discussed in combination with previous studies. The results show that the PSO-GBDT model is scientific, accurate and efficient in updating the critical span graph, and its output decision boundary has strict theoretical support, which can help mine operators make favorable economic decisions.
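The PSO step named in the abstract above can be sketched generically as follows. This is a minimal, standard particle swarm loop under stated assumptions, not the authors' implementation: the inertia and acceleration constants (`w`, `c1`, `c2`), the function `pso_minimize`, and the toy objective `toy_validation_error` (a stand-in for a cross-validated GBDT error over two hypothetical hyperparameters) are all choices made for this example.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[Sequence[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, n_iters: int = 200,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` inside the box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params: Sequence[float]) -> float:
    """Hypothetical smooth error surface with its minimum at
    learning_rate = 0.1 and max_depth = 4 (illustration only)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 4.0) / 10.0) ** 2


if __name__ == "__main__":
    best, err = pso_minimize(toy_validation_error, [(0.01, 1.0), (1.0, 10.0)])
    print(best, err)
```

In a real tuning run the objective would train and validate a GBDT for each candidate, and integer hyperparameters such as tree depth would be rounded before use.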
Abstract: Epilepsy is a very common neurological disorder worldwide that can affect a person's quality of life at any age. People with epilepsy typically have recurrent seizures that can lead to injury or, in some cases, even death. Curing epilepsy requires risky surgery. Otherwise, the patient may be subjected to long drug treatment combined with lifestyle advice, without any guarantee of total recovery. However, regardless of the type of treatment performed, late treatment necessarily creates psychological instability in the patient. It is therefore important to diagnose the disease as early as possible so that the patient does not suffer consequences for their mental health. The study therefore proposes a model for detecting epilepsy as early as possible, especially in newborns, using electroencephalogram signal data from 10 newborns. The model, developed using the extra trees classifier technique, can predict epilepsy in infants with an accuracy of around 99.4%.
Funding: Supported by the Key Innovation Team Program of the Innovation Talents Promotion Plan by MOST of China (No. 2016RA4059), the Natural Science Foundation Committee Program of China (No. 51778474), and the Science and Technology Project of Yunnan Provincial Transportation Department (No. 25 of 2018).
Abstract: This paper presents a hybrid ensemble classifier combining the synthetic minority oversampling technique (SMOTE), a random search (RS) hyper-parameter optimization algorithm and a gradient boosting tree (GBT) to achieve efficient and accurate rock trace identification. A thirteen-dimensional database consisting of basic, vector, and discontinuity features is established from image samples. All data points are classified as either "trace" or "non-trace" to divide the ultimate results into candidate trace samples. It is found that SMOTE can effectively improve classification performance, with a recommended optimized imbalance ratio of 1:5 to 1:4. Then, sixteen classifiers generated from four basic machine learning (ML) models are applied for performance comparison. The results reveal that the proposed RS-SMOTE-GBT classifier outperforms the other fifteen hybrid ML algorithms for both trace and non-trace classifications. Finally, feature importance, generalization ability and classification error are discussed for the proposed classifier. The experimental results indicate that the more critical features affecting trace classification come primarily from the discontinuity features. In addition, cleaning up the sedimentary pumice and reducing the area of fractured rock help to improve the overall classification performance. The proposed method provides a new alternative approach for the identification of 3D rock traces.
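The oversampling idea behind SMOTE, as used in the abstract above, can be sketched as follows: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the paper's pipeline; the function name `smote`, the neighbour count `k` and the toy points are assumptions made for this example.

```python
import random
from typing import List


def smote(minority: List[List[float]], n_new: int,
          k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by interpolation.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class (hypothetical): corners of the unit square.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_new=6):
        print(point)
```

To reach a target imbalance ratio such as those reported in the abstract, `n_new` would be chosen from the class counts; the synthetic samples are added to the training set only, never to the test set.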
Funding: Financially supported by the Fundamental Research Funds for the Central Universities (2020ZDPY0221).
Abstract: In loose and fractured coal seams with particularly low uniaxial compressive strength (UCS), driving a roadway is extremely difficult because roof falls and wall spalling occur frequently. To address this issue, the jet grouting (JG) technique (high-pressure grout mixed with coal particles) was introduced for the first time in this study to improve the self-supporting ability of the coal mass. To evaluate the strength of the jet-grouted coal-grout composite (JG composite), the UCS evolution patterns were analyzed by preparing 405 specimens combining the influential variables of grout type, curing time, and coal to grout (C/G) ratio. Furthermore, the relationships between UCS and these influencing variables were modeled using ensemble learning methods, i.e. gradient boosted regression tree (GBRT) and random forest (RF), with their hyperparameters tuned by particle swarm optimization (PSO). The results showed that the chemical grout composite has higher short-term strength, while the cement grout composite achieves more stable strength in the long term. The PSO-GBRT and PSO-RF models both achieve high prediction accuracy. The variable importance analysis also demonstrated that the grout type and curing time should be considered carefully. This study provides a robust intelligent model for predicting the UCS of JG composites, which supports JG design in the field.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61863010), the Key Research and Development Program of Shandong Province of China (Grant No. 2019GGX101001), and the Natural Science Foundation of Shandong Province of China (Grant No. ZR2018MC007).
Abstract: Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the increase in PPI data and the development of machine learning technologies, prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
Abstract: A recommender system is a tool that suggests items to users based on the extensive history of user feedback. Although it is an emerging research area in academia and industry, it suffers from sparsity, scalability, and cold-start problems. This paper addresses the sparsity and scalability problems of model-based collaborative recommender systems using an ensemble learning approach and an enhanced clustering algorithm for movie recommendations. An effective movie recommendation system is proposed that uses the Classification and Regression Tree (CART) algorithm, an enhanced Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm and a truncation method. A new hyper-parameter tuning step is added to the BIRCH algorithm to enhance the cluster formation process; the resulting algorithm is named enhanced BIRCH. The proposed model yields quality movie recommendations for new users using gradient boost classification with broad coverage. The proposed model is tested on the Movielens dataset, and its performance is evaluated by means of Mean Absolute Error (MAE), precision, recall and F-measure. The experimental results showed the superiority of the proposed model in movie recommendation compared with existing models. The proposed model obtained MAE values of 0.52 and 0.57 on the Movielens 100k and 1M datasets. Further, it obtained a precision of 0.83, a recall of 0.86 and an F-measure of 0.86 on the Movielens 100k dataset, which compare favorably with existing models in movie recommendation.
Funding: This work was supported in part by the Shanghai Agriculture Applied Technology Development Program, China (Grant No. G 2020-02-08-00-07-F01480), the Shanghai Municipal Science and Technology Commission Innovation Action Plan (Grant No. 17391900900), and the National Natural Science Foundation of China (Grant No. 61573258).
Abstract: y consumption efficiency and to increase the crop yield. With the increase in agricultural data generated by the Internet of Things (IoT), more feasible models are necessary to make full use of such information. In this research, a Gradient Boost Decision Tree (GBDT) model based on the newly developed Light Gradient Boosting Machine algorithm (LightGBM or LGBM) was proposed to model the internal temperature of a greenhouse. Features including climate variables, control variables and additional temporal information collected over five years were used to construct a suitable dataset to train and validate the LGBM model. An adaptive cross-validation method was developed as a novelty to improve the performance and self-adaptive ability of the LGBM model. For comparison of predictive accuracy, a Back-Propagation (BP) Neural Network model and a Recurrent Neural Network (RNN) model were built under the same process. Two other GBDT algorithms, Extreme Gradient Boosting (Xgboost) and Stochastic Gradient Boosting (SGB), were also introduced to compare their predictive accuracy with that of the LGBM model. The results suggest that the LGBM has the best fitting ability for the temperature curves, with an RMSE of 0.645 ℃, as well as the fastest training speed among all algorithms, 60 times faster than the two neural network algorithms. The LGBM has strong application prospects in both greenhouse environment prediction and real-time predictive control.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 2/42/43), and to Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R135), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The agricultural sector's day-to-day operations, such as irrigation and sowing, are affected by the weather, and weather plays a key role in all regular human activities. Weather forecasting must be accurate and precise so that we can plan our activities and safeguard ourselves and our property from disasters. Rainfall, wind speed, humidity, wind direction, cloud, temperature, and other weather forecasting variables are used in this work for weather prediction. Many research works have been conducted on weather forecasting. The drawbacks of existing approaches are that they are less effective, inaccurate, and time-consuming. To overcome these issues, this paper proposes an enhanced and reliable weather forecasting technique, which also aims to develop weather forecasting in remote areas. Weather data analysis and machine learning techniques, such as Gradient Boosting Decision Tree, Random Forest, Naive Bayes Bernoulli, and the KNN algorithm, are deployed to anticipate weather conditions. A comparative analysis of the results aids in determining the number of ensemble methods that may be used to improve the accuracy of weather forecasting. The aim of this study is to demonstrate the technique's ability to produce weather forecasts as quickly as possible. Experimental evaluation shows that the ensemble technique achieves 95% prediction accuracy. Prediction takes less than 10 s for 1000 nodes and less than 40 s for 5000 nodes.
Funding: Supported by the Educational Commission of Liaoning Province of China (No. LQGD2017027).
Abstract: Aiming at the personalized movie recommendation problem, a recommendation algorithm integrating manifold learning and ensemble learning is studied. In this work, manifold learning is used to reduce the dimension of the data so that both the time and space complexities of the model are reduced. Meanwhile, gradient boosting decision tree (GBDT) is used to train the target user profile prediction model. Based on the recommendation results, a Bayesian optimization algorithm is applied to optimize the recommendation model, which can effectively improve the prediction accuracy. The experimental results show that the proposed algorithm can improve the accuracy of movie recommendation.
Abstract: Churn prediction is a common task for machine learning applications in business. In this paper, the task is adapted to address the low efficiency of massive open online courses (only 5% of all students finish their course). The approach is presented on the course "Methods and algorithms of the graph theory" held on the national platform of online education in Russia. The paper includes all the steps needed to build an intelligent system that predicts which students are active during the course but not likely to finish it. The first part consists of constructing the right sample for prediction, exploratory data analysis (EDA) and choosing the most appropriate week of the course on which to make predictions. The second part is about choosing the right metric and building models. An approach using ensembles such as stacking is also proposed to increase the accuracy of predictions. As a result, a general approach to building a churn prediction model for an online course is reviewed. This approach can be used to make the process of online education adaptive and intelligent for an individual student.
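Abstract: Traffic light locations are where pedestrians and vehicles meet on the road; they strongly affect road structure and traffic operation and act as important hubs in urban road networks. To address the insufficient accuracy and incomplete area coverage of current traffic light location detection methods, a traffic light location detection method based on trajectory data is proposed. The method is based on a four-stage clustering-merging-classification-merging model. First, implicit vehicle driving features are extracted from cleaned trajectory data, and density-based spatial clustering of applications with noise (DBSCAN) is used to obtain two kinds of cluster centers, turning and stopping; these two kinds of cluster centers are merged to obtain candidate traffic light locations. The traffic flow features of each area are then extracted from the trajectory points within a certain range of each candidate location, and the gradient boosting decision tree (GBDT) algorithm is used for classification. Finally, the positive candidate samples are merged to detect the traffic light locations. In experiments on floating-car GPS trajectory data from Chengdu, the detection results reach an F1 score of 0.947, better than conventional machine learning methods. The experimental results show that, based on GPS trajectory data, the proposed four-layer model can effectively detect traffic light locations, and the model can be used to rapidly acquire and update traffic light location information over large urban areas.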
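The density-based clustering step (DBSCAN) mentioned in the abstract above can be sketched as follows. This is a small textbook-style version for 2D points, not the authors' code: the function names, the `eps` and `min_pts` values and the toy coordinates are assumptions made for this example, and real GPS data would need a geographic distance and a spatial index.

```python
from math import hypot
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


def cluster_centres(points: List[Point], labels: List[int]) -> Dict[int, Point]:
    """Mean position of each cluster, ignoring noise points."""
    sums: Dict[int, Tuple[float, float, int]] = {}
    for (x, y), lab in zip(points, labels):
        if lab == NOISE:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


if __name__ == "__main__":
    # Toy stop points (hypothetical): two dense groups and one outlier.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    labels = dbscan(pts, eps=1.5, min_pts=3)
    print(labels, cluster_centres(pts, labels))
```

In a pipeline like the one described, such cluster centres would serve as candidate locations that a separate classifier then accepts or rejects.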
Abstract: Sepsis poses a serious threat to the health of children in the pediatric intensive care unit. Mortality from pediatric sepsis can be effectively reduced through timely diagnosis and therapeutic intervention. The bacilliculture detection method is too time-consuming to allow timely treatment. In this research, we propose a new framework, a deep encoding network with cross features (CF-DEN), that enables accurate early detection of sepsis. Cross features are automatically constructed via the gradient boosting decision tree and distilled into the deep encoding network (DEN) we designed. The DEN is aimed at learning a sufficiently effective representation from clinical test data. Each layer of the DEN filters the features involved in the computation at the current layer via an attention mechanism and outputs the current prediction, which is added layer by layer to obtain the embedding feature at the last layer. The framework takes advantage of both the tree-based method and the neural network method to extract an effective representation from a small clinical dataset and obtain an accurate prediction, so that patients can be prompted to get timely treatment. We evaluate the performance of the framework on the dataset collected from Shanghai Children's Medical Center. Compared with common machine learning methods, our method increases the F1-score by 16.06% on the test set.
Funding: The National Natural Science Foundation of China (Grant Nos. 51878204, 52278057).
Abstract: Pedestrian well-being reflects emotional experience during walking. Analyzing which built environment factors influence pedestrian well-being not only helps to improve residents' physical and mental health but also encourages more walking. Based on the data obtained via a questionnaire survey in Harbin, China, a gradient boosting decision tree (GBDT) model is developed to analyze how the perception of the built environment influences pedestrian well-being and to explain the differences across types of neighborhoods (old, new, and mixed). The results show that pedestrian well-being is most influenced by the diversity of daily service facilities, followed by the number of commercial facilities along a street, the accessibility of daily service facilities, and green spaces. Moreover, pedestrian well-being is also influenced by the type of neighborhood. In new neighborhoods, it is dominated by the accessibility of public transport stations, while in old and mixed neighborhoods, pedestrian well-being is primarily determined by the accessibility of green spaces and the number of green spaces, respectively. Depending on the characteristics of the built environment, different intervention measures are proposed to improve pedestrian well-being and promote walking.
Funding: Supported by grants from the National Key Research and Development Program of China [grant number 2017YFB0503602], the National Natural Science Foundation of China [grant number 41771425], [grant number 41625003], [grant number 41501162], and the Beijing Philosophy and Social Science Foundation [grant number 17JDGLB002].
Abstract: When travelling, people are accustomed to taking and uploading photos on social media websites, which has led to the accumulation of huge numbers of geotagged photos. Combined with multisource information (e.g. weather, transportation, or textual information), these geotagged photos could help us in constructing user preference profiles at a high level of detail. Therefore, using these geotagged photos, we built a personalised recommendation system to provide attraction recommendations that match a user's preferences. Specifically, we retrieved a geotagged photo collection from the public API for Flickr (Flickr.com) and fetched a large amount of other contextual information to rebuild a user's travel history. We then created a model-based recommendation method with a two-stage architecture that consists of candidate generation (the matching process) and candidate ranking. In the matching process, we used a support vector machine model that was modified for multiclass classification to generate the candidate list. In addition, we used a gradient boosting regression tree to score each candidate and rerank the list. Finally, we evaluated our recommendation results with respect to accuracy and ranking ability. Compared with widely used memory-based methods, our proposed method performs significantly better in the cold-start situation and when mining 'long-tail' data.
Funding: Supported by the National Key R&D Program of China (Nos. 2018YFD1100600, 2018YFC1902900).
Abstract: Integrated management of municipal solid waste (MSW) is a major environmental challenge encountered by many countries. To support waste treatment/management and national macroeconomic policy development, it is essential to develop a prediction model. With this motivation, a database of MSW generation and feature variables covering 130 cities across China is constructed. Based on the database, an advanced machine learning algorithm (gradient boost regression tree) is adopted to build the waste generation prediction model, i.e., WGMod. In the model development process, the main factors influencing MSW generation are identified by weight analysis. The selected key influencing factors are annual precipitation, population density and annual mean temperature, with weights of 13%, 11% and 10%, respectively. The WGMod shows good performance with R² = 0.939. Model prediction of MSW generation in Beijing and Shenzhen indicates that waste generation in Beijing would increase gradually in the next 3–5 years, while that in Shenzhen would grow rapidly in the next 3 years. The difference between the two is predominantly driven by the different trends of population growth.
Funding: The research work is financially supported by the National Natural Science Foundation of China (No. 51775113), the Natural Science Foundation of Fujian Province (No. 2017J01675), the 51st Scientific Research Fund Program of Fujian University of Technology (No. GY-Z160139), the Key Research Platform of NC Equipment and Technology in Fujian Province (No. 2014H2002), and the Subsidized Project for Postgraduates' Innovative Fund in Scientific Research of Huaqiao University (No. 17013080007).
Abstract: As a typical screening apparatus, the elliptically vibrating screen has been extensively employed for the size classification of granular materials. Unremitting efforts have been made to improve sieving performance, but the optimization problem still perplexes researchers because of the complexity of the sieving process. In the present paper, the sieving process of an elliptically vibrating screen was numerically simulated based on the Discrete Element Method (DEM). The production quality and the processing capacity of the vibrating screen were measured by the screening efficiency and the screening time, respectively. The sieving parameters investigated were the length of the semi-major axis, the length ratio of the two semi-axes, the vibration frequency, the inclination angle, the vibration direction angle and the motion direction of the screen deck. First, the Gradient Boosting Decision Trees (GBDT) algorithm was adopted to model the screening data. Trained prediction models with sufficient generalization performance were obtained, and the relative importance of the six parameters for both screening indexes was revealed. After that, a hybrid MACO-GBDT algorithm based on Ant Colony Optimization (ACO) was proposed for optimizing the sieving performance of the vibrating screen. Both the single-objective optimization of screening efficiency and the stepwise optimization of screening results were conducted. Finally, the reliability of the MACO-GBDT algorithm was examined by numerical experiments. The optimization strategy provided in this work would be helpful for the parameter design and performance improvement of vibrating screens.