Binding kinetic properties of protein–ligand complexes are crucial factors affecting drug potency. Nevertheless, current in silico techniques are insufficient for providing accurate and robust predictions of binding kinetic properties. To this end, this work develops a variety of binding kinetic models for predicting a critical binding kinetic property, the dissociation rate constant, using eight machine learning (ML) methods (Bayesian neural network (BNN), partial least squares regression, Bayesian ridge, Gaussian process regression, principal component regression, random forest, support vector machine, and extreme gradient boosting) and descriptors of the van der Waals/electrostatic interaction energies. These eight models are applied to two case studies involving HSP90 and RIP1 kinase inhibitors. The regression results of both case studies indicate that the BNN model has state-of-the-art prediction accuracy (HSP90: R^2_test = 0.947, MAE_test = 0.184, r_test = 0.976, RMSE_test = 0.220; RIP1 kinase: R^2_test = 0.745, MAE_test = 0.188, r_test = 0.961, RMSE_test = 0.290) in comparison with the other seven ML models.
Embracing software product lines (SPLs) is pivotal in the dynamic landscape of contemporary software development. However, the flexibility and global distribution inherent in modern systems pose significant challenges to managing SPL variability, underscoring the critical importance of robust cybersecurity measures. This paper advocates leveraging machine learning (ML) to address variability management issues and fortify the security of SPLs. In the context of the broader special issue theme on innovative cybersecurity approaches, our proposed ML-based framework offers an interdisciplinary perspective, blending insights from computing, social sciences, and business. Specifically, it employs ML for demand analysis, dynamic feature extraction, and enhanced feature selection in distributed settings, contributing to cyber-resilient ecosystems. Our experiments demonstrate the framework's superiority, emphasizing its potential to boost productivity and security in SPLs. As digital threats evolve, this research catalyzes interdisciplinary collaborations, aligning with the special issue's goal of breaking down academic barriers to strengthen digital ecosystems against sophisticated attacks while upholding ethics, privacy, and human values.
Assessing the potential damage caused by earthquakes is crucial for a community's emergency response. In this study, four machine learning (ML) methods, namely random forest, extremely randomized trees, AdaBoost (AB), and gradient boosting (GB), were employed to develop prediction models for the damage potential of the mainshock (DIMS) and mainshock–aftershock sequences (DIMA). Building structures were modeled using eight single-degree-of-freedom (SDOF) systems with different hysteretic rules. A set of 662 recorded mainshock–aftershock (MS-AS) ground motions was selected from the PEER database. Seven intensity measures (IMs) were chosen to represent the characteristics of the mainshock and aftershock. The results revealed that the selected ML methods can predict the structural damage potential of the SDOF systems well, except for the AB method. The GB model exhibited the best performance, making it the recommended choice among the four ML models for predicting DIMS and DIMA. Additionally, the impact of input variables on the prediction was investigated using the Shapley additive explanations (SHAP) method. The high-correlation variables were sensitive to the structural period (T). At T = 1.0 s, the mainshock peak ground velocity (PGVM) and aftershock peak ground displacement (PGDA) significantly influenced the prediction of DIMA. When T increased to 5.0 s, the primary high-correlation factor among the mainshock IMs changed from PGVM to the mainshock peak ground displacement (PGDM); however, the high-correlation variable among the aftershock IMs remained PGDA. The high-correlation factors for DIMS showed trends similar to those for DIMA. Finally, a table summarizing the first and second high-correlation variables for predicting DIMS and DIMA was provided, offering a valuable reference for parameter selection in seismic damage prediction for mainshock–aftershock sequences.
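The variable-ranking step in this abstract can be sketched in miniature. The study uses SHAP; the simpler permutation-importance measure below illustrates the same idea of ranking intensity measures for a fitted gradient-boosting damage model. All data are synthetic, and the IM column names are hypothetical placeholders:

```python
# Hedged sketch: rank synthetic intensity measures by permutation importance
# for a gradient-boosting regressor (a stand-in for the SHAP analysis).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
# Hypothetical IM columns: [PGA_M, PGV_M, PGD_M, PGA_A, PGV_A, PGD_A, duration]
X = rng.normal(size=(300, 7))
# Simulated damage index dominated by PGV_M (col 1) and PGD_A (col 5)
y = 2.0 * X[:, 1] + 1.0 * X[:, 5] + 0.1 * rng.normal(size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]   # most important first
```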
BACKGROUND: Colorectal cancer (CRC) is characterized by high heterogeneity, aggressiveness, and high morbidity and mortality rates. With machine learning (ML) algorithms, patient, tumor, and treatment features can be used to develop and validate models for predicting survival. In addition, important variables can be screened and different applications can be provided that could serve as vital references when making clinical decisions, potentially improving patient outcomes in clinical settings. AIM: To construct prognostic prediction models and screen important variables for patients with stage I to III CRC. METHODS: More than 1000 postoperative CRC patients were grouped according to survival time (with cutoff values of 3 years and 5 years) and assigned to training and testing cohorts (7:3). For each 3-category survival time, predictions were made by 4 ML algorithms (on all-variable and important-variable-only datasets), each of which was validated via 5-fold cross-validation and bootstrap validation. Important variables were screened with multivariable regression methods. Model performance was evaluated and compared before and after variable screening with the area under the curve (AUC). SHapley Additive exPlanations (SHAP) further demonstrated the impact of important variables on model decision-making. Nomograms were constructed for practical model application. RESULTS: Our ML models performed well; the model performance before and after important parameter identification was consistent, and variable screening was effective. The highest pre- and post-screening model AUCs (95% confidence intervals) in the testing set were 0.87 (0.81-0.92) and 0.89 (0.84-0.93) for overall survival, 0.75 (0.69-0.82) and 0.73 (0.64-0.81) for disease-free survival, 0.95 (0.88-1.00) and 0.88 (0.75-0.97) for recurrence-free survival, and 0.76 (0.47-0.95) and 0.80 (0.53-0.94) for distant metastasis-free survival. Repeated cross-validation and bootstrap validation were performed in both the training and testing datasets. The SHAP values of the important variables were consistent with the clinicopathological characteristics of patients with tumors. Nomograms were created. CONCLUSION: We constructed a comprehensive, high-accuracy, important-variable-based ML architecture for predicting the 3-category survival times. This architecture could serve as a vital reference for managing CRC patients.
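The validation loop this abstract describes, stratified 5-fold cross-validation scored by AUC, can be sketched minimally. The data and feature set below are synthetic placeholders, not the study's cohort:

```python
# Hedged sketch: stratified 5-fold cross-validated AUC for a binary
# survival-category label, mirroring the evaluation protocol described above.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 8))           # stand-in clinicopathological features
# Simulated 3-year survival label driven by the first two features
y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=400) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                       cv=cv, scoring="roc_auc")
```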
The current study aimed to evaluate the capabilities of seven advanced machine learning techniques (MLTs), including Support Vector Machine (SVM), Random Forest (RF), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Network (ANN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), and Naive Bayes (NB), for landslide susceptibility modeling and to compare their performances. Coupling machine learning algorithms with spatial data types for landslide susceptibility mapping is a vitally important issue. This study was carried out using GIS and R open-source software at Abha Basin, Asir Region, Saudi Arabia. First, a total of 243 landslide locations were identified at Abha Basin to prepare the landslide inventory map using different data sources. All the landslide areas were randomly separated into two groups at a ratio of 70% for training and 30% for validation purposes. Twelve landslide variables were generated for landslide susceptibility modeling: altitude, lithology, distance to faults, normalized difference vegetation index (NDVI), land use/land cover (LULC), distance to roads, slope angle, distance to streams, profile curvature, plan curvature, slope length (LS), and slope aspect. The area under the ROC curve (AUC-ROC) approach was applied to evaluate, validate, and compare the MLTs' performance. The results indicated that the AUC values for the seven MLTs range from 89.0% for QDA to 95.1% for RF. Our findings showed that RF (AUC = 95.1%) and LDA (AUC = 94.17%) produced the best performances in comparison to the other MLTs. The outcome of this study and the landslide susceptibility maps would be useful for environmental protection.
This investigation assessed the efficacy of 10 widely used machine learning algorithms (MLA) comprising the least absolute shrinkage and selection operator (LASSO), generalized linear model (GLM), stepwise generalized linear model (SGLM), elastic net (ENET), partial least squares (PLS), ridge regression, support vector machine (SVM), classification and regression trees (CART), bagged CART, and random forest (RF) for gully erosion susceptibility mapping (GESM) in Iran. The locations of 462 previously existing gully erosion sites were mapped through widespread field investigations, of which 70% (323) and 30% (139) of the observations were arbitrarily divided for algorithm calibration and validation, respectively. Twelve controlling factors for gully erosion, namely soil texture, annual mean rainfall, digital elevation model (DEM), drainage density, slope, lithology, topographic wetness index (TWI), distance from rivers, aspect, distance from roads, plan curvature, and profile curvature, were ranked in terms of their importance using each MLA. The MLA were compared using a training dataset for gully erosion and statistical measures such as RMSE (root mean square error), MAE (mean absolute error), and R-squared. Based on the comparisons among the MLA, the RF algorithm exhibited the minimum RMSE and MAE and the maximum R-squared, and was therefore selected as the best model. The variable importance evaluation using the RF model revealed that distance from rivers had the highest significance in influencing the occurrence of gully erosion, whereas plan curvature had the least importance. According to the GESM generated using RF, most of the study area is predicted to have low (53.72%) or moderate (29.65%) susceptibility to gully erosion, whereas only a small area is identified as having high (12.56%) or very high (4.07%) susceptibility. The outcome generated by the RF model was validated using the ROC (receiver operating characteristic) curve approach, which returned an area under the curve (AUC) of 0.985, proving the excellent forecasting ability of the model. The GESM prepared using the RF algorithm can aid decision-makers in targeting remedial actions to minimize the damage caused by gully erosion.
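The RF workflow this abstract summarizes (calibration/validation split, held-out AUC, and ranking of controlling factors) can be sketched as follows. Factor names and all values here are synthetic placeholders chosen to echo the study's variables, not its data:

```python
# Hedged sketch: RF susceptibility model with a 70/30 split, validation AUC,
# and a variable-importance ranking over hypothetical controlling factors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
factors = ["dist_rivers", "slope", "rainfall", "TWI", "plan_curv"]
X = rng.normal(size=(500, len(factors)))
# Simulated gully occurrence dominated by distance from rivers (col 0)
y = (X[:, 0] - 0.3 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0,
                                          stratify=y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_va, rf.predict_proba(X_va)[:, 1])
top_factor = factors[int(np.argmax(rf.feature_importances_))]
```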
The estimation of potato biomass and yield can optimize planting patterns and tap production potential. Based on partial least squares regression (PLSR), multiple linear regression (MLR), support vector machine (SVM), random forest (RF), BP neural network, and other machine learning algorithms, biomass estimation models for potato at different growth stages were constructed using single variables such as the original spectrum, first-order differential spectrum, combined spectral indices, and vegetation indices (VI), as well as their coupled combinations. The accuracy of the models was compared and analyzed, and the best modeling method for biomass at each growth stage was selected. Based on the optimized modeling method, the biomass of each growth stage was estimated; yield estimation models for the different growth stages were then constructed from the estimation results using linear regression analysis, and the accuracy of the models was verified. The results showed that in the tuber formation, starch accumulation, and maturity stages, the biomass estimation accuracy based on combination variables was the highest and the best modeling methods were MLR and SVM; in the tuber growth stage, the best modeling method was MLR, and the yield estimation performed well. This work provides a reference for algorithm selection in machine-learning-based crop biomass and yield models.
Surface chokes are widely utilized equipment installed on wellheads to control hydrocarbon flow rates. Several correlations have been suggested to model the multiphase flow of oil and gas through surface chokes. However, substantial errors have been reported for empirical fitting models and correlations estimating hydrocarbon flow, because reservoir heterogeneity, anisotropy, and variance in reservoir fluid characteristics at diverse subsurface depths introduce complexity into production data. Therefore, the estimation of daily oil and gas production rates is still challenging for the petroleum industry. Recently, hybrid data-driven techniques have been reported to be effective for estimation problems in various aspects of the petroleum domain. This paper investigates hybrid ensemble data-driven approaches (viz. stacked generalization and voting architectures) to forecast multiphase flow rates through the surface choke, followed by an assessment of the impact of input production control variables. Machine learning models are also trained and tested individually on the production data of hydrocarbon wells located in the North Sea. Feature engineering was applied to select the most suitable contributing control variables for daily production rate forecasting. This study provides a chronological explanation of the data analytics required for the interpretation of production data. The test results reveal that the stacked generalization architecture outperformed the other significant paradigms considered for production forecasting.
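Stacked generalization, as named in this abstract, trains base learners and a meta-learner over their predictions. A minimal sketch on synthetic production-rate data (the variables and values are stand-ins, not the North Sea dataset):

```python
# Hedged sketch: stacked-generalization ensemble (RF + SVR base learners,
# ridge meta-learner) for a simulated production-rate regression task.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))     # stand-in choke/production control variables
y = X @ np.array([1.5, -2.0, 0.5, 1.0, 0.0]) + 0.2 * rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)), ("svr", SVR())],
    final_estimator=Ridge(),
)
stack.fit(X_tr, y_tr)
r2 = r2_score(y_te, stack.predict(X_te))
```

A voting architecture would replace `StackingRegressor` with `VotingRegressor`, averaging base predictions instead of learning a meta-model over them.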
Machine learning techniques have attracted increasing attention as advanced data analytics in building energy analysis. However, most previous studies focus only on the prediction capability of machine learning algorithms to provide reliable energy estimation in buildings. Machine learning also has great potential to identify energy patterns for urban buildings beyond model prediction. Therefore, this paper explores the energy characteristics of London domestic properties using ten machine learning algorithms from three aspects: the tuning process of the learning models, variable importance, and spatial analysis of model discrepancy. The results indicate that the combination of these three aspects can provide insights into energy patterns for urban buildings. The tuning process of these models indicates that gas use models should have more terms than electricity models in London, and that interaction terms should be considered in both gas and electricity models. The rankings of important variables are very different for gas and electricity prediction in London residential buildings, which suggests that gas and electricity use are affected by different physical and social factors. Moreover, the importance levels of these key variables are markedly different for gas and electricity consumption: there are many more variables with importance levels over 40 for electricity use than for gas use. The areas with larger model discrepancies can be determined using local spatial analysis based on these machine learning models. These identified areas have significantly different energy patterns for gas and electricity use. More research is required to understand these unusual patterns of energy use in these areas.
Open-source and free tools are readily available to the public to process data and assist producers in making management decisions related to agricultural landscapes. On-the-go soil sensors are being used as a proxy to develop digital soil maps because of the data they can collect and their ability to cover a large area quickly. Machine learning, a subcomponent of artificial intelligence, makes predictions from data. Intermixing open-source tools, on-the-go sensor technologies, and machine learning may improve Mississippi soil mapping and crop production. This study aimed to evaluate machine learning for mapping apparent soil electrical conductivity (ECa) collected with an on-the-go sensor system at two sites (i.e., MF2, MF9) on a research farm in Mississippi. Machine learning tools (support vector machine) incorporated in Smart-Map, an open-source application, were used to evaluate the sites and derive the apparent electrical conductivity maps. Autocorrelation of the shallow (ECas) and deep (ECad) readings was statistically significant at both locations (Moran's I, p < 0.001); however, the spatial correlation was greater at MF2. According to the leave-one-out cross-validation results, the best models were developed for ECas versus ECad. Spatial patterns were observed for the ECas and ECad readings in both fields. The patterns observed for the ECad readings were more distinct than those for the ECas measurements. The research results indicated that machine learning was valuable for deriving apparent electrical conductivity maps in the two Mississippi fields. Location and depth played a role in the machine learning model's ability to develop maps.
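The spatial-autocorrelation statistic cited in this abstract, global Moran's I, is simple enough to compute directly. A minimal NumPy sketch on a synthetic 1-D transect of sensor readings (the values and neighbor structure are illustrative, not the ECa data):

```python
# Hedged sketch: global Moran's I for a 1-D array of observations and an
# n x n spatial-weights matrix; positive I indicates spatial clustering.
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z centered."""
    z = values - values.mean()
    num = (weights * np.outer(z, z)).sum()
    return len(values) / weights.sum() * num / (z ** 2).sum()

# A smooth transect of simulated readings yields strong positive autocorrelation
vals = np.sin(np.linspace(0, 3 * np.pi, 40))
W = np.zeros((40, 40))
for i in range(39):              # neighbors = adjacent readings on the transect
    W[i, i + 1] = W[i + 1, i] = 1.0
I = morans_i(vals, W)
```

Alternating high/low values would instead drive Moran's I negative, the signature of spatial dispersion rather than clustering.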
In this article, we comment on the article by Long et al published in the recent issue of the World Journal of Gastrointestinal Oncology. Rectal cancer patients are at risk for developing metachronous liver metastasis (MLM), yet early prediction remains challenging due to variations in tumor heterogeneity and the limitations of traditional diagnostic methods. Therefore, there is an urgent need for noninvasive techniques to improve patient outcomes. Long et al's study introduces an innovative magnetic resonance imaging (MRI)-based radiomics model that integrates high-throughput imaging data with clinical variables to predict MLM. The study employed a 7:3 split to generate training and validation datasets. The MLM prediction model was constructed using the training set and subsequently validated on the validation set using area under the curve (AUC) and decision curve analysis metrics to assess performance, robustness, and generalizability. By employing advanced algorithms, the model provides a non-invasive solution for assessing tumor heterogeneity for better metastasis prediction, enabling early intervention and personalized treatment planning. However, variations in MRI parameters, such as differences in scanning resolutions and protocols across facilities, patient heterogeneity (e.g., age, comorbidities), and external factors like carcinoembryonic antigen levels introduce biases. Additionally, confounding factors such as diagnostic staging methods and patient comorbidities require further validation and adjustment to ensure accuracy and generalizability. With evolving Food and Drug Administration regulations on machine learning models in healthcare, compliance and careful consideration of these regulatory requirements are essential to ensuring safe and effective implementation of this approach in clinical practice. In the future, clinicians may be able to utilize data-driven, patient-centric artificial intelligence (AI)-enhanced imaging tools integrated with clinical data, which would help improve early detection of MLM and optimize personalized treatment strategies. Combining radiomics, genomics, histological data, and demographic information can significantly enhance the accuracy and precision of predictive models.
Applying quantum computing techniques to machine learning has attracted widespread attention recently, and quantum machine learning has become a hot research topic. There are three major categories of machine learning: supervised, unsupervised, and reinforcement learning (RL). However, quantum RL has made the least progress compared with the other two areas. In this study, we implement the well-known RL algorithm Q-learning with a quantum neural network and evaluate it in the grid world environment. RL is learning through interactions with the environment, with the aim of discovering a strategy to maximize the expected cumulative rewards. Problems in RL bring unique challenges to the study, with their sequential nature of learning, potentially long-delayed reward signals, and large or infinite state and action spaces. This study extends our previous work on solving the contextual bandit problem using a quantum neural network, where the reward signals are immediate after each action.
The advantage of quantum computers over classical computers fuels the recent trend of developing machine learning algorithms on quantum computers, which can potentially lead to breakthroughs and new learning models in this area. The aim of our study is to explore deep quantum reinforcement learning (RL) on photonic quantum computers, which can process information stored in the quantum states of light. These quantum computers can naturally represent continuous variables, making them an ideal platform for creating quantum versions of neural networks. Using quantum photonic circuits, we implement Q-learning and actor-critic algorithms with multilayer quantum neural networks and test them in the grid world environment. Our experiments show that 1) these quantum algorithms can solve the RL problem and 2) compared to one layer, using three-layer quantum networks improves the learning of both algorithms in terms of rewards collected. In summary, our findings suggest that having more layers in deep quantum RL can enhance the learning outcome.
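As a classical point of reference for the Q-learning-in-a-grid-world setup described in these two abstracts, tabular Q-learning can be sketched in a few lines. The quantum-neural-network function approximator the studies use is not reproduced here; the grid layout, rewards, and hyperparameters below are illustrative:

```python
# Hedged sketch: classical tabular Q-learning in a 4x4 grid world with the
# goal in the bottom-right corner; the quantum variants replace the table
# with a (photonic) quantum neural network.
import numpy as np

rng = np.random.default_rng(6)
N = 4                                          # grid side; states 0..15
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
Q = np.zeros((N * N, len(actions)))
alpha, gamma, eps = 0.5, 0.9, 0.3

def step(s, a):
    """Deterministic move with wall clamping; reward 1.0 only at the goal."""
    r, c = divmod(s, N)
    dr, dc = actions[a]
    r, c = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    s2 = r * N + c
    return s2, (1.0 if s2 == N * N - 1 else 0.0), s2 == N * N - 1

for _ in range(500):                           # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy; act randomly when Q[s] is still uninformative (all ties)
        if rng.random() < eps or np.all(Q[s] == Q[s].max()):
            a = int(rng.integers(4))
        else:
            a = int(np.argmax(Q[s]))
        s2, rwd, done = step(s, a)
        Q[s, a] += alpha * (rwd + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2

# Greedy rollout from the start state with the learned Q-table
s, done = 0, False
for _ in range(2 * N * N):
    s, _, done = step(s, int(np.argmax(Q[s])))
    if done:
        break
```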
Funding: This work received financial support from the Fundamental Research Funds for the Central Universities (DUT22YG218), the NSFC (22278053, 22078041), the China Postdoctoral Science Foundation (2022M710578), and the Dalian High-level Talents Innovation Support Program (2021RQ105).
Funding: Supported via funding from the Ministry of Defense, Government of Pakistan, under Project Number AHQ/95013/6/4/8/NASTP (ACP), titled "Development of ICT and Artificial Intelligence Based Precision Agriculture Systems Utilizing Dual-Use Aerospace Technologies - GREENAI."
Funding: Supported by the China Postdoctoral Science Foundation under Grant No. 2022M710333, the Beijing Postdoctoral Research Foundation under Grant No. 2023-zz-141, and the National Natural Science Foundation of China under Grant Nos. 52278492 and 52078176.
Funding: Supported by the National Natural Science Foundation of China, No. 81802777.
Funding: Supported by the College of Agriculture, Shiraz University (Grant No. 97GRC1M271143); funding from the UK Biotechnology and Biological Sciences Research Council (BBSRC), grant award BBS/E/C/000I0330 – Soil to Nutrition project 3 – Sustainable intensification: optimisation at multiple scales.
Abstract: This investigation assessed the efficacy of 10 widely used machine learning algorithms (MLAs), comprising the least absolute shrinkage and selection operator (LASSO), generalized linear model (GLM), stepwise generalized linear model (SGLM), elastic net (ENET), partial least squares (PLS), ridge regression, support vector machine (SVM), classification and regression trees (CART), bagged CART, and random forest (RF), for gully erosion susceptibility mapping (GESM) in Iran. The locations of 462 previously existing gully erosion sites were mapped through widespread field investigations, of which 70% (323) and 30% (139) of the observations were randomly divided for algorithm calibration and validation. Twelve controlling factors for gully erosion, namely soil texture, annual mean rainfall, digital elevation model (DEM), drainage density, slope, lithology, topographic wetness index (TWI), distance from rivers, aspect, distance from roads, plan curvature, and profile curvature, were ranked in terms of their importance using each MLA. The MLAs were compared using a training dataset for gully erosion and statistical measures such as RMSE (root mean square error), MAE (mean absolute error), and R-squared. Based on the comparisons among the MLAs, the RF algorithm exhibited the minimum RMSE and MAE and the maximum R-squared, and was therefore selected as the best model. The variable importance evaluation using the RF model revealed that distance from rivers had the highest significance in influencing the occurrence of gully erosion, whereas plan curvature had the least importance. According to the GESM generated using RF, most of the study area is predicted to have a low (53.72%) or moderate (29.65%) susceptibility to gully erosion, whereas only a small area is identified to have a high (12.56%) or very high (4.07%) susceptibility. The outcome generated by the RF model was validated using the ROC (receiver operating characteristic) curve approach, which returned an area under the curve (AUC) of 0.985, proving the excellent forecasting ability of the model. The GESM prepared using the RF algorithm can aid decision-makers in targeting remedial actions for minimizing the damage caused by gully erosion.
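The variable-importance ranking described above can be illustrated with a generic permutation-importance sketch: shuffle one predictor at a time and record how much the score drops. This is a hedged toy example, not the authors' code; the model, data, and metric below are invented purely for illustration.

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=20, seed=0):
    """Mean drop in score when one predictor column is shuffled;
    larger drops mean the variable matters more to the model."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drop = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            drop += base - metric(y, [predict(row) for row in Xp])
        importances.append(drop / n_repeats)
    return importances

accuracy = lambda y, yhat: sum(a == b for a, b in zip(y, yhat)) / len(y)

# Toy dataset: the target depends only on the first predictor,
# so shuffling column 0 should hurt, and column 1 should not.
X = [[i % 2, (i // 2) % 2] for i in range(20)]
y = [row[0] for row in X]
imp = permutation_importance(lambda row: row[0], X, y, accuracy)
```

Ranking the twelve gully-erosion factors by this kind of score is how a statement like "distance from rivers had the highest significance, plan curvature the least" is obtained.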
Funding: This study was supported by the Natural Science Foundation of China (41871333) and the Important Project of Science and Technology of Henan Province (182102110186). Thanks go to Haikuan Feng for the image data and field sampling collection.
Abstract: The estimation of potato biomass and yield can optimize the planting pattern and tap the production potential. Based on partial least squares regression (PLSR), multiple linear regression (MLR), support vector machine (SVM), random forest (RF), BP neural network, and other machine learning algorithms, biomass estimation models for potato at different growth stages were constructed using single variables, such as the original spectrum, first-order differential spectrum, combined spectral indices, and vegetation indices (VIs), and their coupled combinations. The accuracy of the models was compared and analyzed, and the best modeling method for biomass at each growth stage was selected. Based on the optimal modeling method, the biomass of each growth stage was estimated; yield estimation models for the different growth stages were then constructed from the estimation results using linear regression analysis, and the accuracy of the models was verified. The results showed that in the tuber formation, starch accumulation, and maturity stages, biomass estimation based on the combined variables was the most accurate, with MLR and SVM as the best modeling methods; in the tuber growth stage, the best modeling method was MLR, and the yield estimation performed well. This work provides a reference for algorithm selection in machine learning-based models of crop biomass and yield.
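As a minimal sketch of one of the modeling methods compared above, the MLR step with an R² accuracy check could look like the following. The spectral-index values and the noise-free biomass relation are hypothetical, chosen only so the fit is easy to verify; they are not the study's data.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2_score(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 1.0 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Hypothetical spectral indices (columns) vs. biomass (target).
X = np.array([[0.2, 1.1], [0.4, 0.9], [0.6, 1.4], [0.8, 1.0], [1.0, 1.6]])
biomass = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0  # noise-free toy relation
coef = fit_mlr(X, biomass)
r2 = r2_score(biomass, predict_mlr(coef, X))
```

In the study, the same comparison (here via R²) is run per growth stage and per variable set to pick the best modeling method.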
Abstract: Surface chokes are widely used equipment installed on wellheads to control hydrocarbon flow rates. Several correlations have been suggested to model the multiphase flow of oil and gas through surface chokes. However, substantial errors have been reported for empirical fitting models and correlations that estimate hydrocarbon flow, because the reservoir's heterogeneity, anisotropy, and variance in reservoir fluid characteristics at diverse subsurface depths introduce complexity into the production data. Therefore, the estimation of daily oil and gas production rates is still challenging for the petroleum industry. Recently, hybrid data-driven techniques have been reported to be effective for estimation problems in various aspects of the petroleum domain. This paper investigates hybrid ensemble data-driven approaches to forecast multiphase flow rates through the surface choke (viz. stacked generalization and voting architectures), followed by an assessment of the impact of the input production control variables. In addition, machine learning models are trained and tested individually on the production data of hydrocarbon wells located in the North Sea. Feature engineering has been applied to select the most suitable contributing control variables for daily production rate forecasting. This study provides a chronological explanation of the data analytics required for the interpretation of production data. The test results reveal that the stacked generalization architecture outperformed the other paradigms considered for production forecasting.
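The two ensemble architectures compared above can be sketched in a few lines, assuming base-model predictions are already available as arrays. A proper stacked generalization fits the meta-learner on out-of-fold predictions; this hedged toy version fits it directly on training predictions to keep the sketch short, and all data below are invented.

```python
import numpy as np

def voting_predict(base_preds):
    """Voting ensemble: average the base models' predictions."""
    return np.mean(base_preds, axis=0)

def stacked_fit_predict(train_preds, y_train, test_preds):
    """Stacked generalization with a linear meta-learner: learn weights
    on the base models' training predictions, then combine test
    predictions with those weights."""
    Z = np.column_stack(train_preds)
    w, *_ = np.linalg.lstsq(Z, y_train, rcond=None)
    return np.column_stack(test_preds) @ w

# Invented flow rates plus two hypothetical base models:
# one biased high, one biased low, so the ensembles can cancel the bias.
y_train = np.array([10.0, 20.0, 30.0, 40.0])
m1_train, m2_train = y_train + 2.0, y_train - 2.0
y_test = np.array([25.0, 35.0])
m1_test, m2_test = y_test + 2.0, y_test - 2.0

vote = voting_predict([m1_test, m2_test])
stack = stacked_fit_predict([m1_train, m2_train], y_train,
                            [m1_test, m2_test])
```

With opposite constant biases, both the average and the learned 0.5/0.5 weights recover the true rates exactly; on real production data the stacked weights would differ per base model, which is where stacking gains over plain voting.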
Funding: This research was supported by the National Natural Science Foundation of China (No. 51778416) and the Key Projects of Philosophy and Social Sciences Research, Ministry of Education (China), "Research on Green Design in Sustainable Development" (contract No. 16JZDH014, approval No. 16JZD014).
Abstract: Machine learning techniques have attracted increasing attention as advanced data analytics in building energy analysis. However, most previous studies focus only on the prediction capability of machine learning algorithms to provide reliable energy estimates for buildings. Machine learning also has great potential to identify energy patterns for urban buildings beyond model prediction. Therefore, this paper explores the energy characteristics of London domestic properties using ten machine learning algorithms from three aspects: the tuning process of the learning models, variable importance, and spatial analysis of model discrepancy. The results indicate that the combination of these three aspects can provide insights into energy patterns for urban buildings. The tuning process of these models indicates that gas use models should have more terms than electricity models in London, and that interaction terms should be considered in both gas and electricity models. The rankings of important variables are very different for gas and electricity prediction in London residential buildings, which suggests that gas and electricity use are affected by different physical and social factors. Moreover, the importance levels of these key variables are markedly different for gas and electricity consumption: many more variables have importance levels above 40 for electricity use than for gas use. The areas with larger model discrepancies can be determined using local spatial analysis based on these machine learning models. These identified areas have significantly different energy patterns for gas and electricity use. More research is required to understand these unusual patterns of energy use in these areas.
Abstract: Open-source and free tools are readily available to the public to process data and assist producers in making management decisions related to agricultural landscapes. On-the-go soil sensors are being used as a proxy to develop digital soil maps because of the data they can collect and their ability to cover a large area quickly. Machine learning, a subcomponent of artificial intelligence, makes predictions from data. Intermixing open-source tools, on-the-go sensor technologies, and machine learning may improve Mississippi soil mapping and crop production. This study aimed to evaluate machine learning for mapping apparent soil electrical conductivity (EC<sub>a</sub>) collected with an on-the-go sensor system at two sites (i.e., MF2, MF9) on a research farm in Mississippi. Machine learning tools (support vector machine) incorporated in Smart-Map, an open-source application, were used to evaluate the sites and derive the apparent electrical conductivity maps. Autocorrelation of the shallow (EC<sub>as</sub>) and deep (EC<sub>ad</sub>) readings was statistically significant at both locations (Moran's I, p < 0.001); however, the spatial correlation was greater at MF2. According to the leave-one-out cross-validation results, the best models were developed for EC<sub>as</sub> versus EC<sub>ad</sub>. Spatial patterns were observed for the EC<sub>as</sub> and EC<sub>ad</sub> readings in both fields. The patterns observed for the EC<sub>ad</sub> readings were more distinct than the EC<sub>as</sub> measurements. The research results indicated that machine learning was valuable for deriving apparent electrical conductivity maps in two Mississippi fields. Location and depth played a role in the machine learner's ability to develop maps.
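The Moran's I statistic used above to test spatial autocorrelation can be computed directly from a vector of readings and a spatial weights matrix: I = (n/S0) · z'Wz / z'z, where z are deviations from the mean and S0 is the sum of the weights. A minimal sketch with our own toy sites and rook-style weights (not the study's data):

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I spatial autocorrelation statistic.
    `weights` is an n x n spatial weights matrix with zero diagonal."""
    x = np.asarray(values, float)
    w = np.asarray(weights, float)
    n, s0 = len(x), w.sum()
    z = x - x.mean()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Four sites in a line, adjacency weights between immediate neighbours.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
i_smooth = morans_i([1, 2, 3, 4], w)  # smooth gradient: positive I
i_alt = morans_i([1, 4, 1, 4], w)     # alternating values: negative I
```

Positive I means neighbouring readings are similar (as the EC<sub>a</sub> maps show); values near zero mean no spatial structure, which is what the p-value in the abstract rules out.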
Abstract: In this article, we comment on the article by Long et al published in the recent issue of the World Journal of Gastrointestinal Oncology. Rectal cancer patients are at risk of developing metachronous liver metastasis (MLM), yet early prediction remains challenging due to variations in tumor heterogeneity and the limitations of traditional diagnostic methods. Therefore, there is an urgent need for noninvasive techniques to improve patient outcomes. Long et al's study introduces an innovative magnetic resonance imaging (MRI)-based radiomics model that integrates high-throughput imaging data with clinical variables to predict MLM. The study employed a 7:3 split to generate training and validation datasets. The MLM prediction model was constructed using the training set and subsequently validated on the validation set using area under the curve (AUC) and decision curve analysis (DCA) metrics to assess performance, robustness, and generalizability. By employing advanced algorithms, the model provides a non-invasive way to assess tumor heterogeneity for better metastasis prediction, enabling early intervention and personalized treatment planning. However, variations in MRI parameters, such as differences in scanning resolutions and protocols across facilities, patient heterogeneity (e.g., age, comorbidities), and external factors like carcinoembryonic antigen levels introduce biases. Additionally, confounding factors such as diagnostic staging methods and patient comorbidities require further validation and adjustment to ensure accuracy and generalizability. With evolving Food and Drug Administration regulations on machine learning models in healthcare, compliance and careful consideration of these regulatory requirements are essential to ensuring the safe and effective implementation of this approach in clinical practice. In the future, clinicians may be able to utilize data-driven, patient-centric artificial intelligence (AI)-enhanced imaging tools integrated with clinical data, which would help improve early detection of MLM and optimize personalized treatment strategies. Combining radiomics, genomics, histological data, and demographic information can significantly enhance the accuracy and precision of predictive models.
Abstract: Applying quantum computing techniques to machine learning has attracted widespread attention recently, and quantum machine learning has become a hot research topic. There are three major categories of machine learning: supervised, unsupervised, and reinforcement learning (RL). However, quantum RL has made the least progress compared to the other two areas. In this study, we implement the well-known RL algorithm Q-learning with a quantum neural network and evaluate it in the grid world environment. RL is learning through interactions with the environment, with the aim of discovering a strategy that maximizes the expected cumulative reward. Problems in RL bring unique challenges to the study, with their sequential nature of learning, potentially long-delayed reward signals, and large or infinite state and action spaces. This study extends our previous work on solving the contextual bandit problem using a quantum neural network, where the reward signals are immediate after each action.
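The paper implements Q-learning with a quantum neural network; the tabular update it builds on can be shown in a classical sketch on a toy one-dimensional grid world. The environment, hyperparameters, and optimistic initialization below are our illustrative choices, not the paper's setup.

```python
import random

def corridor_step(s, a, goal=4):
    """Toy 1-D grid world: action 0 moves left, 1 moves right;
    reward 1 on reaching the goal cell, which ends the episode."""
    s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
    done = (s2 == goal)
    return s2, (1.0 if done else 0.0), done

def q_learning(step, n_states=5, n_actions=2, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    # Optimistic initialization encourages trying every action once.
    Q = [[1.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        for _ in range(max_steps):
            if done:
                break
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda k: Q[s][k])
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])  # TD update
            s = s2
    return Q

Q = q_learning(corridor_step)
policy = [max(range(2), key=lambda k: Q[s][k]) for s in range(4)]
```

After training, the greedy policy moves right in every non-terminal cell. The quantum variant replaces the table `Q` with a quantum neural network that outputs action values, while the temporal-difference target stays the same.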
Abstract: The advantage of quantum computers over classical computers fuels the recent trend of developing machine learning algorithms on quantum computers, which can potentially lead to breakthroughs and new learning models in this area. The aim of our study is to explore deep quantum reinforcement learning (RL) on photonic quantum computers, which can process information stored in the quantum states of light. These quantum computers can naturally represent continuous variables, making them an ideal platform for creating quantum versions of neural networks. Using quantum photonic circuits, we implement Q-learning and actor-critic algorithms with multilayer quantum neural networks and test them in the grid world environment. Our experiments show that 1) these quantum algorithms can solve the RL problem and 2) compared to one layer, using three-layer quantum networks improves the learning of both algorithms in terms of rewards collected. In summary, our findings suggest that having more layers in deep quantum RL can enhance the learning outcome.