Online review platforms are becoming increasingly popular,encouraging dishonest merchants and service providers to deceive customers by creating fake reviews for their goods or services.Using Sybil accounts,bot farms,...Online review platforms are becoming increasingly popular,encouraging dishonest merchants and service providers to deceive customers by creating fake reviews for their goods or services.Using Sybil accounts,bot farms,and real account purchases,immoral actors demonize rivals and advertise their goods.Most academic and industry efforts have been aimed at detecting fake/fraudulent product or service evaluations for years.The primary hurdle to identifying fraudulent reviews is the lack of a reliable means to distinguish fraudulent reviews from real ones.This paper adopts a semi-supervised machine learning method to detect fake reviews on any website,among other things.Online reviews are classified using a semi-supervised approach(PU-learning)since there is a shortage of labeled data,and they are dynamic.Then,classification is performed using the machine learning techniques Support Vector Machine(SVM)and Nave Bayes.The performance of the suggested system has been compared with standard works,and experimental findings are assessed using several assessment metrics.展开更多
Traditional laboratory tests for measuring rock uniaxial compressive strength(UCS)are tedious and timeconsuming.There is a pressing need for more effective methods to determine rock UCS,especially in deep mining envir...Traditional laboratory tests for measuring rock uniaxial compressive strength(UCS)are tedious and timeconsuming.There is a pressing need for more effective methods to determine rock UCS,especially in deep mining environments under high in-situ stress.Thus,this study aims to develop an advanced model for predicting the UCS of rockmaterial in deepmining environments by combining three boosting-basedmachine learning methods with four optimization algorithms.For this purpose,the Lead-Zinc mine in Southwest China is considered as the case study.Rock density,P-wave velocity,and point load strength index are used as input variables,and UCS is regarded as the output.Subsequently,twelve hybrid predictive models are obtained.Root mean square error(RMSE),mean absolute error(MAE),coefficient of determination(R2),and the proportion of the mean absolute percentage error less than 20%(A-20)are selected as the evaluation metrics.Experimental results showed that the hybridmodel consisting of the extreme gradient boostingmethod and the artificial bee colony algorithm(XGBoost-ABC)achieved satisfactory results on the training dataset and exhibited the best generalization performance on the testing dataset.The values of R2,A-20,RMSE,and MAE on the training dataset are 0.98,1.0,3.11 MPa,and 2.23MPa,respectively.The highest values of R2 and A-20(0.93 and 0.96),and the smallest RMSE and MAE values of 4.78 MPa and 3.76MPa,are observed on the testing dataset.The proposed hybrid model can be considered a reliable and effective method for predicting rock UCS in deep mines.展开更多
The reasonable quantification of the concrete freezing environment on the Qinghai–Tibet Plateau(QTP) is the primary issue in frost resistant concrete design, which is one of the challenges that the QTP engineering ma...The reasonable quantification of the concrete freezing environment on the Qinghai–Tibet Plateau(QTP) is the primary issue in frost resistant concrete design, which is one of the challenges that the QTP engineering managers should take into account. In this paper, we propose a more realistic method to calculate the number of concrete freeze–thaw cycles(NFTCs) on the QTP. The calculated results show that the NFTCs increase as the altitude of the meteorological station increases with the average NFTCs being 208.7. Four machine learning methods, i.e., the random forest(RF) model, generalized boosting method(GBM), generalized linear model(GLM), and generalized additive model(GAM), are used to fit the NFTCs. The root mean square error(RMSE) values of the RF, GBM, GLM, and GAM are 32.3, 4.3, 247.9, and 161.3, respectively. The R^(2) values of the RF, GBM, GLM, and GAM are 0.93, 0.99, 0.48, and 0.66, respectively. The GBM method performs the best compared to the other three methods, which was shown by the results of RMSE and R^(2) values. The quantitative results from the GBM method indicate that the lowest, medium, and highest NFTC values are distributed in the northern, central, and southern parts of the QTP, respectively. The annual NFTCs in the QTP region are mainly concentrated at 160 and above, and the average NFTCs is 200 across the QTP. Our results can provide scientific guidance and a theoretical basis for the freezing resistance design of concrete in various projects on the QTP.展开更多
Metallic alloys for a given application are usually designed to achieve the desired properties by devising experimentsbased on experience, thermodynamic and kinetic principles, and various modeling and simulation exer...Metallic alloys for a given application are usually designed to achieve the desired properties by devising experimentsbased on experience, thermodynamic and kinetic principles, and various modeling and simulation exercises.However, the influence of process parameters and material properties is often non-linear and non-colligative. Inrecent years, machine learning (ML) has emerged as a promising tool to dealwith the complex interrelation betweencomposition, properties, and process parameters to facilitate accelerated discovery and development of new alloysand functionalities. In this study, we adopt an ML-based approach, coupled with genetic algorithm (GA) principles,to design novel copper alloys for achieving seemingly contradictory targets of high strength and high electricalconductivity. Initially, we establish a correlation between the alloy composition (binary to multi-component) andthe target properties, namely, electrical conductivity and mechanical strength. Catboost, an ML model coupledwith GA, was used for this task. The accuracy of the model was above 93.5%. Next, for obtaining the optimizedcompositions the outputs fromthe initial model were refined by combining the concepts of data augmentation andPareto front. Finally, the ultimate objective of predicting the target composition that would deliver the desired rangeof properties was achieved by developing an advancedMLmodel through data segregation and data augmentation.To examine the reliability of this model, results were rigorously compared and verified using several independentdata reported in the literature. This comparison substantiates that the results predicted by our model regarding thevariation of conductivity and evolution ofmicrostructure and mechanical properties with composition are in goodagreement with the reports published in the literature.展开更多
Latent tuberculosis infection(LTBI)has become a major source of active tuberculosis(ATB).Although the tuberculin skin test and interferon-gamma release assay can be used to diagnose LTBI,these methods can only differe...Latent tuberculosis infection(LTBI)has become a major source of active tuberculosis(ATB).Although the tuberculin skin test and interferon-gamma release assay can be used to diagnose LTBI,these methods can only differentiate infected individuals from healthy ones but cannot discriminate between LTBI and ATB.Thus,the diagnosis of LTBI faces many challenges,such as the lack of effective biomarkers from Mycobacterium tuberculosis(MTB)for distinguishing LTBI,the low diagnostic efficacy of biomarkers derived from the human host,and the absence of a gold standard to differentiate between LTBI and ATB.Sputum culture,as the gold standard for diagnosing tuberculosis,is time-consuming and cannot distinguish between ATB and LTBI.In this article,we review the pathogenesis of MTB and the immune mechanisms of the host in LTBI,including the innate and adaptive immune responses,multiple immune evasion mechanisms of MTB,and epigenetic regulation.Based on this knowledge,we summarize the current status and challenges in diagnosing LTBI and present the application of machine learning(ML)in LTBI diagnosis,as well as the advantages and limitations of ML in this context.Finally,we discuss the future development directions of ML applied to LTBI diagnosis.展开更多
In this study,twelve machine learning(ML)techniques are used to accurately estimate the safety factor of rock slopes(SFRS).The dataset used for developing these models consists of 344 rock slopes from various open-pit...In this study,twelve machine learning(ML)techniques are used to accurately estimate the safety factor of rock slopes(SFRS).The dataset used for developing these models consists of 344 rock slopes from various open-pit mines around Iran,evenly distributed between the training(80%)and testing(20%)datasets.The models are evaluated for accuracy using Janbu's limit equilibrium method(LEM)and commercial tool GeoStudio methods.Statistical assessment metrics show that the random forest model is the most accurate in estimating the SFRS(MSE=0.0182,R2=0.8319)and shows high agreement with the results from the LEM method.The results from the long-short-term memory(LSTM)model are the least accurate(MSE=0.037,R2=0.6618)of all the models tested.However,only the null space support vector regression(NuSVR)model performs accurately compared to the practice mode by altering the value of one parameter while maintaining the other parameters constant.It is suggested that this model would be the best one to use to calculate the SFRS.A graphical user interface for the proposed models is developed to further assist in the calculation of the SFRS for engineering difficulties.In this study,we attempt to bridge the gap between modern slope stability evaluation techniques and more conventional analysis methods.展开更多
Pore pressure(PP)information plays an important role in analysing the geomechanical properties of the reservoir and hydrocarbon field development.PP prediction is an essential requirement to ensure safe drilling opera...Pore pressure(PP)information plays an important role in analysing the geomechanical properties of the reservoir and hydrocarbon field development.PP prediction is an essential requirement to ensure safe drilling operations and it is a fundamental input for well design,and mud weight estimation for wellbore stability.However,the pore pressure trend prediction in complex geological provinces is challenging particularly at oceanic slope setting,where sedimentation rate is relatively high and PP can be driven by various complex geo-processes.To overcome these difficulties,an advanced machine learning(ML)tool is implemented in combination with empirical methods.The empirical method for PP prediction is comprised of data pre-processing and model establishment stage.Eaton's method and Porosity method have been used for PP calculation of the well U1517A located at Tuaheni Landslide Complex of Hikurangi Subduction zone of IODP expedition 372.Gamma-ray,sonic travel time,bulk density and sonic derived porosity are extracted from well log data for the theoretical framework construction.The normal compaction trend(NCT)curve analysis is used to check the optimum fitting of the low permeable zone data.The statistical analysis is done using the histogram analysis and Pearson correlation coefficient matrix with PP data series to identify potential input combinations for ML-based predictive model development.The dataset is prepared and divided into two parts:Training and Testing.The PP data and well log of borehole U1517A is pre-processed to scale in between[-1,+1]to fit into the input range of the non-linear activation/transfer function of the decision tree regression model.The Decision Tree Regression(DTR)algorithm is built and compared to the model performance to predict the PP and identify the overpressure zone in Hikurangi Tuaheni Zone of IODP Expedition 372.展开更多
We present our results by using a machine learning(ML)approach for the solution of the Riemann problem for the Euler equations of fluid dynamics.The Riemann problem is an initial-value problem with piecewise-constant ...We present our results by using a machine learning(ML)approach for the solution of the Riemann problem for the Euler equations of fluid dynamics.The Riemann problem is an initial-value problem with piecewise-constant initial data and it represents a mathematical model of the shock tube.The solution of the Riemann problem is the building block for many numerical algorithms in computational fluid dynamics,such as finite-volume or discontinuous Galerkin methods.Therefore,a fast and accurate approximation of the solution of the Riemann problem and construction of the associated numerical fluxes is of crucial importance.The exact solution of the shock tube problem is fully described by the intermediate pressure and mathematically reduces to finding a solution of a nonlinear equation.Prior to delving into the complexities of ML for the Riemann problem,we consider a much simpler formulation,yet very informative,problem of learning roots of quadratic equations based on their coefficients.We compare two approaches:(i)Gaussian process(GP)regressions,and(ii)neural network(NN)approximations.Among these approaches,NNs prove to be more robust and efficient,although GP can be appreciably more accurate(about 30\%).We then use our experience with the quadratic equation to apply the GP and NN approaches to learn the exact solution of the Riemann problem from the initial data or coefficients of the gas equation of state(EOS).We compare GP and NN approximations in both regression and classification analysis and discuss the potential benefits and drawbacks of the ML approach.展开更多
Sleep and well-being have been intricately linked,and sleep hygiene is paramount for developing mental well-being and resilience.Although widespread,sleep disorders require elaborate polysomnography laboratory and pat...Sleep and well-being have been intricately linked,and sleep hygiene is paramount for developing mental well-being and resilience.Although widespread,sleep disorders require elaborate polysomnography laboratory and patient-stay with sleep in unfamiliar environments.Current technologies have allowed various devices to diagnose sleep disorders at home.However,these devices are in various validation stages,with many already receiving approvals from competent authorities.This has captured vast patient-related physiologic data for advanced analytics using artificial intelligence through machine and deep learning applications.This is expected to be integrated with patients’Electronic Health Records and provide individualized prescriptive therapy for sleep disorders in the future.展开更多
BACKGROUND Synchronous liver metastasis(SLM)is a significant contributor to morbidity in colorectal cancer(CRC).There are no effective predictive device integration algorithms to predict adverse SLM events during the ...BACKGROUND Synchronous liver metastasis(SLM)is a significant contributor to morbidity in colorectal cancer(CRC).There are no effective predictive device integration algorithms to predict adverse SLM events during the diagnosis of CRC.AIM To explore the risk factors for SLM in CRC and construct a visual prediction model based on gray-level co-occurrence matrix(GLCM)features collected from magnetic resonance imaging(MRI).METHODS Our study retrospectively enrolled 392 patients with CRC from Yichang Central People’s Hospital from January 2015 to May 2023.Patients were randomly divided into a training and validation group(3:7).The clinical parameters and GLCM features extracted from MRI were included as candidate variables.The prediction model was constructed using a generalized linear regression model,random forest model(RFM),and artificial neural network model.Receiver operating characteristic curves and decision curves were used to evaluate the prediction model.RESULTS Among the 392 patients,48 had SLM(12.24%).We obtained fourteen GLCM imaging data for variable screening of SLM prediction models.Inverse difference,mean sum,sum entropy,sum variance,sum of squares,energy,and difference variance were listed as candidate variables,and the prediction efficiency(area under the curve)of the subsequent RFM in the training set and internal validation set was 0.917[95%confidence interval(95%CI):0.866-0.968]and 0.09(95%CI:0.858-0.960),respectively.CONCLUSION A predictive model combining GLCM image features with machine learning can predict SLM in CRC.This model can assist clinicians in making timely and personalized clinical decisions.展开更多
The huge increase in the communication network rate has made the application fields and scenarios for vehicular ad hoc networks more abundant and diversified and proposed more requirements for the efficiency and quali...The huge increase in the communication network rate has made the application fields and scenarios for vehicular ad hoc networks more abundant and diversified and proposed more requirements for the efficiency and quality of data transmission.To improve the limited communication distance and poor communication quality of the Internet of Vehicles(IoV),an optimal intelligent routing algorithm is proposed in this paper.Combined multiweight decision algorithm with the greedy perimeter stateless routing protocol,designed and evaluated standardized function for link stability.Linear additive weighting is used to optimize link stability and distance to improve the packet delivery rate of the IoV.The blockchain system is used as the storage structure for relay data,and the smart contract incentive algorithm based on machine learning is used to encourage relay vehicles to provide more communication bandwidth for data packet transmission.The proposed scheme is simulated and analyzed under different scenarios and different parameters.The experimental results demonstrate that the proposed scheme can effectively reduce the packet loss rate and improve system performance.展开更多
Background:Diabetic nephropathy(DN)is the most common complication of type 2 diabetes mellitus and the main cause of end-stage renal disease worldwide.Diagnostic biomarkers may allow early diagnosis and treatment of D...Background:Diabetic nephropathy(DN)is the most common complication of type 2 diabetes mellitus and the main cause of end-stage renal disease worldwide.Diagnostic biomarkers may allow early diagnosis and treatment of DN to reduce the prevalence and delay the development of DN.Kidney biopsy is the gold standard for diagnosing DN;however,its invasive character is its primary limitation.The machine learning approach provides a non-invasive and specific criterion for diagnosing DN,although traditional machine learning algorithms need to be improved to enhance diagnostic performance.Methods:We applied high-throughput RNA sequencing to obtain the genes related to DN tubular tissues and normal tubular tissues of mice.Then machine learning algorithms,random forest,LASSO logistic regression,and principal component analysis were used to identify key genes(CES1G,CYP4A14,NDUFA4,ABCC4,ACE).Then,the genetic algorithm-optimized backpropagation neural network(GA-BPNN)was used to improve the DN diagnostic model.Results:The AUC value of the GA-BPNN model in the training dataset was 0.83,and the AUC value of the model in the validation dataset was 0.81,while the AUC values of the SVM model in the training dataset and external validation dataset were 0.756 and 0.650,respectively.Thus,this GA-BPNN gave better values than the traditional SVM model.This diagnosis model may aim for personalized diagnosis and treatment of patients with DN.Immunohistochemical staining further confirmed that the tissue and cell expression of NADH dehydrogenase(ubiquinone)1 alpha subcomplex,4-like 2(NDUFA4L2)in tubular tissue in DN mice were decreased.Conclusion:The GA-BPNN model has better accuracy than the traditional SVM model and may provide an effective tool for diagnosing DN.展开更多
Model parameters estimation is a pivotal issue for runoff modeling in ungauged catchments.The nonlinear relationship between model parameters and catchment descriptors is a major obstacle for parameter regionalization...Model parameters estimation is a pivotal issue for runoff modeling in ungauged catchments.The nonlinear relationship between model parameters and catchment descriptors is a major obstacle for parameter regionalization,which is the most widely used approach.Runoff modeling was studied in 38 catchments located in the Yellow–Huai–Hai River Basin(YHHRB).The values of the Nash–Sutcliffe efficiency coefficient(NSE),coefficient of determination(R2),and percent bias(PBIAS)indicated the acceptable performance of the soil and water assessment tool(SWAT)model in the YHHRB.Nine descriptors belonging to the categories of climate,soil,vegetation,and topography were used to express the catchment characteristics related to the hydrological processes.The quantitative relationships between the parameters of the SWAT model and the catchment descriptors were analyzed by six regression-based models,including linear regression(LR)equations,support vector regression(SVR),random forest(RF),k-nearest neighbor(kNN),decision tree(DT),and radial basis function(RBF).Each of the 38 catchments was assumed to be an ungauged catchment in turn.Then,the parameters in each target catchment were estimated by the constructed regression models based on the remaining 37 donor catchments.Furthermore,the similaritybased regionalization scheme was used for comparison with the regression-based approach.The results indicated that the runoff with the highest accuracy was modeled by the SVR-based scheme in ungauged catchments.Compared with the traditional LR-based approach,the accuracy of the runoff modeling in ungauged catchments was improved by the machine learning algorithms because of the outstanding capability to deal with nonlinear relationships.The performances of different approaches were similar in humid regions,while the advantages of the machine learning techniques were more evident in arid regions.When the study area contained nested catchments,the best result was calculated with the similarity-based parameter regionalization scheme because of the high catchment density and short spatial distance.The new findings could improve flood forecasting and water resources planning in regions that lack observed data.展开更多
The rapid advancement of wireless communication is forming a hyper-connected 5G network in which billions of linked devices generate massive amounts of data.The traffic control and data forwarding functions are decoup...The rapid advancement of wireless communication is forming a hyper-connected 5G network in which billions of linked devices generate massive amounts of data.The traffic control and data forwarding functions are decoupled in software-defined networking(SDN)and allow the network to be programmable.Each switch in SDN keeps track of forwarding information in a flow table.The SDN switches must search the flow table for the flow rules that match the packets to handle the incoming packets.Due to the obvious vast quantity of data in data centres,the capacity of the flow table restricts the data plane’s forwarding capabilities.So,the SDN must handle traffic from across the whole network.The flow table depends on Ternary Content Addressable Memorable Memory(TCAM)for storing and a quick search of regulations;it is restricted in capacity owing to its elevated cost and energy consumption.Whenever the flow table is abused and overflowing,the usual regulations cannot be executed quickly.In this case,we consider lowrate flow table overflowing that causes collision flow rules to be installed and consumes excessive existing flow table capacity by delivering packets that don’t fit the flow table at a low rate.This study introduces machine learning techniques for detecting and categorizing low-rate collision flows table in SDN,using Feed ForwardNeuralNetwork(FFNN),K-Means,and Decision Tree(DT).We generate two network topologies,Fat Tree and Simple Tree Topologies,with the Mininet simulator and coupled to the OpenDayLight(ODL)controller.The efficiency and efficacy of the suggested algorithms are assessed using several assessment indicators such as success rate query,propagation delay,overall dropped packets,energy consumption,bandwidth usage,latency rate,and throughput.The findings showed that the suggested technique to tackle the flow table congestion problem minimizes the number of flows while retaining the statistical consistency of the 5G network.By putting the proposed flow method and checking whether a packet may move from point A to point B without breaking certain regulations,the evaluation tool examines every flow against a set of criteria.The FFNN with DT and K-means algorithms obtain accuracies of 96.29%and 97.51%,respectively,in the identification of collision flows,according to the experimental outcome when associated with existing methods from the literature.展开更多
The mango, a fruit of immense economic and dietary significance in numerous tropical and subtropical regions, plays a pivotal role in our agricultural landscape. Accurate identification is not just a necessity, but a ...The mango, a fruit of immense economic and dietary significance in numerous tropical and subtropical regions, plays a pivotal role in our agricultural landscape. Accurate identification is not just a necessity, but a crucial step for effective classification, sorting, and marketing. This study delves into the potential of machine learning for this task, comparing the performance of four models: MobileNetV2, Xception, VGG16, and ResNet50V2. These models were trained on a dataset of annotated mango images, and their performance was evaluated using precision, accuracy, F1 score, and recall, which are standard metrics for image classification. The Xception model, with its exceptional performance, outshone the other models on all performance indicators. It achieved a staggering accuracy of 99.47%, an F1 score of 99.43%, and a recall of 99.43%, showcasing its remarkable ability to accurately identify mango varieties. MobileNetV2 followed closely with performances of 98.95% accuracy, 98.85% F1 score, and 98.86% recall. ResNet50V2 also delivered satisfactory results with 97.39% accuracy, 97.08% F1 score, and 97.17% recall. VGG16, however, was the least effective, with a precision rate of 83.25%, an F1 score of 83.25%, and a recall of 85.47%. These results confirm the superiority of the Xception model in detecting mango varieties. Its advanced architecture allows it to capture more distinguishing features of mango images, leading to greater precision and reliability. Xception’s robustness in identifying true positives is another advantage, minimizing false positives and contributing to more accurate classification. This study highlights the promising potential of machine learning, particularly the Xception model, for accurately identifying mango varieties.展开更多
Stocks in the Chinese stock market can be divided into ST stocks and normal stocks, so to prevent investors from buying potential ST stocks, this paper first performs SMOTEENN oversampling data preprocessing for the S...Stocks in the Chinese stock market can be divided into ST stocks and normal stocks, so to prevent investors from buying potential ST stocks, this paper first performs SMOTEENN oversampling data preprocessing for the ST stock category, and selects 139 financial indicators and technical factor as predictive features. Then, it combines the Boruta algorithm and Copula entropy method for feature selection, effectively improving the machine learning model’s performance in ST stock classification, with the AUC values of the two models reaching 98% on the test set. In the model selection and optimization, this paper uses six major models, including logistic regression, XGBoost, AdaBoost, LightGBM, Catboost, and MLP, for modeling and optimizes them using the Optuna framework. Ultimately, XGBoost model is selected as the best model because its AUC value exceeds 95% and its running time is less. Finally, the XGBoost model is explained using the SHAP theory and the interaction between features is discovered, further improving the model’s accuracy and AUC value by about 0.6%, verifying the effectiveness of the model.展开更多
COVID-19 is a contagious disease and its several variants put under stress in all walks of life and economy as well.Early diagnosis of the virus is a crucial task to prevent the spread of the virus as it is a threat t...COVID-19 is a contagious disease and its several variants put under stress in all walks of life and economy as well.Early diagnosis of the virus is a crucial task to prevent the spread of the virus as it is a threat to life in the whole world.However,with the advancement of technology,the Internet of Things(IoT)and social IoT(SIoT),the versatile data produced by smart devices helped a lot in overcoming this lethal disease.Data mining is a technique that could be used for extracting useful information from massive data.In this study,we used five supervised ML strategies for creating a model to analyze and forecast the existence of COVID-19 using the Kaggle dataset“COVID-19 Symptoms and Presence.”RapidMiner Studio ML software was used to apply the Decision Tree(DT),Random Forest(RF),K-Nearest Neighbors(K-NNs)and Naive Bayes(NB),Integrated Decision Tree(ID3)algorithms.To develop the model,the performance of each model was tested using 10-fold cross-validation and compared to major accuracy measures,Cohan’s kappa statistics,properly or mistakenly categorized cases and root means square error.The results demonstrate that DT outperforms other methods,with an accuracy of 98.42%and a root mean square error of 0.11.In the future,a devisedmodel will be highly recommendable and supportive for early prediction/diagnosis of disease by providing different data sets.展开更多
The drug development process takes a long time since it requires sorting through a large number of inactive compounds from a large collection of compounds chosen for study and choosing just the most pertinent compound...The drug development process takes a long time since it requires sorting through a large number of inactive compounds from a large collection of compounds chosen for study and choosing just the most pertinent compounds that can bind to a disease protein.The use of virtual screening in pharmaceutical research is growing in popularity.During the early phases of medication research and development,it is crucial.Chemical compound searches are nowmore narrowly targeted.Because the databases containmore andmore ligands,thismethod needs to be quick and exact.Neural network fingerprints were created more effectively than the well-known Extended Connectivity Fingerprint(ECFP).Only the largest sub-graph is taken into consideration to learn the representation,despite the fact that the conventional graph network generates a better-encoded fingerprint.When using the average or maximum pooling layer,it also contains unrelated data.This article suggested the Graph Convolutional Attention Network(GCAN),a graph neural network with an attention mechanism,to address these problems.Additionally,it makes the nodes or sub-graphs that are used to create the molecular fingerprint more significant.The generated fingerprint is used to classify drugs using ensemble learning.As base classifiers,ensemble stacking is applied to Support Vector Machines(SVM),Random Forest,Nave Bayes,Decision Trees,AdaBoost,and Gradient Boosting.When compared to existing models,the proposed GCAN fingerprint with an ensemble model achieves relatively high accuracy,sensitivity,specificity,and area under the curve.Additionally,it is revealed that our ensemble learning with generated molecular fingerprint yields 91%accuracy,outperforming earlier approaches.展开更多
CC’s(Cloud Computing)networks are distributed and dynamic as signals appear/disappear or lose significance.MLTs(Machine learning Techniques)train datasets which sometime are inadequate in terms of sample for inferrin...CC’s(Cloud Computing)networks are distributed and dynamic as signals appear/disappear or lose significance.MLTs(Machine learning Techniques)train datasets which sometime are inadequate in terms of sample for inferring information.A dynamic strategy,DevMLOps(Development Machine Learning Operations)used in automatic selections and tunings of MLTs result in significant performance differences.But,the scheme has many disadvantages including continuity in training,more samples and training time in feature selections and increased classification execution times.RFEs(Recursive Feature Eliminations)are computationally very expensive in its operations as it traverses through each feature without considering correlations between them.This problem can be overcome by the use of Wrappers as they select better features by accounting for test and train datasets.The aim of this paper is to use DevQLMLOps for automated tuning and selections based on orchestrations and messaging between containers.The proposed AKFA(Adaptive Kernel Firefly Algorithm)is for selecting features for CNM(Cloud Network Monitoring)operations.AKFA methodology is demonstrated using CNSD(Cloud Network Security Dataset)with satisfactory results in the performance metrics like precision,recall,F-measure and accuracy used.展开更多
Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discoveri...Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.展开更多
文摘Online review platforms are becoming increasingly popular,encouraging dishonest merchants and service providers to deceive customers by creating fake reviews for their goods or services.Using Sybil accounts,bot farms,and real account purchases,immoral actors demonize rivals and advertise their goods.Most academic and industry efforts have been aimed at detecting fake/fraudulent product or service evaluations for years.The primary hurdle to identifying fraudulent reviews is the lack of a reliable means to distinguish fraudulent reviews from real ones.This paper adopts a semi-supervised machine learning method to detect fake reviews on any website,among other things.Online reviews are classified using a semi-supervised approach(PU-learning)since there is a shortage of labeled data,and they are dynamic.Then,classification is performed using the machine learning techniques Support Vector Machine(SVM)and Nave Bayes.The performance of the suggested system has been compared with standard works,and experimental findings are assessed using several assessment metrics.
基金supported by the National Natural Science Foundation of China(Grant No.52374153).
文摘Traditional laboratory tests for measuring rock uniaxial compressive strength(UCS)are tedious and timeconsuming.There is a pressing need for more effective methods to determine rock UCS,especially in deep mining environments under high in-situ stress.Thus,this study aims to develop an advanced model for predicting the UCS of rockmaterial in deepmining environments by combining three boosting-basedmachine learning methods with four optimization algorithms.For this purpose,the Lead-Zinc mine in Southwest China is considered as the case study.Rock density,P-wave velocity,and point load strength index are used as input variables,and UCS is regarded as the output.Subsequently,twelve hybrid predictive models are obtained.Root mean square error(RMSE),mean absolute error(MAE),coefficient of determination(R2),and the proportion of the mean absolute percentage error less than 20%(A-20)are selected as the evaluation metrics.Experimental results showed that the hybridmodel consisting of the extreme gradient boostingmethod and the artificial bee colony algorithm(XGBoost-ABC)achieved satisfactory results on the training dataset and exhibited the best generalization performance on the testing dataset.The values of R2,A-20,RMSE,and MAE on the training dataset are 0.98,1.0,3.11 MPa,and 2.23MPa,respectively.The highest values of R2 and A-20(0.93 and 0.96),and the smallest RMSE and MAE values of 4.78 MPa and 3.76MPa,are observed on the testing dataset.The proposed hybrid model can be considered a reliable and effective method for predicting rock UCS in deep mines.
基金supported by Shandong Provincial Natural Science Foundation (grant number: ZR2023MD036)Key Research and Development Project in Shandong Province (grant number: 2019GGX101064)project for excellent youth foundation of the innovation teacher team, Shandong (grant number: 2022KJ310)。
文摘The reasonable quantification of the concrete freezing environment on the Qinghai–Tibet Plateau(QTP) is the primary issue in frost resistant concrete design, which is one of the challenges that the QTP engineering managers should take into account. In this paper, we propose a more realistic method to calculate the number of concrete freeze–thaw cycles(NFTCs) on the QTP. The calculated results show that the NFTCs increase as the altitude of the meteorological station increases with the average NFTCs being 208.7. Four machine learning methods, i.e., the random forest(RF) model, generalized boosting method(GBM), generalized linear model(GLM), and generalized additive model(GAM), are used to fit the NFTCs. The root mean square error(RMSE) values of the RF, GBM, GLM, and GAM are 32.3, 4.3, 247.9, and 161.3, respectively. The R^(2) values of the RF, GBM, GLM, and GAM are 0.93, 0.99, 0.48, and 0.66, respectively. The GBM method performs the best compared to the other three methods, which was shown by the results of RMSE and R^(2) values. The quantitative results from the GBM method indicate that the lowest, medium, and highest NFTC values are distributed in the northern, central, and southern parts of the QTP, respectively. The annual NFTCs in the QTP region are mainly concentrated at 160 and above, and the average NFTCs is 200 across the QTP. Our results can provide scientific guidance and a theoretical basis for the freezing resistance design of concrete in various projects on the QTP.
文摘Metallic alloys for a given application are usually designed to achieve the desired properties by devising experimentsbased on experience, thermodynamic and kinetic principles, and various modeling and simulation exercises.However, the influence of process parameters and material properties is often non-linear and non-colligative. Inrecent years, machine learning (ML) has emerged as a promising tool to dealwith the complex interrelation betweencomposition, properties, and process parameters to facilitate accelerated discovery and development of new alloysand functionalities. In this study, we adopt an ML-based approach, coupled with genetic algorithm (GA) principles,to design novel copper alloys for achieving seemingly contradictory targets of high strength and high electricalconductivity. Initially, we establish a correlation between the alloy composition (binary to multi-component) andthe target properties, namely, electrical conductivity and mechanical strength. Catboost, an ML model coupledwith GA, was used for this task. The accuracy of the model was above 93.5%. Next, for obtaining the optimizedcompositions the outputs fromthe initial model were refined by combining the concepts of data augmentation andPareto front. Finally, the ultimate objective of predicting the target composition that would deliver the desired rangeof properties was achieved by developing an advancedMLmodel through data segregation and data augmentation.To examine the reliability of this model, results were rigorously compared and verified using several independentdata reported in the literature. This comparison substantiates that the results predicted by our model regarding thevariation of conductivity and evolution ofmicrostructure and mechanical properties with composition are in goodagreement with the reports published in the literature.
文摘Latent tuberculosis infection(LTBI)has become a major source of active tuberculosis(ATB).Although the tuberculin skin test and interferon-gamma release assay can be used to diagnose LTBI,these methods can only differentiate infected individuals from healthy ones but cannot discriminate between LTBI and ATB.Thus,the diagnosis of LTBI faces many challenges,such as the lack of effective biomarkers from Mycobacterium tuberculosis(MTB)for distinguishing LTBI,the low diagnostic efficacy of biomarkers derived from the human host,and the absence of a gold standard to differentiate between LTBI and ATB.Sputum culture,as the gold standard for diagnosing tuberculosis,is time-consuming and cannot distinguish between ATB and LTBI.In this article,we review the pathogenesis of MTB and the immune mechanisms of the host in LTBI,including the innate and adaptive immune responses,multiple immune evasion mechanisms of MTB,and epigenetic regulation.Based on this knowledge,we summarize the current status and challenges in diagnosing LTBI and present the application of machine learning(ML)in LTBI diagnosis,as well as the advantages and limitations of ML in this context.Finally,we discuss the future development directions of ML applied to LTBI diagnosis.
基金supported via funding from Prince Satam bin Abdulaziz University project number (PSAU/2024/R/1445)The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through large Group Research Project (Grant No.RGP.2/357/44).
文摘In this study,twelve machine learning(ML)techniques are used to accurately estimate the safety factor of rock slopes(SFRS).The dataset used for developing these models consists of 344 rock slopes from various open-pit mines around Iran,evenly distributed between the training(80%)and testing(20%)datasets.The models are evaluated for accuracy using Janbu's limit equilibrium method(LEM)and commercial tool GeoStudio methods.Statistical assessment metrics show that the random forest model is the most accurate in estimating the SFRS(MSE=0.0182,R2=0.8319)and shows high agreement with the results from the LEM method.The results from the long-short-term memory(LSTM)model are the least accurate(MSE=0.037,R2=0.6618)of all the models tested.However,only the null space support vector regression(NuSVR)model performs accurately compared to the practice mode by altering the value of one parameter while maintaining the other parameters constant.It is suggested that this model would be the best one to use to calculate the SFRS.A graphical user interface for the proposed models is developed to further assist in the calculation of the SFRS for engineering difficulties.In this study,we attempt to bridge the gap between modern slope stability evaluation techniques and more conventional analysis methods.
文摘Pore pressure(PP)information plays an important role in analysing the geomechanical properties of the reservoir and hydrocarbon field development.PP prediction is an essential requirement to ensure safe drilling operations and it is a fundamental input for well design,and mud weight estimation for wellbore stability.However,the pore pressure trend prediction in complex geological provinces is challenging particularly at oceanic slope setting,where sedimentation rate is relatively high and PP can be driven by various complex geo-processes.To overcome these difficulties,an advanced machine learning(ML)tool is implemented in combination with empirical methods.The empirical method for PP prediction is comprised of data pre-processing and model establishment stage.Eaton's method and Porosity method have been used for PP calculation of the well U1517A located at Tuaheni Landslide Complex of Hikurangi Subduction zone of IODP expedition 372.Gamma-ray,sonic travel time,bulk density and sonic derived porosity are extracted from well log data for the theoretical framework construction.The normal compaction trend(NCT)curve analysis is used to check the optimum fitting of the low permeable zone data.The statistical analysis is done using the histogram analysis and Pearson correlation coefficient matrix with PP data series to identify potential input combinations for ML-based predictive model development.The dataset is prepared and divided into two parts:Training and Testing.The PP data and well log of borehole U1517A is pre-processed to scale in between[-1,+1]to fit into the input range of the non-linear activation/transfer function of the decision tree regression model.The Decision Tree Regression(DTR)algorithm is built and compared to the model performance to predict the PP and identify the overpressure zone in Hikurangi Tuaheni Zone of IODP Expedition 372.
基金This work was performed under the auspices of the National Nuclear Security Administration of the US Department of Energy at Los Alamos National Laboratory under Contract No.DE-AC52-06NA25396The authors gratefully acknowledge the support of the US Department of Energy National Nuclear Security Administration Advanced Simulation and Computing Program.The Los Alamos unlimited release number is LA-UR-19-32257.
文摘We present our results by using a machine learning(ML)approach for the solution of the Riemann problem for the Euler equations of fluid dynamics.The Riemann problem is an initial-value problem with piecewise-constant initial data and it represents a mathematical model of the shock tube.The solution of the Riemann problem is the building block for many numerical algorithms in computational fluid dynamics,such as finite-volume or discontinuous Galerkin methods.Therefore,a fast and accurate approximation of the solution of the Riemann problem and construction of the associated numerical fluxes is of crucial importance.The exact solution of the shock tube problem is fully described by the intermediate pressure and mathematically reduces to finding a solution of a nonlinear equation.Prior to delving into the complexities of ML for the Riemann problem,we consider a much simpler formulation,yet very informative,problem of learning roots of quadratic equations based on their coefficients.We compare two approaches:(i)Gaussian process(GP)regressions,and(ii)neural network(NN)approximations.Among these approaches,NNs prove to be more robust and efficient,although GP can be appreciably more accurate(about 30\%).We then use our experience with the quadratic equation to apply the GP and NN approaches to learn the exact solution of the Riemann problem from the initial data or coefficients of the gas equation of state(EOS).We compare GP and NN approximations in both regression and classification analysis and discuss the potential benefits and drawbacks of the ML approach.
文摘Sleep and well-being have been intricately linked,and sleep hygiene is paramount for developing mental well-being and resilience.Although widespread,sleep disorders require elaborate polysomnography laboratory and patient-stay with sleep in unfamiliar environments.Current technologies have allowed various devices to diagnose sleep disorders at home.However,these devices are in various validation stages,with many already receiving approvals from competent authorities.This has captured vast patient-related physiologic data for advanced analytics using artificial intelligence through machine and deep learning applications.This is expected to be integrated with patients’Electronic Health Records and provide individualized prescriptive therapy for sleep disorders in the future.
文摘BACKGROUND Synchronous liver metastasis(SLM)is a significant contributor to morbidity in colorectal cancer(CRC).There are no effective predictive device integration algorithms to predict adverse SLM events during the diagnosis of CRC.AIM To explore the risk factors for SLM in CRC and construct a visual prediction model based on gray-level co-occurrence matrix(GLCM)features collected from magnetic resonance imaging(MRI).METHODS Our study retrospectively enrolled 392 patients with CRC from Yichang Central People’s Hospital from January 2015 to May 2023.Patients were randomly divided into a training and validation group(3:7).The clinical parameters and GLCM features extracted from MRI were included as candidate variables.The prediction model was constructed using a generalized linear regression model,random forest model(RFM),and artificial neural network model.Receiver operating characteristic curves and decision curves were used to evaluate the prediction model.RESULTS Among the 392 patients,48 had SLM(12.24%).We obtained fourteen GLCM imaging data for variable screening of SLM prediction models.Inverse difference,mean sum,sum entropy,sum variance,sum of squares,energy,and difference variance were listed as candidate variables,and the prediction efficiency(area under the curve)of the subsequent RFM in the training set and internal validation set was 0.917[95%confidence interval(95%CI):0.866-0.968]and 0.09(95%CI:0.858-0.960),respectively.CONCLUSION A predictive model combining GLCM image features with machine learning can predict SLM in CRC.This model can assist clinicians in making timely and personalized clinical decisions.
基金supported by the National Key R&D Program of China (2020YFB2008400)LAGEO of Chinese Academy of Sciences (LAGEO-2019-2)+11 种基金Program for Science&Technology Innovation Talents in the University of Henan Province (20HASTIT022)21th Project of the Xizang Cultural Inheritance and Development Collaborative Innovation Center in 2018 (21IRTSTHN015)Natural Science Foundation of Xizang Named“Research of Key Technology of Millimeter Wave MIMO Secure Transmission with Relay Enhancement”in 2018Xizang Autonomous Region Education Science“13th Five-year Plan”Major Project for 2018 (XZJKY201803)Natural Science Foundation of Henan under Grant 202300410126Young Backbone Teachers in Henan Province (2018GGJS049)Henan Province Young Talent Lift Project (2020HYTP009)Program for Innovative Research Team in University of Henan Province (21IRTSTHNO15)Equipment Pre-research Joint Research Program of Ministry of Education (8091B032129)Training Program for Young Scholar of Henan Province for Colleges and Universities under Grand (2020GGJS172)Program for Science&Technology Innovation Talents in Universities of Henan Province under Grand (22HASTIT020)Henan Province Science Fund for Distinguished Young Scholars (222300420006).
文摘The huge increase in the communication network rate has made the application fields and scenarios for vehicular ad hoc networks more abundant and diversified and proposed more requirements for the efficiency and quality of data transmission.To improve the limited communication distance and poor communication quality of the Internet of Vehicles(IoV),an optimal intelligent routing algorithm is proposed in this paper.Combined multiweight decision algorithm with the greedy perimeter stateless routing protocol,designed and evaluated standardized function for link stability.Linear additive weighting is used to optimize link stability and distance to improve the packet delivery rate of the IoV.The blockchain system is used as the storage structure for relay data,and the smart contract incentive algorithm based on machine learning is used to encourage relay vehicles to provide more communication bandwidth for data packet transmission.The proposed scheme is simulated and analyzed under different scenarios and different parameters.The experimental results demonstrate that the proposed scheme can effectively reduce the packet loss rate and improve system performance.
基金the National Natural Science Foundation of China(Grant Number:81970631 to W.L.).
文摘Background:Diabetic nephropathy(DN)is the most common complication of type 2 diabetes mellitus and the main cause of end-stage renal disease worldwide.Diagnostic biomarkers may allow early diagnosis and treatment of DN to reduce the prevalence and delay the development of DN.Kidney biopsy is the gold standard for diagnosing DN;however,its invasive character is its primary limitation.The machine learning approach provides a non-invasive and specific criterion for diagnosing DN,although traditional machine learning algorithms need to be improved to enhance diagnostic performance.Methods:We applied high-throughput RNA sequencing to obtain the genes related to DN tubular tissues and normal tubular tissues of mice.Then machine learning algorithms,random forest,LASSO logistic regression,and principal component analysis were used to identify key genes(CES1G,CYP4A14,NDUFA4,ABCC4,ACE).Then,the genetic algorithm-optimized backpropagation neural network(GA-BPNN)was used to improve the DN diagnostic model.Results:The AUC value of the GA-BPNN model in the training dataset was 0.83,and the AUC value of the model in the validation dataset was 0.81,while the AUC values of the SVM model in the training dataset and external validation dataset were 0.756 and 0.650,respectively.Thus,this GA-BPNN gave better values than the traditional SVM model.This diagnosis model may aim for personalized diagnosis and treatment of patients with DN.Immunohistochemical staining further confirmed that the tissue and cell expression of NADH dehydrogenase(ubiquinone)1 alpha subcomplex,4-like 2(NDUFA4L2)in tubular tissue in DN mice were decreased.Conclusion:The GA-BPNN model has better accuracy than the traditional SVM model and may provide an effective tool for diagnosing DN.
基金funded by the National Key Research and Development Program of China(2017YFA0605002,2017YFA0605004,and 2016YFA0601501)the National Natural Science Foundation of China(41961124007,51779145,and 41830863)“Six top talents”in Jiangsu Province(RJFW-031)。
文摘Model parameters estimation is a pivotal issue for runoff modeling in ungauged catchments.The nonlinear relationship between model parameters and catchment descriptors is a major obstacle for parameter regionalization,which is the most widely used approach.Runoff modeling was studied in 38 catchments located in the Yellow–Huai–Hai River Basin(YHHRB).The values of the Nash–Sutcliffe efficiency coefficient(NSE),coefficient of determination(R2),and percent bias(PBIAS)indicated the acceptable performance of the soil and water assessment tool(SWAT)model in the YHHRB.Nine descriptors belonging to the categories of climate,soil,vegetation,and topography were used to express the catchment characteristics related to the hydrological processes.The quantitative relationships between the parameters of the SWAT model and the catchment descriptors were analyzed by six regression-based models,including linear regression(LR)equations,support vector regression(SVR),random forest(RF),k-nearest neighbor(kNN),decision tree(DT),and radial basis function(RBF).Each of the 38 catchments was assumed to be an ungauged catchment in turn.Then,the parameters in each target catchment were estimated by the constructed regression models based on the remaining 37 donor catchments.Furthermore,the similaritybased regionalization scheme was used for comparison with the regression-based approach.The results indicated that the runoff with the highest accuracy was modeled by the SVR-based scheme in ungauged catchments.Compared with the traditional LR-based approach,the accuracy of the runoff modeling in ungauged catchments was improved by the machine learning algorithms because of the outstanding capability to deal with nonlinear relationships.The performances of different approaches were similar in humid regions,while the advantages of the machine learning techniques were more evident in arid regions.When the study area contained nested catchments,the best result was calculated with the similarity-based parameter regionalization scheme because of the high catchment density and short spatial distance.The new findings could improve flood forecasting and water resources planning in regions that lack observed data.
基金Taif University Researchers supporting Project number(TURSP-2020/215),Taif University,Taif,Saudi Arabia.
文摘The rapid advancement of wireless communication is forming a hyper-connected 5G network in which billions of linked devices generate massive amounts of data.The traffic control and data forwarding functions are decoupled in software-defined networking(SDN)and allow the network to be programmable.Each switch in SDN keeps track of forwarding information in a flow table.The SDN switches must search the flow table for the flow rules that match the packets to handle the incoming packets.Due to the obvious vast quantity of data in data centres,the capacity of the flow table restricts the data plane’s forwarding capabilities.So,the SDN must handle traffic from across the whole network.The flow table depends on Ternary Content Addressable Memorable Memory(TCAM)for storing and a quick search of regulations;it is restricted in capacity owing to its elevated cost and energy consumption.Whenever the flow table is abused and overflowing,the usual regulations cannot be executed quickly.In this case,we consider lowrate flow table overflowing that causes collision flow rules to be installed and consumes excessive existing flow table capacity by delivering packets that don’t fit the flow table at a low rate.This study introduces machine learning techniques for detecting and categorizing low-rate collision flows table in SDN,using Feed ForwardNeuralNetwork(FFNN),K-Means,and Decision Tree(DT).We generate two network topologies,Fat Tree and Simple Tree Topologies,with the Mininet simulator and coupled to the OpenDayLight(ODL)controller.The efficiency and efficacy of the suggested algorithms are assessed using several assessment indicators such as success rate query,propagation delay,overall dropped packets,energy consumption,bandwidth usage,latency rate,and throughput.The findings showed that the suggested technique to tackle the flow table congestion problem minimizes the number of flows while retaining the statistical consistency of the 5G network.By putting the proposed flow method and checking whether a packet may move from point A to point B without breaking certain regulations,the evaluation tool examines every flow against a set of criteria.The FFNN with DT and K-means algorithms obtain accuracies of 96.29%and 97.51%,respectively,in the identification of collision flows,according to the experimental outcome when associated with existing methods from the literature.
文摘The mango, a fruit of immense economic and dietary significance in numerous tropical and subtropical regions, plays a pivotal role in our agricultural landscape. Accurate identification is not just a necessity, but a crucial step for effective classification, sorting, and marketing. This study delves into the potential of machine learning for this task, comparing the performance of four models: MobileNetV2, Xception, VGG16, and ResNet50V2. These models were trained on a dataset of annotated mango images, and their performance was evaluated using precision, accuracy, F1 score, and recall, which are standard metrics for image classification. The Xception model, with its exceptional performance, outshone the other models on all performance indicators. It achieved a staggering accuracy of 99.47%, an F1 score of 99.43%, and a recall of 99.43%, showcasing its remarkable ability to accurately identify mango varieties. MobileNetV2 followed closely with performances of 98.95% accuracy, 98.85% F1 score, and 98.86% recall. ResNet50V2 also delivered satisfactory results with 97.39% accuracy, 97.08% F1 score, and 97.17% recall. VGG16, however, was the least effective, with a precision rate of 83.25%, an F1 score of 83.25%, and a recall of 85.47%. These results confirm the superiority of the Xception model in detecting mango varieties. Its advanced architecture allows it to capture more distinguishing features of mango images, leading to greater precision and reliability. Xception’s robustness in identifying true positives is another advantage, minimizing false positives and contributing to more accurate classification. This study highlights the promising potential of machine learning, particularly the Xception model, for accurately identifying mango varieties.
文摘Stocks in the Chinese stock market can be divided into ST stocks and normal stocks, so to prevent investors from buying potential ST stocks, this paper first performs SMOTEENN oversampling data preprocessing for the ST stock category, and selects 139 financial indicators and technical factor as predictive features. Then, it combines the Boruta algorithm and Copula entropy method for feature selection, effectively improving the machine learning model’s performance in ST stock classification, with the AUC values of the two models reaching 98% on the test set. In the model selection and optimization, this paper uses six major models, including logistic regression, XGBoost, AdaBoost, LightGBM, Catboost, and MLP, for modeling and optimizes them using the Optuna framework. Ultimately, XGBoost model is selected as the best model because its AUC value exceeds 95% and its running time is less. Finally, the XGBoost model is explained using the SHAP theory and the interaction between features is discovered, further improving the model’s accuracy and AUC value by about 0.6%, verifying the effectiveness of the model.
基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.2021R1A5A1021944 and 2021R1A5A1021944)supported by Kyungpook National University Research Fund,2020.
文摘COVID-19 is a contagious disease and its several variants put under stress in all walks of life and economy as well.Early diagnosis of the virus is a crucial task to prevent the spread of the virus as it is a threat to life in the whole world.However,with the advancement of technology,the Internet of Things(IoT)and social IoT(SIoT),the versatile data produced by smart devices helped a lot in overcoming this lethal disease.Data mining is a technique that could be used for extracting useful information from massive data.In this study,we used five supervised ML strategies for creating a model to analyze and forecast the existence of COVID-19 using the Kaggle dataset“COVID-19 Symptoms and Presence.”RapidMiner Studio ML software was used to apply the Decision Tree(DT),Random Forest(RF),K-Nearest Neighbors(K-NNs)and Naive Bayes(NB),Integrated Decision Tree(ID3)algorithms.To develop the model,the performance of each model was tested using 10-fold cross-validation and compared to major accuracy measures,Cohan’s kappa statistics,properly or mistakenly categorized cases and root means square error.The results demonstrate that DT outperforms other methods,with an accuracy of 98.42%and a root mean square error of 0.11.In the future,a devisedmodel will be highly recommendable and supportive for early prediction/diagnosis of disease by providing different data sets.
文摘The drug development process takes a long time since it requires sorting through a large number of inactive compounds from a large collection of compounds chosen for study and choosing just the most pertinent compounds that can bind to a disease protein.The use of virtual screening in pharmaceutical research is growing in popularity.During the early phases of medication research and development,it is crucial.Chemical compound searches are nowmore narrowly targeted.Because the databases containmore andmore ligands,thismethod needs to be quick and exact.Neural network fingerprints were created more effectively than the well-known Extended Connectivity Fingerprint(ECFP).Only the largest sub-graph is taken into consideration to learn the representation,despite the fact that the conventional graph network generates a better-encoded fingerprint.When using the average or maximum pooling layer,it also contains unrelated data.This article suggested the Graph Convolutional Attention Network(GCAN),a graph neural network with an attention mechanism,to address these problems.Additionally,it makes the nodes or sub-graphs that are used to create the molecular fingerprint more significant.The generated fingerprint is used to classify drugs using ensemble learning.As base classifiers,ensemble stacking is applied to Support Vector Machines(SVM),Random Forest,Nave Bayes,Decision Trees,AdaBoost,and Gradient Boosting.When compared to existing models,the proposed GCAN fingerprint with an ensemble model achieves relatively high accuracy,sensitivity,specificity,and area under the curve.Additionally,it is revealed that our ensemble learning with generated molecular fingerprint yields 91%accuracy,outperforming earlier approaches.
文摘CC’s(Cloud Computing)networks are distributed and dynamic as signals appear/disappear or lose significance.MLTs(Machine learning Techniques)train datasets which sometime are inadequate in terms of sample for inferring information.A dynamic strategy,DevMLOps(Development Machine Learning Operations)used in automatic selections and tunings of MLTs result in significant performance differences.But,the scheme has many disadvantages including continuity in training,more samples and training time in feature selections and increased classification execution times.RFEs(Recursive Feature Eliminations)are computationally very expensive in its operations as it traverses through each feature without considering correlations between them.This problem can be overcome by the use of Wrappers as they select better features by accounting for test and train datasets.The aim of this paper is to use DevQLMLOps for automated tuning and selections based on orchestrations and messaging between containers.The proposed AKFA(Adaptive Kernel Firefly Algorithm)is for selecting features for CNM(Cloud Network Monitoring)operations.AKFA methodology is demonstrated using CNSD(Cloud Network Security Dataset)with satisfactory results in the performance metrics like precision,recall,F-measure and accuracy used.
基金Deputyship for Research&Innovation,Ministry of Education in Saudi Arabia for funding this research work through the Project Number RI-44-0444.
文摘Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.