Analyzing big data, especially medical data, helps to provide good health care to patients and to manage the risk of death. The COVID-19 pandemic has had a significant impact on public health worldwide, emphasizing the need for effective risk prediction models. Machine learning (ML) techniques have shown promise in analyzing complex data patterns and predicting disease outcomes. The accuracy of these techniques depends strongly on their parameter settings, so hyperparameter optimization plays a crucial role in improving model performance. In this work, the Particle Swarm Optimization (PSO) algorithm was used to search the hyperparameter space and improve the predictive power of the machine learning models by identifying the hyperparameters that provide the highest accuracy. A dataset with a variety of clinical and epidemiological characteristics linked to COVID-19 cases was used in this study. Various machine learning models, including Random Forests, Decision Trees, Support Vector Machines, and Neural Networks, were utilized to capture the complex relationships present in the data. The accuracy metric was employed to evaluate the predictive performance of the models. The experimental findings showed that the suggested method of estimating COVID-19 risk is effective: the optimized machine learning models achieved higher accuracy than the baseline models.
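The PSO search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the inertia and acceleration weights, the swarm size, and the function `toy_validation_error` (a stand-in for "1 minus validation accuracy" of a real model) are all assumptions chosen for illustration.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100, seed=0):
    """Minimal particle swarm optimizer over a box-bounded search space."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social weights (assumed values)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_validation_error(params):
    # Hypothetical objective: in practice this would train a model with the
    # given hyperparameters and return 1 - validation accuracy.
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)]
)
print(best_params, best_error)
```

In a real study the objective would be replaced by a cross-validated score of the model being tuned, which makes each evaluation far more expensive than in this toy case.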
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. The COA algorithm is inspired by the foraging behavior of chimpanzees and mimics their search strategy for food. Subsequently, the classification process utilizes a stochastic gradient descent with multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models on the reduced feature set. We evaluate the performance of our approach on various medical datasets with diverse machine learning classifiers. Experimental results demonstrate that the proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in healthcare data mining and machine learning.
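To make the Apriori step mentioned in this abstract concrete, the following is a small sketch of level-wise frequent-itemset search with support and confidence. The transactions, the item names, and the `min_support` threshold are invented for illustration and do not come from the paper.

```python
from itertools import combinations

# Hypothetical toy transactions (for example, symptoms recorded per patient).
transactions = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "cough", "headache"},
    {"fatigue", "headache"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Apriori-style search: grow itemsets one item at a time and keep only
    candidates whose (k-1)-item subsets were all frequent."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    current = [frozenset([i]) for i in items]
    k = 1
    while current:
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

freq = frequent_itemsets(transactions, min_support=0.3)
rule_conf = confidence({"fever"}, {"cough"}, transactions)
print(sorted(tuple(sorted(s)) for s in freq), rule_conf)
```

The pruning step is what distinguishes Apriori from brute-force enumeration: a candidate is evaluated only if all of its smaller subsets already passed the support threshold.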
Statistics are more crucial than ever due to the accessibility of huge amounts of data from several domains such as finance, medicine, science, and engineering. Statistical data mining (SDM) is an interdisciplinary domain that examines huge existing databases to discover patterns and connections in the data. It differs from classical statistics in the size of the datasets and in the fact that the data may not have been gathered according to an experimental design but rather for other purposes. This paper introduces an effective Statistical Data Mining for Intelligent Rainfall Prediction using Slime Mould Optimization with Deep Learning (SDMIRP-SMODL) model. In the presented SDMIRP-SMODL model, the feature subset selection process is performed by the SMO algorithm, which in turn reduces the computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) technique is exploited. Finally, this study uses the pelican optimization algorithm (POA) as a hyperparameter optimizer. The SDMIRP-SMODL approach is evaluated on a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative outcomes showed the superiority of the SDMIRP-SMODL model over existing techniques.
The biomedical data classification process has received significant attention in recent times due to a massive increase in the generation of healthcare data from various sources. Developments in artificial intelligence (AI) and machine learning (ML) models assist in the effective design of medical data classification models. Therefore, this article concentrates on the development of an optimal Stacked Long Short Term Memory Sequence-to-Sequence Autoencoder (OSAE-LSTM) model for biomedical data classification. The presented OSAE-LSTM model classifies biomedical data for the existence of diseases. First, the OSAE-LSTM model applies min-max normalization based pre-processing to scale the data into a uniform format. Next, the SAE-LSTM model is utilized for the detection and classification of diseases in biomedical data. Finally, the manta ray foraging optimization (MRFO) algorithm is employed for hyperparameter optimization; it assists in the optimal selection of the hyperparameters involved in the SAE-LSTM model. The OSAE-LSTM model was tested in simulation on a set of benchmark medical datasets, and the results showed improvements of the OSAE-LSTM model over the other approaches under several measures.
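The min-max normalization step named in this abstract rescales each feature to the range [0, 1] via x' = (x - min) / (max - min). A minimal sketch follows; the function name and the sample values are hypothetical, and mapping a constant column to zeros is an assumed convention.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumed convention: a constant feature carries no information.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

heart_rate = [60, 75, 90, 120]  # hypothetical feature values
scaled = min_max_scale(heart_rate)
print(scaled)
```

In practice the minimum and maximum should be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out samples.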
In this paper, we conduct research on structured data mining algorithms and their applications in the machine learning field. With the advancement of informatization and digitization in various fields, large amounts of multi-source, heterogeneous data are stored in a distributed manner. To share such data, a series of mechanisms, methods, and implementation technologies must be worked out, from storage management to interoperability. Unstructured data has no strict structure and is therefore more difficult to standardize and manage than structured information. Given these characteristics, large volumes of unstructured data are typically stored separately in files, while pointer-like indexes are stored in the database. Against this background, we propose a new idea for a structured data mining algorithm.
Arabic is one of the most spoken languages across the globe. However, there are fewer studies concerning Sentiment Analysis (SA) in Arabic. In recent years, the detection of sentiments and emotions expressed in tweets has received significant interest. The substantial role played by the Arab region in international politics and the global economy has created a need to examine the sentiments and emotions expressed in the Arabic language. Two common approaches are available to address emotion classification problems: machine learning and lexicon-based approaches. With this motivation, the current research article develops a Teaching and Learning Optimization with Machine Learning Based Emotion Recognition and Classification (TLBOML-ERC) model for Sentiment Analysis on tweets made in the Arabic language. The presented TLBOML-ERC model focuses on recognising emotions and sentiments expressed in Arabic tweets. To attain this, the proposed TLBOML-ERC model initially carries out data pre-processing and a Continuous Bag Of Words (CBOW)-based word embedding process. In addition, a Denoising Autoencoder (DAE) model is exploited to categorise the different emotions expressed in Arabic tweets. To improve the efficacy of the DAE model, the Teaching and Learning-based Optimization (TLBO) algorithm is utilized to optimize its parameters. The proposed TLBOML-ERC method was experimentally validated on an Arabic tweets dataset. The obtained results show the promising performance of the proposed TLBOML-ERC model on Arabic emotion classification.
Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly contributed to the industry, selecting an appropriate model is still challenging for most researchers. Relying solely on statistical metrics to select the best model for a particular problem may not always be the most effective approach. This study encourages researchers to incorporate data visualization in their analysis and model selection process. To evaluate the suitability of different models in predicting horizontal permeability in the Volve field, wireline logs were used to train Extra-Trees, Ridge, Bagging, and XGBoost models. The Random Forest feature selection technique was applied to select the relevant logs as inputs for the models. Based on statistical metrics, the Extra-Trees model achieved the highest test accuracy of 0.996, RMSE of 19.54 mD, and MAE of 3.18 mD, with XGBoost coming in second. However, when the results were visualised, it was discovered that the XGBoost model was more suitable for the problem being tackled. The XGBoost model was a better predictor within the sandstone interval, while the Extra-Trees model was more appropriate in non-sandstone intervals. Since this study aims to predict permeability in the reservoir interval, the XGBoost model is the most suitable. These contrasting results demonstrate the importance of incorporating data visualisation techniques as an evaluation metric. Given the heterogeneity of the subsurface, relying solely on statistical metrics may not be sufficient to determine which model is best suited for a particular problem.
The outbreak of the pandemic caused by Coronavirus Disease 2019 (COVID-19) has affected the daily activities of people across the globe. During the COVID-19 outbreak and the successive lockdowns, Twitter was heavily used and the number of tweets regarding COVID-19 increased tremendously. Several studies used Sentiment Analysis (SA) to analyze the emotions expressed in tweets about COVID-19. Therefore, in the current study, a new Artificial Bee Colony (ABC) with Machine Learning-driven SA (ABCML-SA) model is developed for conducting Sentiment Analysis of COVID-19 Twitter data. The prime focus of the presented ABCML-SA model is to recognize the sentiments expressed in tweets about COVID-19. It involves data pre-processing at the initial stage, followed by n-gram based feature extraction to derive the feature vectors. For identification and classification of the sentiments, the Support Vector Machine (SVM) model is exploited. Finally, the ABC algorithm is applied to fine-tune the parameters involved in the SVM. To demonstrate the improved performance of the proposed ABCML-SA model, a sequence of simulations was conducted. The comparative assessment results confirmed the effective performance of the proposed ABCML-SA model over other approaches.
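As an illustration of the n-gram based feature extraction mentioned in this abstract, the sketch below counts word unigrams and bigrams in a short text. The whitespace tokenizer, the choice of word-level n-grams up to n = 2, and the sample sentence are assumptions; the abstract does not specify these details.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return all contiguous word n-grams of a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, max_n=2):
    """Count every n-gram for n = 1..max_n; the counts form a sparse feature vector."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts

features = ngram_features("stay home stay safe")  # hypothetical tweet text
print(features)
```

A classifier such as an SVM would then be trained on these counts after mapping each distinct n-gram to a fixed column index across the whole corpus.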
The tendency toward achieving more sustainable and green buildings has turned several passive buildings into more dynamic ones. Mosques are a type of building with a unique energy usage pattern. Nevertheless, these buildings have received minimal consideration in ongoing energy efficiency applications. This is due to the unpredictability of the electrical consumption of mosques, which affects the stability of the distribution networks. Therefore, this study addresses this issue by developing a framework for short-term electricity load forecasting for a mosque located in Riyadh, Saudi Arabia. Using the load consumption of the mosque and meteorological datasets, the performance of four forecasting algorithms is investigated, namely an Artificial Neural Network and Support Vector Regression (SVR) based on three kernel functions: Radial Basis (RB), Polynomial, and Linear. In addition, this research work examines the impact of 13 different combinations of input attributes, since selecting the optimal features has a major influence on yielding precise forecasting outcomes. For the mosque load, SVR-RB with eleven features appeared to be the best forecasting model, with the lowest forecasting error metrics: RMSE, nRMSE, MAE, and nMAE values of 4.207 kW, 2.522%, 2.938 kW, and 1.761%, respectively.
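The four error metrics reported in this abstract can be computed as in the sketch below. The abstract does not state how nRMSE and nMAE are normalized; dividing by the range of the observed load is an assumption made here, and the load values are invented.

```python
import math

def forecast_errors(actual, predicted):
    """Return RMSE and MAE in the units of the data, plus normalized versions in %."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumed normalizer: range of the observed values (max - min).
    span = max(actual) - min(actual)
    return {
        "RMSE": rmse,
        "nRMSE_%": 100.0 * rmse / span,
        "MAE": mae,
        "nMAE_%": 100.0 * mae / span,
    }

actual = [100.0, 120.0, 140.0, 160.0]     # hypothetical observed load in kW
predicted = [104.0, 116.0, 144.0, 156.0]  # hypothetical forecasts in kW
metrics = forecast_errors(actual, predicted)
print(metrics)
```

Other normalizers, such as the mean or the maximum of the observed load, are also in common use, so normalized values are only comparable between studies that share the same definition.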
Autism spectrum disorder (ASD) is regarded as a neurological disorder defined by a specific set of problems associated with social skills, repetitive behavior, and communication. Identifying ASD as early as possible is beneficial because early identification permits prompt interventions in children with ASD. Recognition of ASD through objective pathogenic mutation screening is the first step toward early intervention and efficient treatment of affected children. Nowadays, the healthcare and machine learning (ML) industries are combined to determine the existence of various diseases. This article devises a Jellyfish Search Optimization with Deep Learning Driven ASD Detection and Classification (JSODL-ASDDC) model. The goal of the JSODL-ASDDC algorithm is to identify the different stages of ASD with the help of biomedical data. The proposed JSODL-ASDDC model initially applies a min-max data normalization approach to scale the data into a uniform range. In addition, the JSODL-ASDDC model involves a JSO based feature selection (JFSO-FS) process to choose optimal feature subsets. Moreover, a Gated Recurrent Unit (GRU) based classification model is utilized for the recognition and classification of ASD. Furthermore, a Bacterial Foraging Optimization (BFO) assisted parameter tuning process is executed to enhance the efficacy of the GRU system. The JSODL-ASDDC model is experimentally assessed on distinct datasets. The experimental outcomes highlighted the enhanced performance of the JSODL-ASDDC algorithm over recent approaches.
In 2021, the abnormal short-term price fluctuations of GameStop, which were triggered by internet stock discussions, drew the attention of academics, financial analysts, and stock trading commissions alike, prompting calls to address such events and maintain market stability. However, the impact of stock discussions on volatile trading behavior has received comparatively less attention than traditional fundamentals. Furthermore, data mining methods are less often used to predict stock trading despite their higher accuracy. This study adopts an innovative approach using social media data to obtain stock rumors, and then trains three decision trees to demonstrate the impact of rumor propagation on stock trading behavior. Our findings show that rumor propagation outperforms traditional fundamentals in predicting abnormal trading behavior. The study serves as an impetus for further research using data mining as a method of inquiry.
Urgent care clinics and emergency departments around the world periodically suffer from extended wait times beyond patient expectations due to surges in patient flows. The delays arising from inadequate staffing levels during these periods have been linked with adverse clinical outcomes. Previous research into forecasting patient flows has mostly used statistical techniques. These studies have also predominantly focussed on short-term forecasts, which have limited practicality for the resourcing of medical personnel. This study joins an emerging body of work which seeks to explore the potential of machine learning algorithms to generate accurate forecasts of patient presentations. Our research uses datasets covering 10 years from two large urgent care clinics to develop long-term patient flow forecasts up to one quarter ahead using a range of state-of-the-art algorithms. A distinctive feature of this study is the use of eXplainable Artificial Intelligence (XAI) tools like Shapley and LIME that enable an in-depth analysis of the behaviour of the models, which would otherwise be uninterpretable. These analysis tools enabled us to explore the ability of the models to adapt to the volatility in patient demand during the COVID-19 pandemic lockdowns and to identify the most impactful variables, resulting in valuable insights into their performance. The results showed that a novel combination of advanced univariate models like Prophet, as well as gradient boosting, into an ensemble delivered the most accurate and consistent solutions on average. This approach generated improvements in the range of 16%-30% over the existing in-house methods for estimating the daily patient flows 90 days ahead.
A Learning Management System (LMS) is application software used in the automation, delivery, administration, tracking, and reporting of courses and programs in the educational sector. An LMS that exploits machine learning (ML) can access user data and exploit it to improve the learning experience. Recently developed artificial intelligence (AI) and ML models help to accomplish effective performance monitoring for LMS. Among the different processes involved in ML based LMS, the feature selection and classification processes are found beneficial. With this motivation, this study introduces a Glowworm-based Feature Selection with Machine Learning Enabled Performance Monitoring (GSO-MFWELM) technique for LMS. The key objective of the proposed GSO-MFWELM technique is to effectively monitor performance in LMS. The proposed GSO-MFWELM technique involves a GSO-based feature selection technique to select the optimal features. Besides, a Weighted Extreme Learning Machine (WELM) model is applied for the classification process, whereas the parameters involved in the WELM model are optimally fine-tuned with the help of the Mayfly Optimization (MFO) algorithm. The design of the GSO and MFO techniques results in reduced computational complexity and improved classification performance. The presented GSO-MFWELM technique was validated against a benchmark dataset and the results were inspected under several aspects. The simulation results established the superiority of the GSO-MFWELM technique over recent approaches, with a maximum classification accuracy of 0.9589.
Explainable artificial intelligence aims to interpret how machine learning models make decisions, and many model explainers have been developed in the computer vision field. However, understanding of the applicability of these model explainers to biological data is still lacking. In this study, we comprehensively evaluated multiple explainers by interpreting pre-trained models for predicting tissue types from transcriptomic data and by identifying the top contributing genes from each sample with the greatest impacts on model prediction. To improve the reproducibility and interpretability of results generated by model explainers, we proposed a series of optimization strategies for each explainer on two different model architectures: multilayer perceptron (MLP) and convolutional neural network (CNN). We observed three groups of explainer and model architecture combinations with high reproducibility. Group II, which contains three model explainers on aggregated MLP models, identified top contributing genes in different tissues that exhibited tissue-specific manifestation and were potential cancer biomarkers. In summary, our work provides novel insights and guidance for exploring biological mechanisms using explainable machine learning models.
Shear stress distribution prediction in open channels is of utmost importance in hydraulic structural engineering as it directly affects the design of stable channels. In this study, a series of experimental tests was first conducted to assess the shear stress distribution in prismatic compound channels. The shear stress values around the whole wetted perimeter were measured in the compound channel with different floodplain widths and at different flow depths in subcritical and supercritical conditions. A set of data mining and machine learning algorithms, including Random Forest (RF), M5P, Random Committee, KStar, and Additive Regression, was applied to the obtained data to predict the shear stress distribution in the compound channel. Among these five models, the RF method gave the most precise results, with the highest R² value of 0.9. Finally, the most powerful data mining method studied in this research was compared with two well-known analytical models, the Shiono and Knight method (SKM) and the Shannon method, to assess how well the proposed model predicts the shear stress distribution. The results showed that the RF model has the best prediction performance compared to the SKM and Shannon models.
Some human diseases are recognized through the count of each type of White Blood Cell (WBC), so detecting and classifying each type is important for human healthcare. The main aim of this paper is to propose a computer-aided WBC analysis tool that is designed, developed, and evaluated to classify WBCs into five types, namely neutrophils, eosinophils, lymphocytes, monocytes, and basophils. Using a computer-aided model reduces resource and time consumption. Various pre-trained deep learning models have been used to extract features, including AlexNet, Visual Geometry Group (VGG), and Residual Network (ResNet), which belong to different taxonomy types of deep learning architectures. Also, Binary Border Collie Optimization (BBCO) is introduced as an updated version of Border Collie Optimization (BCO) for feature reduction based on maximizing classification accuracy. The proposed computer-aided diagnosis tool merges transfer deep learning with ResNet101, BBCO feature reduction, and a Support Vector Machine (SVM) classifier to form a hybrid model, ResNet101-BBCO-SVM, an accurate and fast model for classifying WBCs. As a result, ResNet101-BBCO-SVM scores the best accuracy at 99.21%, compared to recent studies on the benchmark. The model showed that the addition of the BBCO algorithm increased the detection accuracy and, at the same time, decreased the classification time. The ResNet101-BBCO-SVM model has been shown to be effective, outperforming by reasonable margins both recent studies in the literature and end-to-end transfer learning of pre-trained models.
Modeling and optimization are crucial to smart chemical process operations. However, a large number of nonlinearities must be considered in a typical chemical process owing to complex unit operations, chemical reactions, and separations. The resulting computational complexity makes it very challenging to apply mechanistic models to industrial-scale problems. Thus, this paper presents an efficient hybrid framework that integrates machine learning and particle swarm optimization to overcome these difficulties. An industrial propane dehydrogenation process was used to demonstrate the validity and efficiency of our method. Firstly, a data set was generated based on process mechanistic simulation validated by industrial data, which provides sufficient and reasonable samples for model training and testing. Secondly, four well-known machine learning methods, namely K-nearest neighbors, decision tree, support vector machine, and artificial neural network, were compared and used to obtain prediction models of the process operation. All of these methods achieved highly accurate models by adjusting model parameters on the basis of high-coverage data and proper features. Finally, optimal process operations were obtained by using the particle swarm optimization approach.
Solar energy has become crucial in producing electrical energy because it is inexhaustible and sustainable. However, its uncertain generation causes problems in power system operation. Therefore, solar irradiance forecasting is significant for suitably controlling power system operation, organizing transmission expansion planning, and dispatching power system generation. Nonetheless, forecasting performance can be degraded by an unsuitable prediction model and a lack of preprocessing. To deal with these issues, this paper proposes a Meta-Learning Extreme Learning Machine optimized with Golden Eagle Optimization and Logistic Map (MGEL-ELM) and the Same Datetime Interval Averaged Imputation algorithm (SAME) for improving the forecasting performance on incomplete solar irradiance time series datasets. Thus, the proposed method not only imputes incomplete forecasting data but also achieves high forecasting accuracy. The experimental result of forecasting a solar irradiance dataset in Thailand indicates that the proposed method achieves the highest coefficient of determination, up to 0.9307, compared to state-of-the-art models. Furthermore, the proposed method consumes less forecasting time than the deep learning model.
Alloys designed with the traditional trial and error method have encountered several problems, such as long trial cycles and high costs. The rapid development of big data and artificial intelligence provides a new path for the efficient development of metallic materials, that is, machine learning-assisted design. In this paper, the basic strategy for the machine learning-assisted rational design of alloys was introduced. Research progress was reviewed in three areas: the property-oriented reversal design of alloy composition; the screening design of alloy composition based on models established using element physical and chemical features or microstructure factors; and the optimal design of alloy composition and process parameters based on iterative feedback optimization. Results showed the great advantages of machine learning, including high efficiency and low cost. Future development trends for the machine learning-assisted rational design of alloys were also discussed. Interpretable modeling, integrated modeling, high-throughput combination, multi-objective optimization, and innovative platform building were suggested as fields of great interest.
文摘Analyzing big data, especially medical data, helps to provide good health care to patients and face the risks of death. The COVID-19 pandemic has had a significant impact on public health worldwide, emphasizing the need for effective risk prediction models. Machine learning (ML) techniques have shown promise in analyzing complex data patterns and predicting disease outcomes. The accuracy of these techniques is greatly affected by changing their parameters. Hyperparameter optimization plays a crucial role in improving model performance. In this work, the Particle Swarm Optimization (PSO) algorithm was used to effectively search the hyperparameter space and improve the predictive power of the machine learning models by identifying the optimal hyperparameters that can provide the highest accuracy. A dataset with a variety of clinical and epidemiological characteristics linked to COVID-19 cases was used in this study. Various machine learning models, including Random Forests, Decision Trees, Support Vector Machines, and Neural Networks, were utilized to capture the complex relationships present in the data. To evaluate the predictive performance of the models, the accuracy metric was employed. The experimental findings showed that the suggested method of estimating COVID-19 risk is effective. When compared to baseline models, the optimized machine learning models performed better and produced better results.
Funding: Funded by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, through Project Number RI-44-0444.
Abstract: Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process uses a stochastic gradient descent with multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models on the reduced feature set. We evaluate the performance of our approach on various medical datasets with diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in healthcare data mining and machine learning.
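The abstract names the Apriori algorithm for finding attribute relationships. The sketch below shows the generic level-wise Apriori idea (join, prune, count support) and a rule-confidence calculation. It is an illustrative assumption, not the MLARMC-HDMS code: the function names, the toy `records`, and the 0.5 support threshold are all made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
        keys = list(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / frequent[a]


# Hypothetical toy records; each set lists attributes observed for one patient.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fever", "fatigue"},
    {"cough"},
]
frequent_itemsets = apriori(records, min_support=0.5)
rule_conf = confidence(frequent_itemsets, {"fatigue"}, {"fever"})
```

On these four records, every record that contains "fatigue" also contains "fever", so the rule fatigue -> fever has confidence 1.0, while {"cough", "fatigue"} appears in only one record and is dropped at the 0.5 support threshold.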
Funding: This research was partly supported by the Technology Development Program of MSS [No. S3033853] and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A4A1031509).
Abstract: Statistics are more crucial than ever due to the availability of huge amounts of data from domains such as finance, medicine, science, and engineering. Statistical data mining (SDM) is an interdisciplinary domain that examines large existing databases to discover patterns and connections in the data. It differs from classical statistics in the size of the datasets and in the fact that the data were not originally gathered according to an experimental design but for other purposes. This paper therefore introduces an effective Statistical Data Mining for Intelligent Rainfall Prediction using Slime Mould Optimization with Deep Learning (SDMIRP-SMODL) model. In the presented SDMIRP-SMODL model, feature subset selection is performed by the SMO algorithm, which in turn reduces the computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) technique is exploited. Finally, the pelican optimization algorithm (POA) is used as a hyperparameter optimizer. The SDMIRP-SMODL approach is evaluated experimentally on a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative outcomes showed the superiority of the SDMIRP-SMODL model over existing techniques.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 2/158/43). Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R235), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4340237DSR06).
Abstract: Biomedical data classification has received significant attention in recent times due to a massive increase in the generation of healthcare data from various sources. Developments in artificial intelligence (AI) and machine learning (ML) assist in the effective design of medical data classification models. This article therefore concentrates on the development of an optimal Stacked Long Short Term Memory Sequence-to-Sequence Autoencoder (OSAE-LSTM) model for biomedical data classification. The presented OSAE-LSTM model is intended to classify biomedical data for the existence of diseases. First, the OSAE-LSTM model applies min-max normalization based pre-processing to scale the data into a uniform format. Next, the SAE-LSTM model is used for the detection and classification of diseases in the biomedical data. Finally, the manta ray foraging optimization (MRFO) algorithm is employed for hyperparameter optimization. The MRFO algorithm assists in the optimal selection of the hyperparameters involved in the SAE-LSTM model. The OSAE-LSTM model was tested on a set of benchmark medical datasets, and the results showed improvements of the OSAE-LSTM model over the other approaches under several dimensions.
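The min-max normalization step mentioned in this abstract is the standard rescaling x' = (x - min) / (max - min), applied per feature. A minimal sketch follows; the function name and the choice to map a constant column to 0.0 are assumptions for illustration, not details from the paper.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


scaled = min_max_scale([2.0, 4.0, 6.0])  # -> [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out samples.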
Abstract: In this paper, we study structured data mining algorithms and their applications in the machine learning field. With the advance of informatization and digitization, many fields now hold large amounts of multi-source, heterogeneous data in distributed storage. Sharing these data requires solving a series of mechanisms, methods, and implementation technologies, from storage management through to interoperability. Unstructured data lack a strict structure and are therefore harder to standardize and manage than structured information. Given these characteristics, large volumes of unstructured data are stored separately as files, with a pointer-like index kept in the database. Against this background, we propose a new idea for a structured data mining algorithm.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR36. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Groups Funding program grant code (NU/RG/SERC/11/7).
Abstract: Arabic is one of the most spoken languages across the globe. However, there are fewer studies concerning Sentiment Analysis (SA) in Arabic. In recent years, the detection of sentiments and emotions expressed in tweets has received significant interest. The substantial role played by the Arab region in international politics and the global economy has created a need to examine sentiments and emotions in the Arabic language. Two common approaches are available to address emotion classification problems: Machine Learning and lexicon-based approaches. With this motivation, the current research article develops a Teaching and Learning Optimization with Machine Learning Based Emotion Recognition and Classification (TLBOML-ERC) model for Sentiment Analysis on tweets made in the Arabic language. The presented TLBOML-ERC model focuses on recognising emotions and sentiments expressed in Arabic tweets. To attain this, the proposed TLBOML-ERC model initially carries out data pre-processing and a Continuous Bag Of Words (CBOW)-based word embedding process. In addition, a Denoising Autoencoder (DAE) model is used to categorise the different emotions expressed in Arabic tweets. To improve the efficacy of the DAE model, the Teaching and Learning-based Optimization (TLBO) algorithm is used to optimize its parameters. The proposed TLBOML-ERC method was experimentally validated on an Arabic tweets dataset. The obtained results show the promising performance of the proposed TLBOML-ERC model on Arabic emotion classification.
Abstract: Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly contributed to the industry, selecting an appropriate model is still challenging for most researchers. Relying solely on statistical metrics to select the best model for a particular problem may not always be the most effective approach. This study encourages researchers to incorporate data visualization in their analysis and model selection process. To evaluate the suitability of different models in predicting horizontal permeability in the Volve field, wireline logs were used to train Extra-Trees, Ridge, Bagging, and XGBoost models. The Random Forest feature selection technique was applied to select the relevant logs as inputs for the models. Based on statistical metrics, the Extra-Trees model achieved the highest test accuracy of 0.996, RMSE of 19.54 mD, and MAE of 3.18 mD, with XGBoost coming in second. However, when the results were visualised, it was discovered that the XGBoost model was more suitable for the problem being tackled. The XGBoost model was a better predictor within the sandstone interval, while the Extra-Trees model was more appropriate in non-sandstone intervals. Since this study aims to predict permeability in the reservoir interval, the XGBoost model is the most suitable. These contrasting results demonstrate the importance of incorporating data visualisation techniques as an evaluation metric. Given the heterogeneity of the subsurface, relying solely on statistical metrics may not be sufficient to determine which model is best suited for a particular problem.
Funding: The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia has funded this project under Grant No. (FP-205-43).
Abstract: The outbreak of the pandemic caused by Coronavirus Disease 2019 (COVID-19) has affected the daily activities of people across the globe. During the COVID-19 outbreak and the successive lockdowns, Twitter was heavily used and the number of tweets regarding COVID-19 increased tremendously. Several studies used Sentiment Analysis (SA) to analyze the emotions expressed in tweets about COVID-19. Therefore, in the current study, a new Artificial Bee Colony (ABC) with Machine Learning-driven SA (ABCML-SA) model is developed for conducting Sentiment Analysis of COVID-19 Twitter data. The prime focus of the presented ABCML-SA model is to recognize the sentiments expressed in tweets about COVID-19. It involves data pre-processing at the initial stage, followed by n-gram based feature extraction to derive the feature vectors. For identification and classification of the sentiments, the Support Vector Machine (SVM) model is exploited. Finally, the ABC algorithm is applied to fine-tune the parameters involved in the SVM. To demonstrate the improved performance of the proposed ABCML-SA model, a sequence of simulations was conducted. The comparative assessment results confirmed the effective performance of the proposed ABCML-SA model over other approaches.
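The n-gram based feature extraction step in this abstract can be pictured as counting contiguous word sequences in each tweet. The sketch below is a simplified assumption (whitespace tokenization, lowercasing, unigram and bigram counts) rather than the ABCML-SA pre-processing pipeline; the function names and the sample text are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the list of contiguous word n-grams in a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count all n-grams of the requested orders into one sparse feature map."""
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(text, n))
    return counts


bigrams = word_ngrams("Stay home stay safe", 2)
# -> ["stay home", "home stay", "stay safe"]
features = ngram_features("Stay home stay safe")
```

Each tweet's counts would then be aligned to a shared vocabulary to form the fixed-length vectors that a classifier such as an SVM expects.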
Funding: The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education and Qassim University, Saudi Arabia for funding this research work through the Project Number (QU-IF-4-3-3-30013).
Abstract: The trend toward more sustainable and green buildings has turned several passive buildings into more dynamic ones. Mosques are a type of building with a unique energy usage pattern. Nevertheless, these buildings have received minimal consideration in ongoing energy efficiency applications. This is due to the unpredictability of the electrical consumption of mosques, which affects the stability of the distribution networks. This study therefore addresses the issue by developing a framework for short-term electricity load forecasting for a mosque located in Riyadh, Saudi Arabia. Using the load consumption of the mosque and meteorological datasets, the performance of four forecasting algorithms is investigated, namely an Artificial Neural Network and Support Vector Regression (SVR) based on three kernel functions: Radial Basis (RB), Polynomial, and Linear. In addition, this work examines the impact of 13 different combinations of input attributes, since selecting the optimal features has a major influence on forecasting precision. For the mosque load, SVR-RB with eleven features appeared to be the best forecasting model, with the lowest forecasting error metrics: RMSE, nRMSE, MAE, and nMAE values of 4.207 kW, 2.522%, 2.938 kW, and 1.761%, respectively.
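The four error metrics quoted in this abstract can be sketched as follows. The abstract does not say how the normalized variants are defined, so the `normalizer` argument (for example the mean, maximum, or range of the observed load) is an assumption. The reported pairs imply a common normalizer of roughly 167 kW, since 4.207 / 0.02522 and 2.938 / 0.01761 both come to about 166.8. The function names and the toy load values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def normalized_percent(error, normalizer):
    """Express an error as a percentage of a chosen normalizer (an assumption)."""
    return 100.0 * error / normalizer


# Hypothetical load values in kW, for illustration only.
actual_kw = [100.0, 120.0, 140.0, 160.0]
predicted_kw = [102.0, 118.0, 143.0, 157.0]

load_rmse = rmse(actual_kw, predicted_kw)
load_mae = mae(actual_kw, predicted_kw)
load_range = max(actual_kw) - min(actual_kw)
load_nrmse = normalized_percent(load_rmse, load_range)
```

Because the choice of normalizer changes the percentages, normalized errors from different papers are only comparable when the same definition is used.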
Abstract: Autism spectrum disorder (ASD) is regarded as a neurological disorder defined by a specific set of problems associated with social skills, repetitive behaviour, and communication. Identifying ASD as early as possible is beneficial because early identification permits prompt interventions in children with ASD. Recognition of ASD through objective pathogenic mutation screening is the initial step toward early intervention and efficient treatment of affected children. Nowadays, the healthcare and machine learning (ML) fields are combined to determine the existence of various diseases. This article devises a Jellyfish Search Optimization with Deep Learning Driven ASD Detection and Classification (JSODL-ASDDC) model. The goal of the JSODL-ASDDC algorithm is to identify the different stages of ASD with the help of biomedical data. The proposed JSODL-ASDDC model initially applies min-max data normalization to scale the data into a uniform range. In addition, the JSODL-ASDDC model involves a JSO based feature selection (JFSO-FS) process to choose optimal feature subsets. Moreover, a Gated Recurrent Unit (GRU) based classification model is used for the recognition and classification of ASD. Furthermore, a Bacterial Foraging Optimization (BFO) assisted parameter tuning process is executed to enhance the efficacy of the GRU system. The JSODL-ASDDC model is assessed experimentally on distinct datasets. The experimental outcomes highlighted the enhanced performance of the JSODL-ASDDC algorithm over recent approaches.
Funding: Supported by the National Science and Technology Council, Taiwan, under grants MOST 108-2410-H-027-020, MOST 109-2410-H-027-009-MY2 and MOST 111-2410-H-027-011-MY3.
Abstract: In 2021, the abnormal short-term price fluctuations of GameStop, which were triggered by internet stock discussions, drew the attention of academics, financial analysts, and stock trading commissions alike, prompting calls to address such events and maintain market stability. However, the impact of stock discussions on volatile trading behavior has received comparatively less attention than traditional fundamentals. Furthermore, data mining methods are less often used to predict stock trading despite their higher accuracy. This study adopts an innovative approach using social media data to obtain stock rumors, and then trains three decision trees to demonstrate the impact of rumor propagation on stock trading behavior. Our findings show that rumor propagation outperforms traditional fundamentals in predicting abnormal trading behavior. The study serves as an impetus for further research using data mining as a method of inquiry.
Abstract: Urgent care clinics and emergency departments around the world periodically suffer from extended wait times beyond patient expectations due to surges in patient flows. The delays arising from inadequate staffing levels during these periods have been linked with adverse clinical outcomes. Previous research into forecasting patient flows has mostly used statistical techniques. These studies have also predominantly focussed on short-term forecasts, which have limited practicality for the resourcing of medical personnel. This study joins an emerging body of work which seeks to explore the potential of machine learning algorithms to generate accurate forecasts of patient presentations. Our research uses datasets covering 10 years from two large urgent care clinics to develop long-term patient flow forecasts up to one quarter ahead using a range of state-of-the-art algorithms. A distinctive feature of this study is the use of eXplainable Artificial Intelligence (XAI) tools like Shapley and LIME that enable an in-depth analysis of the behaviour of the models, which would otherwise be uninterpretable. These analysis tools enabled us to explore the ability of the models to adapt to the volatility in patient demand during the COVID-19 pandemic lockdowns and to identify the most impactful variables, resulting in valuable insights into their performance. The results showed that a novel combination of advanced univariate models like Prophet, as well as gradient boosting, into an ensemble delivered the most accurate and consistent solutions on average. This approach generated improvements in the range of 16%-30% over the existing in-house methods for estimating the daily patient flows 90 days ahead.
Funding: Supported by the Researchers Supporting Program (TUMA-Project2021-27), Almaarefa University, Riyadh, Saudi Arabia, and by Taif University Researchers Supporting Project number (TURSP-2020/161), Taif University, Taif, Saudi Arabia.
Abstract: A Learning Management System (LMS) is application software used in the automation, delivery, administration, tracking, and reporting of courses and programs in the educational sector. An LMS that exploits machine learning (ML) can access user data and use it to improve the learning experience. Recently developed artificial intelligence (AI) and ML models help to accomplish effective performance monitoring for LMS. Among the different processes involved in ML based LMS, the feature selection and classification processes are beneficial. With this motivation, this study introduces a Glowworm-based Feature Selection with Machine Learning Enabled Performance Monitoring (GSO-MFWELM) technique for LMS. The key objective of the proposed GSO-MFWELM technique is to effectively monitor performance in LMS. The proposed GSO-MFWELM technique involves a GSO-based feature selection technique to select the optimal features. In addition, a Weighted Extreme Learning Machine (WELM) model is applied for classification, and the parameters of the WELM model are optimally fine-tuned with the help of the Mayfly Optimization (MFO) algorithm. The design of the GSO and MFO techniques results in reduced computational complexity and improved classification performance. The presented GSO-MFWELM technique was validated on a benchmark dataset and the results were inspected under several aspects. The simulation results established the superiority of the GSO-MFWELM technique over recent approaches, with a maximum classification accuracy of 0.9589.
Abstract: Explainable artificial intelligence aims to interpret how machine learning models make decisions, and many model explainers have been developed in the computer vision field. However, understanding of the applicability of these model explainers to biological data is still lacking. In this study, we comprehensively evaluated multiple explainers by interpreting pre-trained models for predicting tissue types from transcriptomic data and by identifying the top contributing genes from each sample with the greatest impacts on model prediction. To improve the reproducibility and interpretability of results generated by model explainers, we proposed a series of optimization strategies for each explainer on two different model architectures, multilayer perceptron (MLP) and convolutional neural network (CNN). We observed three groups of explainer and model architecture combinations with high reproducibility. Group II, which contains three model explainers on aggregated MLP models, identified top contributing genes in different tissues that exhibited tissue-specific manifestation and were potential cancer biomarkers. In summary, our work provides novel insights and guidance for exploring biological mechanisms using explainable machine learning models.
Abstract: Shear stress distribution prediction in open channels is of utmost importance in hydraulic structural engineering, as it directly affects the design of stable channels. In this study, a series of experimental tests was first conducted to assess the shear stress distribution in prismatic compound channels. The shear stress values around the whole wetted perimeter were measured in the compound channel with different floodplain widths and at different flow depths in subcritical and supercritical conditions. A set of data mining and machine learning algorithms, including Random Forest (RF), M5P, Random Committee, KStar and Additive Regression, was applied to the measured data to predict the shear stress distribution in the compound channel. Among these five models, the RF method gave the most precise results, with the highest R² value of 0.9. Finally, the best-performing data mining method studied in this research was compared with two well-known analytical models, the Shiono and Knight method (SKM) and the Shannon method, to assess how well the proposed model predicts the shear stress distribution. The results showed that the RF model has the best prediction performance compared to the SKM and Shannon models.
Abstract: Some human diseases are recognized through the count of each type of White Blood Cell (WBC), so detecting and classifying each type is important for human healthcare. The main aim of this paper is to propose a computer-aided WBC utility analysis tool designed, developed, and evaluated to classify WBCs into five types, namely neutrophils, eosinophils, lymphocytes, monocytes, and basophils. Using a computer-aided model reduces resource and time consumption. Various pre-trained deep learning models have been used to extract features, including AlexNet, Visual Geometry Group (VGG), and Residual Network (ResNet), which belong to different taxonomy types of deep learning architectures. Also, Binary Border Collie Optimization (BBCO) is introduced as an updated version of Border Collie Optimization (BCO) for feature reduction based on maximizing classification accuracy. The proposed computer-aided diagnosis tool merges transfer deep learning with ResNet101, BBCO feature reduction, and a Support Vector Machine (SVM) classifier to form a hybrid model, ResNet101-BBCO-SVM, an accurate and fast model for classifying WBCs. As a result, ResNet101-BBCO-SVM scores the best accuracy, 99.21%, compared to recent studies on the benchmark. The addition of the BBCO algorithm increased the detection accuracy and, at the same time, decreased the classification time. The ResNet101-BBCO-SVM model was shown to be effective, outperforming recent studies in the literature and end-to-end transfer learning of pre-trained models by reasonable margins.
Funding: This work was supported by the "Zhujiang Talent Program" High Talent Project of Guangdong Province (Grant No. 2017GC010614) and the National Natural Science Foundation of China (Grant No. 22078372).
Abstract: Modeling and optimization are crucial to smart chemical process operations. However, a large number of nonlinearities must be considered in a typical chemical process because of complex unit operations, chemical reactions and separations. The resulting computational complexity makes it very challenging to apply mechanistic models to industrial-scale problems. This paper therefore presents an efficient hybrid framework integrating machine learning and particle swarm optimization to overcome these difficulties. An industrial propane dehydrogenation process was used to demonstrate the validity and efficiency of our method. Firstly, a data set was generated from a process mechanistic simulation validated by industrial data, which provides sufficient and reasonable samples for model training and testing. Secondly, four well-known machine learning methods, namely K-nearest neighbors, decision tree, support vector machine, and artificial neural network, were compared and used to obtain prediction models of the process operation. All of these methods achieved highly accurate models by adjusting model parameters on the basis of high-coverage data and properly selected features. Finally, optimal process operations were obtained by using the particle swarm optimization approach.
Abstract: Solar energy has become crucial in producing electrical energy because it is inexhaustible and sustainable. However, its uncertain generation causes problems in power system operation. Solar irradiance forecasting is therefore important for properly controlling power system operation, organizing transmission expansion planning, and dispatching power system generation. Nonetheless, forecasting performance can be degraded by an unsuitable prediction model and a lack of preprocessing. To deal with these issues, this paper proposes a Meta-Learning Extreme Learning Machine optimized with Golden Eagle Optimization and Logistic Map (MGEL-ELM) and the Same Datetime Interval Averaged Imputation algorithm (SAME) for improving the forecasting performance on incomplete solar irradiance time series datasets. The proposed method thus not only imputes incomplete forecasting data but also achieves forecasting accuracy. The experimental results of forecasting a solar irradiance dataset in Thailand indicate that the proposed method achieves the highest coefficient of determination, up to 0.9307, compared to state-of-the-art models. Furthermore, the proposed method requires less forecasting time than the deep learning model.
Funding: Financially supported by the National Key Research and Development Program of China (No. 2021YFB3803101), the National Natural Science Foundation of China (Nos. 51974028 and 52022011), and the Beijing Municipal Science and Technology Commission (No. Z191100001119125).
Abstract: Alloys designed with the traditional trial and error method have encountered several problems, such as long trial cycles and high costs. The rapid development of big data and artificial intelligence provides a new path for the efficient development of metallic materials, that is, machine learning-assisted design. In this paper, the basic strategy for the machine learning-assisted rational design of alloys was introduced. Research progress in the property-oriented reversal design of alloy composition, the screening design of alloy composition based on models established using element physical and chemical features or microstructure factors, and the optimal design of alloy composition and process parameters based on iterative feedback optimization was reviewed. Results showed the great advantages of machine learning, including high efficiency and low cost. Future development trends for the machine learning-assisted rational design of alloys were also discussed. Interpretable modeling, integrated modeling, high-throughput combination, multi-objective optimization, and innovative platform building were suggested as fields of great interest.