With the advancement of artificial intelligence, traffic forecasting is attracting growing interest as a means of optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic structures. This study proposes an improved ensemble-based deep learning method for traffic volume prediction. A set of optimal hyperparameters is also applied to the suggested approach to improve the learning process. The fusion of these methodologies aims to harness the capacity of ensemble empirical mode decomposition to discern complex traffic patterns and the proficiency of long short-term memory in learning temporal relationships. First, an automatic vehicle identification dataset is obtained and preprocessed with the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimization. Finally, the obtained results are fused into a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks on various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches 98%, and the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of the traffic pattern prediction and offer promising prospects for enhancing transportation management systems and urban infrastructure planning.
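The lookback window mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' code: the function name `make_windows` and the sample counts are hypothetical, and the sketch only shows how a series is usually turned into (input window, next value) pairs before it is fed to a sequence model such as an LSTM.

```python
def make_windows(series, lookback):
    """Split a 1-D series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])  # the previous `lookback` values
        targets.append(series[start + lookback])       # the value to predict
    return inputs, targets


# Made-up hourly traffic counts, used only for illustration.
volumes = [120, 135, 150, 160, 155, 140]
X, y = make_windows(volumes, lookback=3)
```

A larger lookback gives the model more history per sample but leaves fewer training pairs, which is one reason the window length is typically tuned rather than fixed in advance.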
Advanced Metering Infrastructure (AMI) is the metering network of the smart grid that enables bidirectional communication between each consumer’s premises and the provider’s control center. The massive amount of data collected supports the real-time decision-making required for diverse applications. The communication infrastructure relies on different network types, including the Internet. This makes the infrastructure vulnerable to various attacks, which could compromise security or have devastating effects. Traditional machine learning solutions, however, cannot adapt to the increasing complexity and diversity of attacks. The objective of this paper is to develop an Anomaly Detection System (ADS) based on deep learning using the CIC-IDS2017 dataset. Because this dataset is highly imbalanced, a two-step sampling technique combining random under-sampling and the Synthetic Minority Oversampling Technique (SMOTE) is proposed to balance it. The proposed system utilizes a multiple-hidden-layer Auto-encoder (AE) for feature extraction and dimensionality reduction. In addition, an ensemble voting scheme based on both Random Forest (RF) and Convolutional Neural Network (CNN) is developed to classify the multiclass attack categories. The proposed system is evaluated and compared with six state-of-the-art machine learning and deep learning algorithms: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and bidirectional LSTM (biLSTM). Experimental results show that the proposed model enhances the detection of each attack class compared with the other machine learning and deep learning models, with overall accuracy (98.29%), precision (99%), recall (98%), F_(1) score (98%), and UNDetection rate (UND) (8%).
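The two-step balancing idea in the abstract above (shrink large classes, synthesize rows for small ones) can be sketched in plain Python. This is a simplified SMOTE-style interpolation under stated assumptions, not the paper's pipeline and not the imbalanced-learn implementation: the names `smote_like` and `balance`, the toy data, and the target size of 50 are all hypothetical, and each minority class is assumed to contain at least two rows.

```python
import random


def smote_like(rows, target, k, rng):
    """Grow a small class to `target` rows by interpolating between a row
    and one of its k nearest same-class neighbours (assumes >= 2 rows)."""
    synthetic = list(rows)
    while len(synthetic) < target:
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def balance(classes, target, k=3, seed=0):
    """Step 1: randomly under-sample classes larger than `target`.
    Step 2: over-sample classes smaller than `target`."""
    rng = random.Random(seed)
    balanced = {}
    for label, rows in classes.items():
        if len(rows) > target:
            balanced[label] = rng.sample(rows, target)
        else:
            balanced[label] = smote_like(rows, target, k, rng)
    return balanced


# Toy two-feature data, used only for illustration.
data = {
    "benign": [(float(i), float(i % 7)) for i in range(200)],
    "attack": [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (1.5, 1.5)],
}
balanced = balance(data, target=50)
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the region already covered by that class instead of being exact copies.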
The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, are difficult for firewalls to detect and prevent because legitimate and intrusion traffic look similar. The vast volume of network traffic also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an Intrusion Detection System (IDS) based on stacking ensemble learning with Random Forest, Decision Tree, and k-Nearest-Neighbors base learners. The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating three base models (Random Forest, Decision Tree, and k-Nearest-Neighbors) and a meta-model represented by the Logistic Regression algorithm. Evaluated on the UNSW-NB15 dataset, the proposed IDS achieved an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The obtained results also demonstrate improvements in other measurement criteria.
As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve on the performance of single machine learning (ML) classifiers, especially with stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often degrades model performance when a growing number of learners are not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision-aided processes. GA-Stacking utilizes five well-known classifiers at its first level: Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also employs the GA to select the number, combination, and trust of these base classifiers, using the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) model is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance, with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures.
The Coronavirus Disease 2019 (COVID-19) pandemic has made telecommuting and remote learning the norm. The growing number of Internet-connected devices provides cyber attackers with more attack vectors. Malware developed by criminals also incorporates a number of sophisticated obfuscation techniques, making it difficult to classify and detect using conventional approaches. Therefore, this paper proposes a novel visualization-based malware classification system using transfer and ensemble learning (VMCTE). VMCTE has a strong anti-interference ability: even if malware uses obfuscation, fuzzing, encryption, and other techniques to evade detection, it can be accurately classified into its corresponding malware family. Unlike traditional dynamic and static analysis techniques, VMCTE requires neither reverse engineering nor domain expert knowledge. The proposed classification system combines three strong deep convolutional neural networks (ResNet50, MobileNetV1, and MobileNetV2) as feature extractors, reduces the dimensionality of the extracted features using principal component analysis, and employs a support vector machine to build the classification model. The semantic representations of malware images can be extracted using various convolutional neural network (CNN) architectures, yielding higher-quality features than traditional methods. Integrating fine-tuned and non-fine-tuned classification models based on transfer learning can greatly enhance the capacity to classify various families of malware. The experimental findings on the Malimg dataset demonstrate that VMCTE attains 99.64%, 99.64%, 99.66%, and 99.64% accuracy, F1-score, precision, and recall, respectively.
Recently, Industrial Control Systems (ICSs) have been changing from a closed environment to an open environment because of the expansion of digital transformation, smart factories, and the Industrial Internet of Things (IIoT). Since security accidents in ICSs can cause national confusion and human casualties, research on detecting abnormalities by learning from normal operation data is being actively conducted. The single techniques proposed by existing studies do not detect abnormalities well or provide satisfactory results. In this paper, we propose a GRU-based Buzzer Ensemble for Abnormal Detection (GBE-AD) model for detecting anomalies in industrial control systems to ensure rapid response and process availability. The newly proposed buzzer-method ensemble model resolves False Negatives (FNs) because the internal models composing GBE-AD complement the limited range that a single model can detect. Because the internal models remain suppressed for False Positives (FPs), GBE-AD provides better generalization. In addition, we generated mean prediction error data in GBE-AD and inferred abnormal processes using soft and hard clustering. We confirmed that the detection model’s Time-series Aware Precision (TaP) suppressed FPs at 97.67%. The final performance was 94.04% in an experiment using the public HIL-based Augmented ICS (HAI) Security Dataset (ver. 21.03).
The quality of the air we breathe during the course of our daily lives has a significant impact on our health and well-being as individuals. Unfortunately, personal air quality measurement remains challenging. In this study, we investigate the use of first-person photos for the prediction of air quality. The main idea is to harness the power of a generalized stacking approach and the importance of haze features extracted from first-person images to create an efficient new stacking model called AirStackNet for air pollution prediction. AirStackNet consists of two layers and four regression models: the first layer generates meta-data from Light Gradient Boosting Machine (Light-GBM), Extreme Gradient Boosting Regression (XGBoost), and CatBoost Regression (CatBoost), whereas the second layer computes the final prediction from the meta-data of the first layer using Extra Tree Regression (ET). The performance of the proposed AirStackNet model is validated using the public Personal Air Quality Dataset (PAQD). Our experiments are evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R2), Mean Squared Error (MSE), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE). Experimental results indicate that the proposed AirStackNet model not only improves air pollution prediction performance by overcoming the bias-variance tradeoff, but also outperforms baseline and state-of-the-art models.
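The evaluation measures listed in the abstract above have standard textbook definitions, which the following sketch computes in plain Python. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper; MAPE assumes no true value is zero, and RMSLE assumes all values are greater than -1.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (%), RMSLE and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "RMSLE": math.sqrt(
            sum((math.log1p(p) - math.log1p(t)) ** 2
                for t, p in zip(y_true, y_pred)) / n
        ),
        "R2": 1 - (mse * n) / ss_tot,
    }


# Made-up observations and predictions, used only for illustration.
scores = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
```

For these sample values the errors are 2, -2, 3, and -3, so MAE is 2.5, MSE is 6.5, MAPE is 11.875%, and R2 is 1 - 26/500 = 0.948.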
No optimization algorithm can obtain satisfactory results in all optimization tasks. Thus, an ensemble of multiple algorithms is an effective way to deal with the problem. This paper proposes an ensemble of population-based metaheuristics (EPM) to solve single-objective optimization problems. The design of the EPM framework includes three stages: the initial stage, the update stage, and the final stage. The framework applies the transformation between the real and virtual populations to balance exploration and exploitation at the population level and uses an elite strategy to communicate among virtual populations. The experiment tested two benchmark function sets with five metaheuristic algorithms and four ensemble algorithms. The ensemble algorithms are generally superior to the original algorithms according to Friedman’s average ranking and Wilcoxon signed-rank test results, demonstrating the ensemble framework’s effect. The iteration curves for different test functions show that the ensemble algorithms converge faster and reach better optimization results. The distribution maps of the virtual populations at several stages indicate that the ensemble algorithms avoid falling into local optima. The ensemble framework also performs well on two practical engineering problems. Some results of the ensemble algorithms are superior to those of metaheuristic algorithms not included in the ensemble framework, further demonstrating the ensemble method’s potential and superiority.
Automated retinal disease detection systems have long been in existence and provide a safe, no-contact, and cost-effective solution for detecting this disease. This paper presents a game theory-based dynamic weighted ensemble of a feature extraction-based machine learning model and a deep transfer learning model for automatic retinal disease detection. The feature extraction-based machine learning model uses Gaussian kernel-based fuzzy rough sets for feature reduction and an XGBoost classifier for classification. The transfer learning model uses VGG16, ResNet50, or Inception-ResNet-v2. A novel ensemble classifier based on the game theory approach is proposed for fusing the outputs of the transfer learning model and the XGBoost classifier model. The ensemble approach significantly improves the accuracy of retinal disease prediction and performs better than the individual deep learning and feature-based models.
Addressing classification and prediction challenges, tree ensemble models have gained significant importance. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost utilizes balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing time efficiency. CatBoost’s algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector to stop training early. This study focuses on developing a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. For the purpose of finding the best hyperparameters to use with the underlying learners, the Bayesian hyperparameter optimization method is used. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and exhibits potential applicability to other datasets sharing similarities with diabetes.
As the COVID-19 pandemic swept the globe, social media platforms became an essential source of information and communication for many. International students, in particular, turned to Twitter to express their struggles and hardships during this difficult time. To better understand the sentiments and experiences of these international students, we developed the Situational Aspect-Based Annotation and Classification (SABAC) text mining framework. This framework uses a three-layer approach, combining baseline Deep Learning (DL) models with Machine Learning (ML) models as meta-classifiers to accurately predict the sentiments and aspects expressed in tweets from our collected Student-COVID-19 dataset. Using the proposed aspect2class annotation algorithm, we labeled bulk unlabeled tweets according to the aspect terms they contain. However, we also recognized the challenge of reducing the data’s high dimensionality and sparsity to improve performance and annotation on unlabeled datasets. To address this issue, we proposed the Volatile Stopwords Filtering (VSF) technique to reduce sparsity and enhance classifier performance. On the resulting Student-COVID Twitter dataset, the framework achieved an accuracy of 93.21% when using the random forest as a meta-classifier. Through testing on three benchmark datasets, we found that the SABAC ensemble framework performed exceptionally well. Our findings showed that international students faced various issues during the pandemic, including stress, uncertainty, health concerns, financial stress, and difficulties with online classes and returning to school. By analyzing and summarizing these annotated tweets, decision-makers can better understand and address the real-time problems international students face during the ongoing pandemic.
Pneumonia is a dangerous respiratory disease that makes breathing incredibly difficult and painful; thus, catching it early is crucial. Medical physicians’ time is limited in outdoor situations due to the many patients; therefore, automated systems can come to the rescue. The input images from the X-ray equipment are also highly unpredictable due to variances in radiologists’ experience. Therefore, radiologists require an automated system that can swiftly and accurately detect pneumonic lungs from chest X-rays (CXR). In medical classification, deep convolutional neural networks are commonly used. This research aims to use deep pretrained transfer learning models to accurately categorize CXR images into binary classes, i.e., Normal and Pneumonia. The proposed MDEV is a novel ensemble approach that concatenates four heterogeneous transfer learning models, Mobile-Net, DenseNet-201, EfficientNet-B0, and VGG-16, which have been fine-tuned and trained on 5,856 CXR images. The evaluation metrics used in this research to contrast different deep transfer learning architectures include precision, accuracy, recall, AUC-ROC, and F1-score. The model effectively decreases training loss while increasing accuracy. The findings conclude that the proposed MDEV model outperforms cutting-edge deep transfer learning models and obtains an overall precision of 92.26%, an accuracy of 92.15%, a recall of 90.90%, an AUC-ROC score of 90.9%, and an F1-score of 91.49% with minimal data pre-processing, data augmentation, fine-tuning, and hyperparameter adjustment in classifying Normal and Pneumonia chests.
Despite the maturity of ensemble numerical weather prediction (NWP), the resulting forecasts are still, more often than not, under-dispersed. As such, forecast calibration tools have become popular. Among those tools, quantile regression (QR) is highly competitive in terms of both flexibility and predictive performance. Nevertheless, a long-standing problem of QR is quantile crossing, which greatly limits the interpretability of QR-calibrated forecasts. On this point, this study proposes a non-crossing quantile regression neural network (NCQRNN) for calibrating ensemble NWP forecasts into a set of reliable quantile forecasts without crossing. The overarching design principle of NCQRNN is to add on top of the conventional QRNN structure another hidden layer, which imposes a non-decreasing mapping between the combined output from nodes of the last hidden layer and the nodes of the output layer, through a triangular weight matrix with positive entries. The empirical part of the work considers a solar irradiance case study, in which four years of ensemble irradiance forecasts at seven locations, issued by the European Centre for Medium-Range Weather Forecasts, are calibrated via NCQRNN, as well as via an eclectic mix of benchmarking models, ranging from the naïve climatology to state-of-the-art deep-learning and other non-crossing models. Formal and stringent forecast verification suggests that the forecasts post-processed via NCQRNN attain the maximum sharpness subject to calibration amongst all competitors. Furthermore, the proposed conception to resolve quantile crossing is remarkably simple yet general, and thus has broad applicability, as it can be integrated with many shallow- and deep-learning-based neural networks.
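One common way to guarantee non-crossing quantiles is to predict the lowest quantile directly and every higher quantile as the previous one plus a strictly positive increment. The sketch below shows that idea only; it is a simplified stand-in and not the NCQRNN layer itself, whose exact weight parameterization is not given in the abstract. The cumulative sum is equivalent to multiplying the vector of the first raw output and the positive increments by a lower-triangular matrix of ones. The function names and raw values are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); strictly positive for finite x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def non_crossing_quantiles(raw_outputs):
    """Map unconstrained network outputs to an increasing list of quantiles.

    raw_outputs[0] is taken as the lowest quantile; every later raw value is
    turned into a positive increment that is added to the running total.
    """
    quantiles = [raw_outputs[0]]
    for raw in raw_outputs[1:]:
        quantiles.append(quantiles[-1] + softplus(raw))
    return quantiles


# Made-up unconstrained outputs for four quantile levels.
raw = [0.3, -2.0, 1.5, -0.7]
q = non_crossing_quantiles(raw)
```

Because each increment is positive whatever the raw values are, the ordering holds by construction rather than being encouraged through a penalty term in the loss.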
Potato cyst nematodes (PCNs) are a significant threat to potato production, having caused substantial damage in many countries. Predicting the future distribution of PCN species is crucial to implementing effective biosecurity strategies, especially given the impact of climate change on pest species invasion and distribution. Machine learning (ML), specifically ensemble models, has emerged as a powerful tool for predicting species distributions due to its ability to learn and make predictions based on complex data sets. Thus, this research utilised advanced machine learning techniques to predict the distribution of PCN species under climate change conditions, providing the initial element for invasion risk assessment. We first used Global Climate Models to generate homogeneous climate predictors to mitigate the variation among predictors. Then, five machine learning models were employed to build two groups of ensembles, single-algorithm ensembles (ESA) and multi-algorithm ensembles (EMA), and their performances were compared. In this research, the EMA did not always perform better than the ESA, and the ESA of Artificial Neural Network gave the highest performance while being cost-effective. Prediction results indicated that the distribution range of PCNs would shift northward, with a decrease in tropical zones and an increase in northern latitudes. However, the total area of suitable regions will not change significantly, occupying 16-20% of the total land surface (18% under current conditions). This research alerts policymakers and practitioners to the risk of PCNs’ incursion into new regions. Additionally, this ML process offers the capability to track changes in the distribution of various species and provides scientifically grounded evidence for formulating long-term biosecurity plans for their control.
Dear Editor, This letter presents a novel process monitoring model based on ensemble structure analysis (ESA). The ESA model takes advantage of principal component analysis (PCA), locality preserving projections (LPP), and multi-manifold projections (MMP) models, and then combines the multiple solutions into an ensemble result through Bayesian inference. In the developed ESA model, different structure features of the given dataset are taken into account simultaneously. The suitability and reliability of the ESA-based monitoring model are then illustrated through comparison. Introduction: The requirement for ensuring safe operation and improving process efficiency has led to increased research activity in the field of process monitoring.
Identifying rare patterns for medical diagnosis is a challenging task due to heterogeneity and the volume of data. Data summarization can create a concise version of the original data that can be used for effective diagnosis. In this paper, we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data to ensure the inclusion of rare patterns. To the best of our knowledge, there has been no such technique available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis. The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used. Therefore, the medical diagnosis becomes more effective, and our experimental results reflect that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
We present a formulation of the single-trajectory entropy using the trajectory ensemble. The single-trajectory entropy is affected by its surrounding trajectories via the distribution function. The single-trajectory entropies are studied in two typical potentials, i.e., the harmonic potential and the double-well potential, and in a viscous environment, using the interacting trajectory method. The results of the trajectory method agree well with those of numerical methods (Monte Carlo simulation and difference equation). An increase (decrease) in the single-trajectory entropy could be caused by the absorption (emission) of heat from (to) the thermal environment. Some interesting trajectories, which correspond to rare events in the processes, are also demonstrated.
A redundant-subspace-weighting (RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble. In this method, multiple overlapping subspaces are constructed in the clock ensemble, and the weight of each clock in this ensemble is defined by using the spatial covariance matrix. The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent. After optimizing the parameters of this weighting procedure, the frequency stabilities of virtual clock ensembles are significantly improved in most cases.
文摘With the advancement of artificial intelligence,traffic forecasting is gaining more and more interest in optimizing route planning and enhancing service quality.Traffic volume is an influential parameter for planning and operating traffic structures.This study proposed an improved ensemble-based deep learning method to solve traffic volume prediction problems.A set of optimal hyperparameters is also applied for the suggested approach to improve the performance of the learning process.The fusion of these methodologies aims to harness ensemble empirical mode decomposition’s capacity to discern complex traffic patterns and long short-term memory’s proficiency in learning temporal relationships.Firstly,a dataset for automatic vehicle identification is obtained and utilized in the preprocessing stage of the ensemble empirical mode decomposition model.The second aspect involves predicting traffic volume using the long short-term memory algorithm.Next,the study employs a trial-and-error approach to select a set of optimal hyperparameters,including the lookback window,the number of neurons in the hidden layers,and the gradient descent optimization.Finally,the fusion of the obtained results leads to a final traffic volume prediction.The experimental results show that the proposed method outperforms other benchmarks regarding various evaluation measures,including mean absolute error,root mean squared error,mean absolute percentage error,and R-squared.The achieved R-squared value reaches an impressive 98%,while the other evaluation indices surpass the competing.These findings highlight the accuracy of traffic pattern prediction.Consequently,this offers promising prospects for enhancing transportation management systems and urban infrastructure planning.
文摘Advanced Metering Infrastructure(AMI)is the metering network of the smart grid that enables bidirectional communications between each consumer’s premises and the provider’s control center.The massive amount of data collected supports the real-time decision-making required for diverse applications.The communication infrastructure relies on different network types,including the Internet.This makes the infrastructure vulnerable to various attacks,which could compromise security or have devastating effects.However,traditional machine learning solutions cannot adapt to the increasing complexity and diversity of attacks.The objective of this paper is to develop an Anomaly Detection System(ADS)based on deep learning using the CIC-IDS2017 dataset.However,this dataset is highly imbalanced;thus,a two-step sampling technique:random under-sampling and the Synthetic Minority Oversampling Technique(SMOTE),is proposed to balance the dataset.The proposed system utilizes a multiple hidden layer Auto-encoder(AE)for feature extraction and dimensional reduction.In addition,an ensemble voting based on both Random Forest(RF)and Convolu-tional Neural Network(CNN)is developed to classify the multiclass attack cate-gories.The proposed system is evaluated and compared with six different state-of-the-art machine learning and deep learning algorithms:Random Forest(RF),Light Gradient Boosting Machine(LightGBM),eXtreme Gradient Boosting(XGboost),Convolutional Neural Network(CNN),Long Short-Term Memory(LSTM),and bidirectional LSTM(biLSTM).Experimental results show that the proposed model enhances the detection for each attack class compared with the other machine learning and deep learning models with overall accuracy(98.29%),precision(99%),recall(98%),F_(1) score(98%),and the UNDetection rate(UND)(8%).
Abstract: The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, are difficult for firewalls to detect and prevent because legitimate and intrusion traffic look similar. The vast volume of network traffic also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an Intrusion Detection System (IDS) based on stacking ensemble learning with Random Forest, Decision Tree, and k-Nearest-Neighbors as base models. The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating three base models (Random Forest, Decision Tree, and k-Nearest-Neighbors) and a meta-model represented by the Logistic Regression algorithm. Evaluated using the UNSW-NB15 dataset, the proposed IDS achieved an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The obtained results also demonstrate improvements in other measurement criteria.
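The stacking structure described here, base-model outputs fed to a logistic-regression meta-model, can be sketched in plain Python as below. This is a toy illustration under loud assumptions: the three base learners are hypothetical single-feature stumps standing in for Random Forest, Decision Tree, and k-Nearest-Neighbors, the data are eight synthetic rows, and the meta-model is trained on in-sample base predictions rather than the out-of-fold predictions a careful stacking setup would use.

```python
import math
from itertools import product

def make_stump(feature_index):
    """Hypothetical weak base learner: votes 1.0 when one binary feature is set."""
    return lambda row: float(row[feature_index])

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(features, labels, rate=0.5, epochs=2000):
    """Fit logistic-regression weights (bias stored last) by batch gradient descent."""
    dim = len(features[0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        grads = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1]) - y
            for j in range(dim):
                grads[j] += error * x[j]
            grads[-1] += error
        weights = [w - rate * g / len(features) for w, g in zip(weights, grads)]
    return weights

def stacked_predict(row, base_models, weights):
    """Level 1: collect base-model outputs. Level 2: score them with the meta-model."""
    meta_features = [model(row) for model in base_models]
    score = sum(w * v for w, v in zip(weights, meta_features)) + weights[-1]
    return 1 if sigmoid(score) >= 0.5 else 0

# Toy data: a row is labelled 1 (intrusion) when at least two of three indicators fire.
rows = list(product([0, 1], repeat=3))
labels = [1 if sum(row) >= 2 else 0 for row in rows]
base_models = [make_stump(i) for i in range(3)]
meta_inputs = [[model(row) for model in base_models] for row in rows]
meta_weights = fit_logistic(meta_inputs, labels)
predictions = [stacked_predict(row, base_models, meta_weights) for row in rows]
```

Each stump alone misclassifies a quarter of the rows, while the meta-model learns to combine their votes, which is the effect stacking aims for.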
Abstract: Given the increased number of COVID-19 cases, Ensemble Machine Learning (EML) could be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve on the performance of single machine learning (ML) classifiers, especially with stacking-based ensemble learning. Stacking uses heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes model performance to decrease when a growing number of learners is not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning optimized by a Genetic Algorithm (GA), called GA-Stacking, for outbreak prediction and health decision-support processes. GA-Stacking uses five well-known classifiers at its first level: Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also uses the GA to determine the number, combination, and trust of these base classifiers, with the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance, with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures. (CMC, 2023, vol. 74, no. 2)
Funding: This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 62102190 and 62272236, and in part by the Natural Science Foundation of Jiangsu Province under Grant Nos. BK20201136 and BK20191401.
Abstract: The impact of Corona Virus Disease 2019 (COVID-19) has made telecommuting and remote learning the norm. The growing number of Internet-connected devices provides cyber attackers with more attack vectors. Malware developed by criminals also incorporates a number of sophisticated obfuscation techniques, making it difficult to classify and detect using conventional approaches. Therefore, this paper proposes a novel visualization-based malware classification system using transfer and ensemble learning (VMCTE). VMCTE has a strong anti-interference ability. Even if malware uses obfuscation, fuzzing, encryption, and other techniques to evade detection, it can be accurately classified into its corresponding malware family. Unlike traditional dynamic and static analysis techniques, VMCTE requires neither reverse engineering nor the aid of domain expert knowledge. The proposed classification system combines three strong deep convolutional neural networks (ResNet50, MobilenetV1, and MobilenetV2) as feature extractors, reduces the dimensionality of the extracted features using principal component analysis, and employs a support vector machine to establish the classification model. The semantic representations of malware images can be extracted using various convolutional neural network (CNN) architectures, yielding higher-quality features than traditional methods. Integrating fine-tuned and non-fine-tuned classification models based on transfer learning can greatly enhance the capacity to classify various families of malware. The experimental findings on the Malimg dataset demonstrate that VMCTE attains 99.64%, 99.64%, 99.66%, and 99.64% accuracy, F1-score, precision, and recall, respectively.
Funding: Supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government, Ministry of Science and ICT (MSIT) (No. 2019-0-01343, convergence security core talent training business).
Abstract: Recently, Industrial Control Systems (ICSs) have been changing from closed to open environments because of the expansion of digital transformation, smart factories, and the Industrial Internet of Things (IIoT). Since security incidents in ICSs can cause national confusion and human casualties, research on detecting abnormalities by learning from normal operation data is being actively conducted. The single techniques proposed in existing studies do not detect abnormalities well or provide satisfactory results. In this paper, we propose a GRU-based Buzzer Ensemble for Abnormal Detection (GBE-AD) model for detecting anomalies in industrial control systems to ensure rapid response and process availability. The newly proposed buzzer-method ensemble model resolves False Negatives (FNs) because the internal models composing GBE-AD complement the limited range that a single model can detect. Because the internal models remain suppressed for False Positives (FPs), GBE-AD provides better generalization. In addition, we generated mean prediction error data in GBE-AD and inferred abnormal processes using soft and hard clustering. We confirmed that the detection model’s Time-series Aware Precision (TaP) suppressed FPs at 97.67%. The final performance was 94.04% in an experiment using the HIL-based Augmented ICS (HAI) Security Dataset (ver. 21.03), one of the public datasets.
Funding: The Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, funded this research through project number PNU-DRI-RI-20-033.
Abstract: The quality of the air we breathe during the course of our daily lives has a significant impact on our health and well-being as individuals. Unfortunately, personal air quality measurement remains challenging. In this study, we investigate the use of first-person photos for the prediction of air quality. The main idea is to harness the power of a generalized stacking approach and the importance of haze features extracted from first-person images to create an efficient new stacking model called AirStackNet for air pollution prediction. AirStackNet consists of two layers and four regression models: the first layer generates meta-data from Light Gradient Boosting Machine (Light-GBM), Extreme Gradient Boosting Regression (XGBoost), and CatBoost Regression (CatBoost), whereas the second layer computes the final prediction from the meta-data of the first layer using Extra Tree Regression (ET). The performance of the proposed AirStackNet model is validated using the public Personal Air Quality Dataset (PAQD). Our experiments are evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R²), Mean Squared Error (MSE), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE). Experimental results indicate that the proposed AirStackNet model not only effectively improves air pollution prediction performance by overcoming the Bias-Variance tradeoff, but also outperforms baseline and state-of-the-art models.
Funding: Supported by the National Natural Science Foundation of China under Grant 62073330. The author J.T. received the grant.
Abstract: No optimization algorithm can obtain satisfactory results in all optimization tasks. Thus, an ensemble of multiple algorithms is an effective way to deal with the problem. This paper proposes an ensemble of population-based metaheuristics (EPM) to solve single-objective optimization problems. The design of the EPM framework includes three stages: the initial stage, the update stage, and the final stage. The framework applies a transformation between real and virtual populations to balance exploration and exploitation at the population level and uses an elite strategy to communicate among virtual populations. The experiment tested two benchmark function sets with five metaheuristic algorithms and four ensemble algorithms. The ensemble algorithms are generally superior to the original algorithms according to Friedman’s average ranking and Wilcoxon signed-rank test results, demonstrating the ensemble framework’s effect. The iterative curves for different test functions show that the ensemble algorithms have faster iterative optimization speed and better optimization results. The distribution maps of the virtual populations at several stages indicate that the ensemble algorithms avoid becoming trapped in local optima. The ensemble framework also performs well in solving two practical engineering problems. Some results of the ensemble algorithms are superior to those of metaheuristic algorithms not included in the ensemble framework, further demonstrating the ensemble method’s potential and superiority.
Abstract: Automated retinal disease detection systems have long been in existence and provide a safe, no-contact, and cost-effective solution for detecting this disease. This paper presents a game theory-based dynamic weighted ensemble of a feature extraction-based machine learning model and a deep transfer learning model for automatic retinal disease detection. The feature extraction-based machine learning model uses Gaussian kernel-based fuzzy rough sets for feature reduction and an XGBoost classifier for classification. The transfer learning model uses VGG16, ResNet50, or Inception-ResNet-v2. A novel ensemble classifier based on the game theory approach is proposed for fusing the outputs of the transfer learning model and the XGBoost classifier model. The ensemble approach significantly improves the accuracy of retinal disease prediction and results in excellent performance when compared to the individual deep learning and feature-based models.
Abstract: Tree ensemble models have gained significant importance in addressing classification and prediction challenges. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost uses balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing time efficiency. CatBoost’s algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector to stop training early. This study focuses on developing a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. Bayesian hyperparameter optimization is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against the LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and exhibits potential applicability to other datasets sharing similarities with diabetes.
Funding: Supported by the National Natural Science Foundation of China [Grant Number: 92067106] and the Ministry of Education of the People’s Republic of China [Grant Number: E-GCCRC20200309].
Abstract: As the COVID-19 pandemic swept the globe, social media platforms became an essential source of information and communication for many. International students, in particular, turned to Twitter to express their struggles and hardships during this difficult time. To better understand the sentiments and experiences of these international students, we developed the Situational Aspect-Based Annotation and Classification (SABAC) text mining framework. This framework uses a three-layer approach, combining baseline Deep Learning (DL) models with Machine Learning (ML) models as meta-classifiers to accurately predict the sentiments and aspects expressed in tweets from our collected Student-COVID-19 dataset. Using the proposed aspect2class annotation algorithm, we labeled bulk unlabeled tweets according to the aspect terms they contain. However, high data dimensionality and sparsity make it challenging to improve performance and annotation on unlabeled datasets. To address this issue, we proposed the Volatile Stopwords Filtering (VSF) technique to reduce sparsity and enhance classifier performance. On the resulting Student-COVID Twitter dataset, the framework achieved an accuracy of 93.21% when using random forest as the meta-classifier. Through testing on three benchmark datasets, we found that the SABAC ensemble framework performed exceptionally well. Our findings showed that international students faced various issues during the pandemic, including stress, uncertainty, health concerns, financial stress, and difficulties with online classes and returning to school. By analyzing and summarizing these annotated tweets, decision-makers can better understand and address the real-time problems international students face during the ongoing pandemic.
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1I1A1A01052299).
Abstract: Pneumonia is a dangerous respiratory disease that makes breathing extremely difficult and painful; thus, catching it early is crucial. Medical physicians’ time is limited in outdoor situations due to the many patients; therefore, automated systems can come to the rescue. The input images from the X-ray equipment are also highly unpredictable due to variances in radiologists’ experience. Therefore, radiologists require an automated system that can swiftly and accurately detect pneumonic lungs from chest X-rays (CXR). In medical classification, deep convolutional neural networks are commonly used. This research aims to use deep pretrained transfer learning models to accurately categorize CXR images into binary classes, i.e., Normal and Pneumonia. The proposed MDEV is a novel ensemble approach that concatenates four heterogeneous transfer learning models, MobileNet, DenseNet-201, EfficientNet-B0, and VGG-16, which have been fine-tuned and trained on 5,856 CXR images. The evaluation metrics used in this research to compare different deep transfer learning architectures include precision, accuracy, recall, AUC-ROC, and F1-score. The model effectively decreases training loss while increasing accuracy. The findings show that the proposed MDEV model outperforms cutting-edge deep transfer learning models and obtains an overall precision of 92.26%, an accuracy of 92.15%, a recall of 90.90%, an AUC-ROC score of 90.9%, and an F1-score of 91.49% with minimal data pre-processing, data augmentation, fine-tuning, and hyperparameter adjustment in classifying Normal and Pneumonia chests.
Funding: Supported by the National Natural Science Foundation of China (Project No. 42375192) and the China Meteorological Administration Climate Change Special Program (CMA-CCSP; Project No. QBZ202315), with support from the Vector Stiftung through the Young Investigator Group "Artificial Intelligence for Probabilistic Weather Forecasting."
Abstract: Despite the maturity of ensemble numerical weather prediction (NWP), the resulting forecasts are still, more often than not, under-dispersed. As such, forecast calibration tools have become popular. Among those tools, quantile regression (QR) is highly competitive in terms of both flexibility and predictive performance. Nevertheless, a long-standing problem of QR is quantile crossing, which greatly limits the interpretability of QR-calibrated forecasts. On this point, this study proposes a non-crossing quantile regression neural network (NCQRNN) for calibrating ensemble NWP forecasts into a set of reliable quantile forecasts without crossing. The overarching design principle of NCQRNN is to add on top of the conventional QRNN structure another hidden layer, which imposes a non-decreasing mapping between the combined output from nodes of the last hidden layer to the nodes of the output layer, through a triangular weight matrix with positive entries. The empirical part of the work considers a solar irradiance case study, in which four years of ensemble irradiance forecasts at seven locations, issued by the European Centre for Medium-Range Weather Forecasts, are calibrated via NCQRNN, as well as via an eclectic mix of benchmarking models, ranging from the naïve climatology to the state-of-the-art deep-learning and other non-crossing models. Formal and stringent forecast verification suggests that the forecasts post-processed via NCQRNN attain the maximum sharpness subject to calibration, amongst all competitors. Furthermore, the proposed conception to resolve quantile crossing is remarkably simple yet general, and thus has broad applicability as it can be integrated with many shallow- and deep-learning-based neural networks.
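One way to see why a triangular weight matrix with positive entries rules out quantile crossing is the simplest special case, sketched below: a lower-triangular matrix of ones applied to strictly positive increments, which amounts to a cumulative sum. This is an illustrative assumption, not the NCQRNN layer itself; the softplus transform, the function names, and the numbers are all hypothetical.

```python
import math

def softplus(value):
    """Smooth map from any real number to a strictly positive one."""
    return math.log1p(math.exp(value))

def non_crossing_quantiles(lowest, raw_gaps):
    """Build non-decreasing quantile outputs from one free value and unconstrained gaps."""
    gaps = [softplus(g) for g in raw_gaps]  # every gap between neighbouring quantiles is > 0
    size = len(gaps)
    # Lower-triangular weight matrix with positive (here: unit) entries.
    weights = [[1.0 if col <= row else 0.0 for col in range(size)] for row in range(size)]
    # Row k sums the first k + 1 gaps, so each output exceeds the previous one.
    upper = [lowest + sum(w * g for w, g in zip(weights[row], gaps)) for row in range(size)]
    return [lowest] + upper

# Hypothetical network outputs for five quantile levels of solar irradiance (W/m^2).
quantiles = non_crossing_quantiles(lowest=120.0, raw_gaps=[-2.0, 0.5, 3.0, -1.0])
```

Because the ordering is enforced by construction, no post-hoc sorting of the predicted quantiles is needed, whatever values the earlier layers produce.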
Funding: Funded by the National Key R&D Program of China (2021YFD1400200) and the Taishan Scholar Constructive Engineering Foundation of Shandong, China (tstp20221135).
Abstract: Potato cyst nematodes (PCNs) are a significant threat to potato production, having caused substantial damage in many countries. Predicting the future distribution of PCN species is crucial to implementing effective biosecurity strategies, especially given the impact of climate change on pest species invasion and distribution. Machine learning (ML), specifically ensemble models, has emerged as a powerful tool for predicting species distributions due to its ability to learn and make predictions based on complex data sets. Thus, this research used advanced machine learning techniques to predict the distribution of PCN species under climate change conditions, providing the initial element for invasion risk assessment. We first used Global Climate Models to generate homogeneous climate predictors to mitigate the variation among predictors. Then, five machine learning models were employed to build two groups of ensembles, single-algorithm ensembles (ESA) and multi-algorithm ensembles (EMA), and their performances were compared. In this research, the EMA did not always perform better than the ESA, and the ESA of Artificial Neural Network gave the highest performance while being cost-effective. Prediction results indicated that the distribution range of PCNs would shift northward, with a decrease in tropical zones and an increase in northern latitudes. However, the total area of suitable regions will not change significantly, occupying 16-20% of the total land surface (18% under current conditions). This research alerts policymakers and practitioners to the risk of PCNs’ incursion into new regions. Additionally, this ML process offers the capability to track changes in the distribution of various species and provides scientifically grounded evidence for formulating long-term biosecurity plans for their control.
Funding: Supported by the National Natural Science Foundation of China (61503204), the Natural Science Foundation of Zhejiang Province (Y16F030001), and the Nature Science Foundation of Ningbo City (2016A610092).
Abstract: Dear Editor, This letter presents a novel process monitoring model based on ensemble structure analysis (ESA). The ESA model takes advantage of principal component analysis (PCA), locality preserving projections (LPP), and multi-manifold projections (MMP) models, and then combines the multiple solutions into an ensemble result through Bayesian inference. In the developed ESA model, different structure features of the given dataset are taken into account simultaneously; the suitability and reliability of the ESA-based monitoring model are then illustrated through comparison. Introduction: The requirement for ensuring safe operation and improving process efficiency has led to increased research activity in the field of process monitoring.
Abstract: Identifying rare patterns for medical diagnosis is a challenging task due to the heterogeneity and volume of data. Data summarization can create a concise version of the original data that can be used for effective diagnosis. In this paper, we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data that ensures the inclusion of rare patterns. To the best of our knowledge, no such technique has been available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis. The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used. Therefore, medical diagnosis becomes more effective, and our experimental results show that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12234013) and the Natural Science Foundation of Shandong Province (Grant No. ZR2021LLZ009).
Abstract: We present a formulation of the single-trajectory entropy using the trajectory ensemble. The single-trajectory entropy is affected by its surrounding trajectories via the distribution function. The single-trajectory entropies are studied in two typical potentials, i.e., the harmonic potential and the double-well potential, and in a viscous environment by the interacting trajectory method. The results of the trajectory methods are in good agreement with those of the numerical methods (Monte Carlo simulation and difference equation). An increase (decrease) in single-trajectory entropy could be caused by the absorption (emission) of heat from (to) the thermal environment. Also, some interesting trajectories, which correspond to rare events in the processes, are demonstrated.
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2021YFB3900701), the Science and Technology Plan Project of the State Administration for Market Regulation of China (Grant No. 2023MK178), and the National Natural Science Foundation of China (Grant No. 42227802).
Abstract: A redundant-subspace-weighting (RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble. In this method, multiple overlapping subspaces are constructed in the clock ensemble, and the weight of each clock in this ensemble is defined by using the spatial covariance matrix. The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent. After optimizing the parameters of this weighting procedure, the frequency stabilities of virtual clock ensembles are significantly improved in most cases.