Strong mechanical vibration and acoustic signals from the grinding process contain useful information about load parameters in ball mills. Extracting latent features from the high-dimensional frequency spectra of these signals and building a soft sensor model from them is challenging. This paper develops a selective ensemble modeling approach, based on nonlinear latent frequency-spectral feature extraction, for accurate measurement of the material-to-ball volume ratio. Latent features are first extracted from different vibration and acoustic spectral segments by kernel partial least squares. Bootstrap resampling and least squares support vector machines are then used to produce candidate sub-models with these latent features as inputs. Ensemble sub-models are selected with a genetic algorithm optimization toolbox. Partial least squares regression combines the selected sub-models to eliminate collinearity among their prediction outputs. Results indicate that the proposed modeling approach has better prediction performance than previous ones.
This work studies a probabilistic multi-dimensional selective ensemble learning method and its application to predicting users' online purchase behavior. First, the classifiers are integrated along the dimension of predicted probability, and a pruning algorithm based on greedy forward search is obtained by combining two indicators, accuracy and complementarity. The pruning algorithm is then integrated into the Stacking ensemble method to establish a prediction model of users' online shopping behavior based on the probabilistic multi-dimensional selective ensemble method. Finally, the method is compared with the individual learners in the ensemble and with the Stacking ensemble method without pruning. The experimental results show that the proposed method can reduce the ensemble size, improve the prediction accuracy of the model, and predict users' online purchase behavior.
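The greedy forward search pruning mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' algorithm: the function names, the toy probabilities, and the scoring rule are assumptions. In particular, the sketch scores a candidate only by the accuracy of the averaged predicted probabilities, which rewards complementary members implicitly, whereas the paper combines separate accuracy and complementarity indicators.

```python
def ensemble_accuracy(member_probs, labels):
    """Accuracy of the averaged positive-class probability, thresholded at 0.5."""
    correct = 0
    for i, label in enumerate(labels):
        avg = sum(p[i] for p in member_probs) / len(member_probs)
        correct += int((avg >= 0.5) == (label == 1))
    return correct / len(labels)


def greedy_forward_prune(probs, labels):
    """Add, one at a time, the classifier that most improves ensemble accuracy.

    probs maps a (hypothetical) classifier name to its predicted
    positive-class probabilities on a validation set. The search stops
    when no remaining classifier strictly improves the accuracy.
    """
    selected, best = [], 0.0
    remaining = sorted(probs)  # sorted so ties are broken deterministically
    while remaining:
        scored = [(ensemble_accuracy([probs[m] for m in selected + [c]], labels), c)
                  for c in remaining]
        top_score = max(score for score, _ in scored)
        if selected and top_score <= best:
            break
        top = next(c for score, c in scored if score == top_score)
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


# Toy validation data (assumed): 1 = purchase, 0 = no purchase.
labels = [1, 0, 1, 0, 1, 0]
probs = {
    "clf_a": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1],
    "clf_b": [0.6, 0.4, 0.3, 0.1, 0.9, 0.3],
    "clf_c": [0.7, 0.7, 0.6, 0.2, 0.3, 0.4],
}
selected, accuracy = greedy_forward_prune(probs, labels)
print(selected, accuracy)
```

On this toy data the search keeps two of the three classifiers: the second member is chosen because its errors fall on different samples than those of the first, and the third is pruned because it adds nothing further.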
Chemical processes are complex, and traditional neural network models usually cannot achieve satisfactory accuracy for them. Selective neural network ensembles are an effective way to enhance the generalization accuracy of networks, but they have some problems, e.g., there is no unified definition of diversity among component neural networks, and selection can hardly improve accuracy when the diversity of the available networks is small. In this study, the output errors of networks are vectorized, the diversity of networks is defined based on the error vectors, and the size of the ensemble is analyzed. An error vectorization based selective neural network ensemble (EVSNE) is then proposed, in which the component networks are trained in order so that the error vector of each network can offset that of the other networks. The component networks therefore have large diversity. Experiments and comparisons on standard data sets and on an actual chemical process data set for the production of high-density polyethylene demonstrate that EVSNE performs better in generalization ability.
In smart healthcare business systems, network-based intrusion detection systems are crucial for protecting the system and its networks from malicious network attacks. The proposed model serves as a tool for monitoring Internet of Medical Things (IoMT) networks and protecting IoMT devices and networks in healthcare and medical settings. This study presents a robust methodology for intrusion detection in IoMT environments, integrating data augmentation, feature selection, and ensemble learning to handle IoMT data complexity effectively. Following rigorous preprocessing, including feature extraction, correlation removal, and Recursive Feature Elimination (RFE), the selected features are standardized and reshaped for deep learning models. Augmentation using the BAT algorithm enhances dataset variability. Three deep learning models, Transformer-based neural networks, self-attention Deep Convolutional Neural Networks (DCNNs), and Long Short-Term Memory (LSTM) networks, are trained to capture diverse aspects of the data. Their predictions form a meta-feature set for a subsequent meta-learner, which combines the strengths of the models. Conventional classifiers validate the meta-learner features for broad algorithm suitability. This comprehensive method demonstrates high accuracy and robustness in IoMT intrusion detection. Evaluations were conducted on two datasets: the publicly available WUSTL-EHMS-2020 dataset, which contains two categories, and the CICIoMT2024 dataset, which contains sixteen categories. Experimental results show that the method achieves optimal scores of 100% on the WUSTL-EHMS-2020 dataset and 99% on the CICIoMT2024 dataset.
Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality. This involves eliminating irrelevant, redundant, and noisy features to streamline the classification process. Various methods, from single feature selection techniques to ensemble filter-wrapper methods, have been used in the literature. Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents. Feature selection is inherently multi-objective, balancing the enhancement of feature relevance and accuracy against the reduction of redundant features. This research presents a two-fold objective for feature selection. The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods: Information Gain (Infogain), Chi-Square (Chi²), and Analysis of Variance (ANOVA). This aims to maximize feature relevance while minimizing redundancy. The second objective is to reduce the number of selected features and increase accuracy through a hybrid approach combining Artificial Bee Colony (ABC) and Genetic Algorithms (GA). This hybrid method operates in a wrapper framework to identify the most informative subset of text features. A Support Vector Machine (SVM) was employed as the performance evaluator for the proposed model, which was tested on two high-dimensional multiclass datasets. The experimental results demonstrate that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection, offering superior performance compared to other existing feature selection algorithms.
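One simple way to combine several filter methods into a single top-ranked feature list is mean-rank aggregation, sketched below. The abstract does not state how the three filter outputs are merged, so the aggregation rule, the function names, and the scores are all assumptions made for illustration; the filter scores are taken as given rather than computed.

```python
def ranks_from_scores(scores):
    """Map each feature to its rank (1 = best) under one filter; higher score is better."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: r for r, f in enumerate(ordered, start=1)}


def top_k_by_mean_rank(filter_scores, k):
    """Average each feature's rank across all filters and keep the k best features."""
    rank_tables = [ranks_from_scores(s) for s in filter_scores.values()]
    features = sorted(rank_tables[0])
    mean_rank = {f: sum(t[f] for t in rank_tables) / len(rank_tables) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical scores of four terms under the three filters named in the abstract.
filter_scores = {
    "infogain": {"price": 0.40, "free": 0.35, "the": 0.01, "offer": 0.20},
    "chi2":     {"price": 12.0, "free": 15.0, "the": 0.5,  "offer": 9.0},
    "anova":    {"price": 8.0,  "free": 7.5,  "the": 0.2,  "offer": 7.9},
}
top_features = top_k_by_mean_rank(filter_scores, k=2)
print(top_features)
```

Ranks are used instead of raw scores because the three filters produce values on different scales. The reduced list would then be handed to the wrapper stage.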
This study investigates the application of deep learning, ensemble learning, metaheuristic optimization, and image processing techniques for detecting lung and colon cancers, aiming to enhance treatment efficacy and improve survival rates. We introduce a metaheuristic-driven two-stage ensemble deep learning model for efficient lung/colon cancer classification. Lung and colon cancers are diagnosed from several distinct indicators, using different versions of deep Convolutional Neural Networks (CNNs) for feature extraction and model construction and various Machine Learning (ML) algorithms for final classification. Specifically, we consider three scenarios, two-class colon cancer, three-class lung cancer, and five-class combined lung/colon cancer, and extract features with four CNNs. The extracted features are then integrated to create a comprehensive feature set. Next, feature selection is optimized using a metaheuristic algorithm based on Electric Eel Foraging Optimization (EEFO). The optimized feature subset is then fed to various ML algorithms to determine the most effective ones through a rigorous evaluation process. The top-performing algorithms are refined using the High-Performance Filter (HPF) and integrated into an ensemble learning framework employing weighted averaging. Our findings indicate that the proposed ensemble learning model significantly surpasses existing methods in classification accuracy across all datasets, achieving accuracies of 99.85% for the two-class, 98.70% for the three-class, and 98.96% for the five-class datasets.
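The weighted-averaging step of such an ensemble can be illustrated with a short sketch. The weights and probabilities below are invented for the example, and the abstract does not say how the weights are derived; setting them from validation performance is only an assumption.

```python
def weighted_average_vote(prob_rows, weights):
    """Combine per-model class-probability vectors using normalized weights.

    prob_rows holds one probability vector per model for a single sample.
    Returns the combined probability vector and the index of the winning class.
    """
    total = sum(weights)
    n_classes = len(prob_rows[0])
    combined = [sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
                for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: combined[c])
    return combined, predicted


# Three hypothetical classifiers scoring one image over three classes.
prob_rows = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
weights = [0.5, 0.3, 0.2]  # assumed, e.g. proportional to validation accuracy
combined, predicted = weighted_average_vote(prob_rows, weights)
print(combined, predicted)
```

Here the most heavily weighted model prefers class 0, but the two lighter models jointly outweigh it and the ensemble predicts class 1.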
Ensemble-based analyses are useful for comparing equiprobable scenarios of reservoir models. However, they require a large suite of reservoir models to cover the high uncertainty in heterogeneous and complex reservoirs. Increasing the ensemble size is one way to obtain stable convergence in the ensemble Kalman filter (EnKF), but it incurs a high computational cost in large-scale reservoir systems. In this paper, we propose a preprocessing step that selects good initial models to reduce the ensemble size; EnKF is then used to predict production performance stochastically. In the model selection scheme, representative models are chosen by principal component analysis (PCA) and clustering analysis. The dimension of the initial models is reduced using PCA, and the reduced models are grouped by clustering. We then choose representative models from the cluster groups and simulate them to compare the errors of their production predictions against historical observation data. The representative model with the minimum error is considered the best model, and the ensemble members near the best model in the cluster plane are used for EnKF. We demonstrate on two 3D models that, with the proposed scheme, EnKF provides reliable assimilation results with much reduced computation time.
A metamaterial antenna is a subclass of antennas that uses metamaterials to improve performance. Metamaterial antennas can overcome the bandwidth constraint associated with small antennas. Machine learning is receiving a lot of interest for optimizing solutions in a variety of areas. Machine learning methods are already a significant component of ongoing research and are anticipated to play a critical role in today’s technology. The accuracy of a prediction is largely determined by the model used. The purpose of this article is to provide an optimal ensemble model for predicting the bandwidth and gain of the metamaterial antenna. Support Vector Machines (SVM), Random Forest, K-Neighbors Regressor, and Decision Tree Regressor were used as the base models. The Adaptive Dynamic Polar Rose Guided Whale Optimization method, named AD-PRS-Guided WOA, was used to pick the optimal features from the datasets. The suggested model is compared to models based on five variables and to the average ensemble model. The findings indicate that the presented model using Random Forest achieves a Root Mean Squared Error (RMSE) of 0.0102 for bandwidth and an RMSE of 0.0891 for gain. This is superior to the other models, and the model can accurately predict antenna bandwidth and gain.
A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensemble, search strategies are used to find the neural network ensemble with the best generalization ability. Finally, classification is performed by combining the predictions of the component networks through voting. The method has been verified in experiments on remote sensing image classification and on five UCI datasets. Compared with conventional ensemble feature selection algorithms, it takes less time and has lower computational complexity, while the classification accuracy is satisfactory.
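The final voting step, in which the predictions of the component networks are combined, can be sketched as plain majority voting. The function name, the tie-breaking rule (smallest label wins), and the example labels are assumptions; the abstract does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority vote.

    predictions is a list with one label list per classifier, all of equal
    length. Ties are broken by taking the smallest label (an assumption).
    """
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        voted.append(min(label for label, n in counts.items() if n == top))
    return voted


# Hypothetical class labels predicted by three component networks for four samples.
network_outputs = [
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 0],
]
ensemble_labels = majority_vote(network_outputs)
print(ensemble_labels)
```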
To accelerate the selection of feature subsets in rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, a dynamics equilibrium strategy based on multilevel elitist roles is established, in which both immigration and emigration of elitists are self-adaptive so as to balance exploration and exploitation in feature selection. Secondly, a utility matrix of trust margins is introduced into the model of multilevel elitist roles to enhance the performance of the various elitist roles in searching for optimal feature subsets, so that win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed as an intriguing exhibiting structure to perfect the dynamics equilibrium of multilevel elitist roles. Finally, the ensemble manner of multilevel elitist roles is employed to achieve the global minimal feature subset, which greatly improves feasibility and effectiveness. Experimental results show that the proposed EERQG algorithm is superior to existing feature selection algorithms.
A recommender system is a tool that suggests items to users based on the extensive history of their feedback. Although it is an emerging research area of interest to both academia and industry, it suffers from sparsity, scalability, and cold start problems. This paper addresses the sparsity and scalability problems of model-based collaborative recommender systems with an ensemble learning approach and an enhanced clustering algorithm for movie recommendation. An effective movie recommendation system is proposed using the Classification and Regression Tree (CART) algorithm, an enhanced Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, and a truncation method. A new hyperparameter tuning step is added to the BIRCH algorithm to enhance the cluster formation process, and the resulting algorithm is named enhanced BIRCH. The proposed model yields quality movie recommendations with broad coverage for new users using gradient boosting classification. The proposed model is tested on the MovieLens dataset, and its performance is evaluated by means of Mean Absolute Error (MAE), precision, recall, and F-measure. The experimental results show the superiority of the proposed model in movie recommendation over existing models. The proposed model obtained MAE values of 0.52 and 0.57 on the MovieLens 100K and 1M datasets, respectively. Further, it obtained a precision of 0.83, a recall of 0.86, and an F-measure of 0.86 on the MovieLens 100K dataset, which compare favorably with existing movie recommendation models.
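For reference, the evaluation measures named in this abstract can be computed as in the sketch below. The ratings and movie identifiers are made up, and treating precision and recall as set overlap between a list of relevant items and a list of recommended items is an assumption about the evaluation protocol, which the abstract does not describe.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f(relevant, recommended):
    """Precision, recall, and F-measure of one recommendation list."""
    hits = len(set(relevant) & set(recommended))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


# Hypothetical ratings on a 1-5 scale.
mae = mean_absolute_error([4, 3, 5, 2], [3.5, 3.0, 4.0, 3.0])

# Hypothetical item identifiers: movies the user liked vs. movies recommended.
precision, recall, f_measure = precision_recall_f(
    relevant=["m1", "m2", "m3", "m4"],
    recommended=["m1", "m2", "m5"],
)
print(mae, precision, recall, f_measure)
```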
Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores that combines a stepwise selection of variables with ensemble scores obtained by aggregating several scores, using several classifiers, bootstrap samples, and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to achieve a stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregating several classifiers. Three selection methods are proposed in this framework: two involve a backward selection of the variables based on their coefficients in an ensemble score, and the third involves a forward selection of the variables maximizing the AUC. The stepwise selection yields a succession of scores, from which practitioners can choose the score that best fits their needs. The three methods are compared in an application to construct parsimonious short-term event risk scores in chronic heart failure (HF) patients, using as the event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).
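The forward selection maximizing the AUC can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the article's procedure: the "score" here is a plain sum of the selected variables standing in for the ensemble score, the AUC is computed on the same toy data rather than out-of-bag, and the variable names and values are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random event case outscores a random non-event case.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_select(columns, labels, max_vars):
    """Add one variable at a time, each time choosing the one that maximizes the AUC.

    Returns the succession of (selected variables, AUC) pairs, so that a
    practitioner could pick a score of the desired size from the path.
    """
    chosen, path = [], []
    remaining = sorted(columns)
    while remaining and len(chosen) < max_vars:
        scored = []
        for var in remaining:
            cols = [columns[c] for c in chosen + [var]]
            score = [sum(vals) for vals in zip(*cols)]  # stand-in for an ensemble score
            scored.append((auc(score, labels), var))
        best_auc = max(a for a, _ in scored)
        best_var = next(v for a, v in scored if a == best_auc)
        chosen.append(best_var)
        remaining.remove(best_var)
        path.append((list(chosen), best_auc))
    return path


# Toy data (assumed): 1 = event within the follow-up window, 0 = no event.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "var_a": [3, 2, 2, 2, 1, 1],
    "var_b": [0, 1, 1, 0, 0, 0],
    "var_c": [0, 1, 1, 1, 0, 0],
}
path = forward_select(columns, labels, max_vars=2)
print(path)
```

Each entry of the returned path corresponds to one score in the succession, from the most parsimonious to the largest.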
The common spatial pattern (CSP) algorithm is a successful tool for feature estimation in brain-computer interfaces (BCI). However, CSP is sensitive to outliers and may give poor results because it is based on pooling the covariance matrices of trials. In this paper, we propose a simple yet effective approach, named the common spatial pattern ensemble (CSPE) classifier, to improve CSP performance. Multiple CSP filters are constructed by dividing the recording channels. Through projection, log-operation, and subtraction on the original signal, an ensemble classifier based on majority voting is obtained, and outlier contamination is alleviated. Experimental results demonstrate that the proposed CSPE classifier is robust to various artifacts and achieves an average accuracy of 83.02%.
In this paper, an ensemble learning based feature selection and classifier ensemble model is proposed to improve classification accuracy. The hypothesis is that good feature sets contain features that are highly correlated with the class, and that passing the feature sets obtained by ensemble feature selection to SVM ensembles improves classification accuracy. The proposed approach consists of two phases: (i) selecting feature sets that are likely to be the support vectors by applying ensemble based feature selection methods; and (ii) constructing an SVM ensemble using the selected features. The proposed approach was evaluated by experiments on the Cardiotocography dataset. Four feature selection techniques were used: (i) Correlation-based, (ii) Consistency-based, (iii) ReliefF, and (iv) Information Gain. Experimental results showed that the ensemble of Information Gain and Correlation-based feature selection with SVM ensembles achieved higher classification accuracy than both a single SVM classifier and ensemble feature selection with a single SVM classifier.
Stomatopods, better known as mantis shrimp, have considerable ecological importance in coastal waters worldwide. Some stomatopod species are exploited commercially, including Oratosquilla oratoria in the Northwest Pacific. Yet few studies have been published to promote accurate habitat identification of stomatopods, which obstructs scientific management and conservation of these valuable organisms. This study provides an ensemble modeling framework for habitat suitability modeling of stomatopods, using the O. oratoria stock in the Bohai Sea as an example. Two modeling techniques, the generalized additive model (GAM) and geographically weighted regression (GWR), were applied to select the environmental predictors that better characterize O. oratoria distribution (in particular, to choose between two types of sediment metrics) and to build separate habitat suitability models (HSMs). The performance of the individual HSMs was compared in terms of interpolation accuracy and transferability. They were then integrated to check whether the ensemble model outperforms either individual model, according to fishers’ knowledge and scientific survey data. Grain-size metrics of sediment outperformed sediment content metrics in modeling O. oratoria habitat, possibly because grain-size metrics not only reflect the effect of substrates on burrow development but are also linked to sediment heat capacity, which influences individual thermoregulation. Moreover, the GWR-based HSM outperformed the GAM-based HSM in interpolation accuracy, while the latter displayed better transferability. On balance, the ensemble HSM appeared to improve overall predictive performance, as it avoided dependence on a single model type and successfully identified fisher-recognized and survey-indicated suitable habitats in both sparsely sampled and well-investigated areas.
The continuing boom in information technology has spurred the development of a variety of communication network, multimedia, social network, and Internet of Things applications. However, users inevitably suffer from intrusions by malicious users. Some studies focus on static characteristics of malicious users, which are easily bypassed by camouflaged malicious users. In this paper, we present a malicious user detection method based on ensemble feature selection and adversarial training. Firstly, feature selection alleviates the curse of dimensionality and achieves more accurate classification performance. Secondly, we embed features into a multidimensional space and aggregate them into a feature map to encode the explicit content preference and implicit interaction preference. Thirdly, we use an effective ensemble learning method that avoids over-fitting and has good noise resistance. Finally, we propose a data-driven neural network detection model with adversarial training as a regularization technique to analyze the characteristics in depth. It simplifies the parameters and obtains more robust interaction features and pattern features. We demonstrate the effectiveness of our approach with numerical simulation results for malicious user detection, where robustness is a notable concern.
Systematic customer analysis is one way for an enterprise to understand consumer behavior patterns efficiently and in depth. Further investigation of customer patterns helps the firm make efficient decisions, which in turn helps to optimize the enterprise’s business and maximize consumer satisfaction. To assess customers effectively, Naive Bayes (NB, also called Simple Bayes), a machine learning model, is used. However, the efficacy of the NB model relies heavily on the consumer data used, and uncertain and redundant attributes in the consumer data lead the NB model to poor predictions because of its assumption about the attributes. In practice, the NB premise does not hold for consumer data, and analyzing these redundant attributes gives the NB model poor prediction results. In this work, an ensemble attribute selection methodology is applied to overcome this problem with consumer data and to pick a stable, uncorrelated attribute set for modeling with the NB classifier. In ensemble variable selection, two different strategies are applied: one is based on data perturbation (a homogeneous ensemble, in which the same feature selector is applied to different subsamples derived from the same learning set), and the other is based on function perturbation (a heterogeneous ensemble, in which different feature selectors are applied to the same learning set). Furthermore, the feature sets obtained from both ensemble strategies are applied to NB individually, and the outcomes are computed. Finally, the experimental outcomes show that the proposed ensemble strategies are effective at choosing a stable attribute set and at improving NB classification performance.
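The data perturbation strategy (one feature selector, many subsamples of the same learning set) can be sketched as follows. The selector used here, which ranks attributes by the gap between class means, is a deliberately simple stand-in and not the selector used in the paper; the stratified two-thirds subsampling, the attribute names, and the data are likewise assumptions.

```python
import random


def mean_gap_selector(rows, labels, k):
    """Toy filter: rank attributes by |mean(class 1) - mean(class 0)|, keep the top k."""
    names = sorted(rows[0])

    def gap(name):
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

    return sorted(names, key=lambda n: (-gap(n), n))[:k]


def data_perturbation_selection(rows, labels, k, n_rounds, seed=0):
    """Homogeneous ensemble: apply one selector to many stratified subsamples.

    Returns, for each attribute, the fraction of rounds in which it was selected.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    counts = {name: 0 for name in rows[0]}
    for _ in range(n_rounds):
        # Draw two thirds of each class without replacement.
        idx = (rng.sample(pos_idx, max(1, len(pos_idx) * 2 // 3))
               + rng.sample(neg_idx, max(1, len(neg_idx) * 2 // 3)))
        chosen = mean_gap_selector([rows[i] for i in idx], [labels[i] for i in idx], k)
        for name in chosen:
            counts[name] += 1
    return {name: c / n_rounds for name, c in counts.items()}


# Toy consumer data (assumed): one informative attribute and two noisy ones.
labels = [1, 1, 1, 0, 0, 0]
rows = [
    {"signal": 9, "noise_a": 1, "noise_b": 2},
    {"signal": 8, "noise_a": 3, "noise_b": 0},
    {"signal": 10, "noise_a": 0, "noise_b": 1},
    {"signal": 1, "noise_a": 2, "noise_b": 3},
    {"signal": 0, "noise_a": 1, "noise_b": 1},
    {"signal": 2, "noise_a": 3, "noise_b": 2},
]
frequency = data_perturbation_selection(rows, labels, k=1, n_rounds=20)
print(frequency)
```

Attributes that are selected in nearly every round form the stable set. A function perturbation variant would instead loop over several different selectors on the full learning set and aggregate their choices in the same way.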
Autism Spectrum Disorder (ASD) is a complicated neurodevelopmental disorder that is often identified in toddlers. Microarray data is used as a diagnostic tool to identify the genetics of the disorder. However, microarray data is large and high in volume, and consequently suffers from the problem of dimensionality. In microarray data, the sample size and the variance of gene expression lead to overfitting and misclassification. Identifying the autism gene (feature) subset from microarray data is an important and challenging research area, which has to be addressed efficiently to improve gene feature selection and classification. To overcome these challenges, a novel Intelligent Hybrid Ensemble Gene Selection (IHEGS) model is proposed in this paper. The proposed model integrates the intelligence of different feature selection techniques over the data partitions. In this model, the initial gene selection is carried out by data perturbation, and the final autism gene subset is obtained by functional perturbation, which reduces the problem of dimensionality in microarray data. The functional perturbation module employs three meta-heuristic swarm intelligence-based techniques for gene selection. The obtained gene subset is validated by a Deep Neural Network (DNN) model. The proposed model is implemented in Python with six National Center for Biotechnology Information (NCBI) gene expression datasets. In a comparative study with other existing state-of-the-art systems, the proposed model provides stable results in terms of feature selection and classification accuracy.
Bioactive compounds in plants, which can be synthesized using N-arylation methods such as the Buchwald-Hartwig reaction, are essential in drug discovery for their pharmacological effects. Important descriptors are necessary for estimating yields in these reactions. This study explores ten metaheuristic algorithms for descriptor selection and models a voting ensemble for evaluation. The algorithms were evaluated based on computational time and the number of selected descriptors. Analyses show that more robust performance is obtained when more descriptors are selected than when fewer are selected. The essential descriptor was deduced from its frequency of occurrence within the 50 extracted data subsets, and the voting ensemble achieved better performance than the other algorithms, with an RMSE of 6.4270 and an R² of 0.9423. The results and deductions from this study can be readily applied in the decision-making process of chemical synthesis by saving the computational cost associated with initial descriptor selection for yield estimation. The ensemble model has also shown robust performance in its yield estimation ability and efficiency.
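The two regression measures reported here, RMSE and R², follow their standard definitions and can be computed as in this short sketch. The yield values are invented for illustration and are unrelated to the study's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical reaction yields (%) and model predictions.
observed = [10, 20, 30, 40]
estimated = [12, 18, 33, 37]
print(rmse(observed, estimated), r_squared(observed, estimated))
```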
Funding (ball mill soft sensor study): Supported partially by the Post Doctoral Natural Science Foundation of China (2013M532118, 2015T81082), the National Natural Science Foundation of China (61573364, 61273177, 61503066), the State Key Laboratory of Synthetical Automation for Process Industries, the National High Technology Research and Development Program of China (2015AA043802), and the Scientific Research Fund of Liaoning Provincial Education Department (L2013272).
Funding (online purchase behavior study): Supported by the Scientific Research Foundation of Liaoning Provincial Department of Education (No. LJKZ0139).
Funding: Supported by the National Natural Science Foundation of China (61074153, 61104131) and the Fundamental Research Funds for Central Universities of China (ZY1111, JD1104).
Abstract: Chemical processes are complex, and traditional neural network models usually cannot achieve satisfactory accuracy for them. Selective neural network ensemble is an effective way to enhance the generalization accuracy of networks, but some problems remain, e.g., the lack of a unified definition of diversity among component neural networks and the difficulty of improving accuracy by selection when the diversity of the available networks is small. In this study, the output errors of networks are vectorized, the diversity of networks is defined based on the error vectors, and the size of the ensemble is analyzed. Then an error vectorization based selective neural network ensemble (EVSNE) is proposed, in which the component networks are trained in order so that the error vector of each network can offset those of the other networks. Thus the component networks have large diversity. Experiments and comparisons on standard data sets and on an actual chemical process data set for the production of high-density polyethylene demonstrate that EVSNE performs better in generalization ability.
Funding: Supported by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2023-02-02116.
Abstract: In smart healthcare business systems, network-based intrusion detection systems are crucial for protecting the system and its networks from malicious network attacks. To protect IoMT devices and networks in healthcare and medical settings, our proposed model serves as a tool for monitoring IoMT networks. This study presents a robust methodology for intrusion detection in Internet of Medical Things (IoMT) environments, integrating data augmentation, feature selection, and ensemble learning to effectively handle IoMT data complexity. Following rigorous preprocessing, including feature extraction, correlation removal, and Recursive Feature Elimination (RFE), selected features are standardized and reshaped for deep learning models. Augmentation using the BAT algorithm enhances dataset variability. Three deep learning models, Transformer-based neural networks, self-attention Deep Convolutional Neural Networks (DCNNs), and Long Short-Term Memory (LSTM) networks, are trained to capture diverse data aspects. Their predictions form a meta-feature set for a subsequent meta-learner, which combines model strengths. Conventional classifiers validate meta-learner features for broad algorithm suitability. This comprehensive method demonstrates high accuracy and robustness in IoMT intrusion detection. Evaluations were conducted using two datasets: the publicly available WUSTL-EHMS-2020 dataset, which contains two distinct categories, and the CICIoMT2024 dataset, encompassing sixteen categories. Experimental results show that the method achieves optimal scores of 100% on the WUSTL-EHMS-2020 dataset and 99% on the CICIoMT2024 dataset.
Funding: Supported by Universiti Sains Malaysia (USM) and the School of Computer Sciences, USM.
Abstract: Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset's dimensionality. This involves eliminating irrelevant, redundant, and noisy features to streamline the classification process. Various methods, from single feature selection techniques to ensemble filter-wrapper methods, have been used in the literature. Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents. Feature selection is inherently multi-objective, balancing the enhancement of feature relevance and accuracy against the reduction of redundant features. This research presents a two-fold objective for feature selection. The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods: Information Gain (Infogain), Chi-Square (Chi²), and Analysis of Variance (ANOVA). This aims to maximize feature relevance while minimizing redundancy. The second objective involves reducing the number of selected features and increasing accuracy through a hybrid approach combining Artificial Bee Colony (ABC) and Genetic Algorithms (GA). This hybrid method operates in a wrapper framework to identify the most informative subset of text features. Support Vector Machine (SVM) was employed as the performance evaluator for the proposed model, tested on two high-dimensional multiclass datasets. The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection, offering superior performance compared to other existing feature selection algorithms.
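As a rough illustration of the first objective, an ensemble of filter rankings can be combined by averaging each feature's rank across the filters. The abstract does not state how the three filters are aggregated, so mean-rank aggregation is an assumption here, and the per-feature scores below are invented toy values standing in for Infogain, Chi², and ANOVA outputs.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank (0 = best); a higher score is better."""
    order = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: rank for rank, feature in enumerate(order)}


def ensemble_top_k(score_tables: List[Dict[str, float]], k: int) -> List[str]:
    """Return the k features with the lowest mean rank across all filters.

    Mean-rank aggregation is an assumed combination rule, not the paper's.
    """
    ranks = [rank_features(table) for table in score_tables]
    features = sorted(score_tables[0])
    mean_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Toy scores for four terms under three hypothetical filter outputs.
infogain = {"price": 0.30, "free": 0.25, "the": 0.01, "win": 0.20}
chi2 = {"price": 12.0, "free": 15.0, "the": 0.5, "win": 9.0}
anova = {"price": 8.0, "free": 7.0, "the": 0.2, "win": 7.5}

top_terms = ensemble_top_k([infogain, chi2, anova], k=2)
print(top_terms)
```

The reduced feature list would then be handed to the wrapper stage (ABC+GA in the abstract), which this sketch does not attempt to reproduce.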
Abstract: This study investigates the application of deep learning, ensemble learning, metaheuristic optimization, and image processing techniques for detecting lung and colon cancers, aiming to enhance treatment efficacy and improve survival rates. We introduce a metaheuristic-driven two-stage ensemble deep learning model for efficient lung/colon cancer classification. Lung and colon cancers are diagnosed from several indicators, using different versions of deep Convolutional Neural Networks (CNNs) for feature extraction and model construction and various Machine Learning (ML) algorithms for final classification. Specifically, we consider three scenarios, two-class colon cancer, three-class lung cancer, and five-class combined lung/colon cancer, and conduct feature extraction using four CNNs. These extracted features are then integrated to create a comprehensive feature set. In the next step, feature selection is optimized using a metaheuristic algorithm based on Electric Eel Foraging Optimization (EEFO). This optimized feature subset is subsequently employed in various ML algorithms to determine the most effective ones through a rigorous evaluation process. The top-performing algorithms are refined using the High-Performance Filter (HPF) and integrated into an ensemble learning framework employing weighted averaging. Our findings indicate that the proposed ensemble learning model surpasses existing methods in classification accuracy across all datasets, achieving accuracies of 99.85% for the two-class, 98.70% for the three-class, and 98.96% for the five-class datasets.
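The weighted-averaging step mentioned at the end of the abstract above can be sketched as follows. The abstract does not say how the weights are obtained, so the weights here are hypothetical (for example, they could come from validation accuracy), and the class probabilities are invented toy values.

```python
from typing import List, Sequence, Tuple


def weighted_average_predict(prob_tables: Sequence[Sequence[float]],
                             weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Combine per-model class probabilities with a weighted average.

    Returns the index of the winning class and the combined probabilities.
    """
    total = sum(weights)
    n_classes = len(prob_tables[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_tables)) / total
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined


# Three hypothetical classifiers scoring one image for a two-class problem.
model_probs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
model_weights = [1, 2, 1]  # assumed weights, e.g. derived from validation results

label, combined_probs = weighted_average_predict(model_probs, model_weights)
print(label, combined_probs)
```

Giving the second model twice the weight lets it outvote the first model's preference for class 0, which is the intended effect of weighting stronger learners more heavily.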
Funding: Supported by the Ministry of Trade, Industry, and Energy (20172510102090, 20142520100440, 20162010201980); the Global PhD Fellowship Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015H1A2A1030756); and a National Research Foundation of Korea (NRF) grant (No. 2018R1C1B5045260).
Abstract: Ensemble-based analyses are useful for comparing equiprobable scenarios of reservoir models. However, they require a large suite of reservoir models to cover the high uncertainty in heterogeneous and complex reservoirs. Increasing the ensemble size is one way to obtain stable convergence in the ensemble Kalman filter (EnKF), but it causes high computational cost in large-scale reservoir systems. In this paper, we propose a preprocessing step that selects good initial models to reduce the ensemble size, after which EnKF is used to predict production performance stochastically. In the model selection scheme, representative models are chosen by using principal component analysis (PCA) and clustering analysis. The dimension of the initial models is reduced using PCA, and the reduced models are grouped by clustering. Then, we choose and simulate representative models from the cluster groups to compare the errors of their production predictions against historical observation data. The representative model with the minimum error is considered the best model, and we use the ensemble members near the best model in the cluster plane for applying EnKF. We demonstrate the proposed scheme on two 3D models, showing that EnKF provides reliable assimilation results with much reduced computation time.
Abstract: Metamaterial antennas are a subclass of antennas that make use of metamaterials to improve performance. Metamaterial antennas can overcome the bandwidth constraint associated with tiny antennas. Machine learning is receiving a lot of interest for optimizing solutions in a variety of areas. Machine learning methods are already a significant component of ongoing research and are anticipated to play a critical role in today's technology. The accuracy of the forecast is mostly determined by the model used. The purpose of this article is to provide an optimal ensemble model for predicting the bandwidth and gain of the metamaterial antenna. Support Vector Machines (SVM), Random Forest, K-Neighbors Regressor, and Decision Tree Regressor were utilized as the basic models. The Adaptive Dynamic Polar Rose Guided Whale Optimization method, named AD-PRS-Guided WOA, was used to pick the optimal features from the datasets. The suggested model is compared to models based on five variables and to the average ensemble model. The findings indicate that the presented model using Random Forest results in a Root Mean Squared Error (RMSE) of 0.0102 for bandwidth and an RMSE of 0.0891 for gain. This is superior to the other models and can accurately predict antenna bandwidth and gain.
Funding: Supported by the National High-Tech Research and Development Plan of China (No. 2007AA04Z224) and the National Natural Science Foundation of China (Nos. 60775047, 60835004).
Abstract: A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensemble, the neural network ensemble with the best generalization ability can be found by search strategies. Finally, classification based on the neural network ensemble is implemented by combining the predictions of the component networks with voting. The method has been verified in experiments on remote sensing image classification and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it requires less time and has lower computational complexity, and the classification accuracy is satisfactory.
Funding: Supported by the National Natural Science Foundation of China (61139002, 61171132, 61300167); the Natural Science Foundation of Jiangsu Education Department (12KJB520013); the Open Project Program of Jiangsu Provincial Key Laboratory of Computer Information Processing Technology; the Qing Lan Project of Jiangsu Province; and the Starting Foundation for Doctoral Scientific Research, Nantong University (14B20).
Abstract: To accelerate the selection of feature subsets in rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, a dynamics equilibrium strategy based on multilevel elitist roles is established, in which both immigration and emigration of elitists are self-adaptive to balance exploration and exploitation in feature selection. Secondly, a utility matrix of trust margins is introduced into the model of multilevel elitist roles to enhance the performance of the various elitist roles in searching for optimal feature subsets, so that win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed to perfect the dynamics equilibrium of the multilevel elitist roles. Finally, the ensemble of multilevel elitist roles is employed to obtain the global minimal feature subset, which greatly improves feasibility and effectiveness. Experimental results show that the proposed EERQG algorithm is superior to existing feature selection algorithms.
Abstract: A recommender system is a tool that suggests items to users based on the extensive history of user feedback. Although it is an emerging research area in academia and industry, it suffers from sparsity, scalability, and cold start problems. This paper addresses the sparsity and scalability problems of model-based collaborative recommender systems using an ensemble learning approach and an enhanced clustering algorithm for movie recommendations. An effective movie recommendation system is proposed using the Classification and Regression Tree (CART) algorithm, an enhanced Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, and a truncation method. A new hyperparameter tuning step is added to the BIRCH algorithm to enhance the cluster formation process, and the resulting algorithm is named enhanced BIRCH. The proposed model yields quality movie recommendations for new users using Gradient Boost classification with broad coverage. The proposed model is tested on the Movielens dataset, and its performance is evaluated by means of Mean Absolute Error (MAE), precision, recall, and f-measure. The experimental results showed the superiority of the proposed model in movie recommendation compared to the existing models. The proposed model obtained MAE values of 0.52 and 0.57 on the Movielens 100k and 1M datasets. Further, the proposed model obtained a precision of 0.83, a recall of 0.86, and an f-measure of 0.86 on the Movielens 100k dataset, which compare favorably with the existing models in movie recommendation.
Abstract: Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores combining a stepwise selection of variables with ensemble scores obtained by aggregation of several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to achieve a stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregation of several classifiers. Three selection methods are proposed in this framework, two involving a backward selection of the variables based on their coefficients in an ensemble score and the third involving a forward selection of the variables maximizing the AUC. The stepwise selection allows constructing a succession of scores, with the practitioner able to choose which score best fits their needs. These three methods are compared in an application to construct parsimonious short-term event risk scores in chronic HF patients, using as event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).
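The third selection method, forward selection of variables maximizing the AUC, can be sketched in a few lines. This is only an illustration under strong assumptions: the score is taken to be the unweighted sum of the selected variables rather than the article's ensemble score, the AUC is computed on the same toy data rather than out-of-bag, and the variable names `x1`, `x2`, `x3` and their values are invented.

```python
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_select_by_auc(columns: Dict[str, Sequence[float]],
                          labels: Sequence[int],
                          max_vars: int) -> List[Tuple[str, float]]:
    """At each step add the variable whose inclusion gives the highest AUC.

    Assumed score: plain sum of the selected variables (hypothetical choice).
    Returns the succession of (added variable, AUC of the resulting score).
    """
    selected: List[str] = []
    path: List[Tuple[str, float]] = []
    remaining = sorted(columns)
    while remaining and len(selected) < max_vars:
        def auc_with(name: str) -> float:
            names = selected + [name]
            score = [sum(columns[v][i] for v in names) for i in range(len(labels))]
            return auc(score, labels)
        best = max(remaining, key=auc_with)
        path.append((best, auc_with(best)))
        selected.append(best)
        remaining.remove(best)
    return path


# Toy data: 1 = event, 0 = no event; three hypothetical explanatory variables.
events = [1, 1, 1, 0, 0, 0]
variables = {
    "x1": [3, 2, 2, 2, 1, 1],
    "x2": [1, 3, 2, 0, 2, 1],
    "x3": [1, 0, 1, 1, 0, 1],
}
for name, value in forward_select_by_auc(variables, events, max_vars=2):
    print(name, round(value, 3))
```

Each entry of the returned path corresponds to one score in the succession, so a practitioner could stop at whichever size offers an acceptable AUC.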
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 30525030, 60701015, and 60736029.
Abstract: The common spatial pattern (CSP) algorithm is a successful tool for feature estimation in brain-computer interfaces (BCI). However, CSP is sensitive to outliers and may produce poor results because it is based on pooling the covariance matrices of trials. In this paper, we propose a simple yet effective approach, named the common spatial pattern ensemble (CSPE) classifier, to improve CSP performance. Multiple CSP filters are constructed by dividing the recording channels. By applying projection, log-operation, and subtraction to the original signal, an ensemble classifier based on majority voting is obtained and outlier contamination is alleviated. Experimental results demonstrate that the proposed CSPE classifier is robust to various artifacts and can achieve an average accuracy of 83.02%.
Abstract: In this paper, an ensemble learning based feature selection and classifier ensemble model is proposed to improve classification accuracy. The hypothesis is that good feature sets, containing features that are highly correlated with the class, can be passed from ensemble feature selection to SVM ensembles to improve classification accuracy. The proposed approach consists of two phases: (i) selecting feature sets that are likely to be the support vectors by applying ensemble based feature selection methods; and (ii) constructing an SVM ensemble using the selected features. The proposed approach was evaluated by experiments on the Cardiotocography dataset. Four feature selection techniques were used: (i) Correlation-based, (ii) Consistency-based, (iii) ReliefF and (iv) Information Gain. Experimental results showed that using the ensemble of Information Gain feature selection and Correlation-based feature selection with SVM ensembles achieved higher classification accuracy than both a single SVM classifier and ensemble feature selection with a single SVM classifier.
Funding: Supported by the National Natural Science Foundation of China under contract No. 31902375; the David and Lucile Packard Foundation; the Innovation Team of Fishery Resources and Ecology in the Yellow Sea and Bohai Sea under contract No. 2020TD01; and the Special Funds for Taishan Scholars Project of Shandong Province.
Abstract: Stomatopods, better known as mantis shrimp, have considerable ecological importance in coastal waters worldwide. Some stomatopod species are exploited commercially, including Oratosquilla oratoria in the Northwest Pacific. Yet few studies have been published on accurate habitat identification of stomatopods, which obstructs scientific management and conservation of these valuable organisms. This study provides an ensemble modeling framework for habitat suitability modeling of stomatopods, using the O. oratoria stock in the Bohai Sea as an example. Two modeling techniques, the generalized additive model (GAM) and geographically weighted regression (GWR), were applied to select the environmental predictors (especially the choice between two types of sediment metrics) that better characterize O. oratoria distribution and to build separate habitat suitability models (HSMs). The performance of the individual HSMs was compared in terms of interpolation accuracy and transferability. Then, they were integrated to check whether the ensemble model outperforms either individual model, according to fishers' knowledge and scientific survey data. As a result, grain-size metrics of sediment outperformed sediment content metrics in modeling O. oratoria habitat, possibly because grain-size metrics not only reflect the effect of substrates on burrow development, but also relate to sediment heat capacity, which influences individual thermoregulation. Moreover, the GWR-based HSM outperformed the GAM-based HSM in interpolation accuracy, while the latter displayed better transferability. On balance, the ensemble HSM appeared to improve overall predictive performance, as it avoided dependence on a single model type and successfully identified fisher-recognized and survey-indicated suitable habitats in both sparsely sampled and well investigated areas.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 61772406 and 61941105, in part by the Fundamental Research Funds for the Central Universities, and in part by the Innovation Fund of Xidian University under Grant 500120109215456.
Abstract: The continuous boom in information technology has driven the development of a variety of communication networks, multimedia, social networks, and Internet of Things applications. However, users inevitably suffer from intrusion by malicious users. Some studies focus on the static characteristics of malicious users, which are easily bypassed by camouflaged malicious users. In this paper, we present a malicious user detection method based on ensemble feature selection and adversarial training. Firstly, feature selection alleviates the curse of dimensionality and achieves more accurate classification performance. Secondly, we embed features into a multidimensional space and aggregate them into a feature map to encode explicit content preference and implicit interaction preference. Thirdly, we use an effective ensemble learning method that avoids over-fitting and has good noise resistance. Finally, we propose a data-driven neural network detection model with adversarial training as a regularization technique to analyze the characteristics in depth. It simplifies the parameters and obtains more robust interaction features and pattern features. We demonstrate the effectiveness of our approach with numerical simulation results for malicious user detection, where robustness issues are notable concerns.
Abstract: Conducting customer analysis systematically is one way for an enterprise to understand consumer behavior patterns efficiently and in depth. Further investigation of customer patterns helps the firm make efficient decisions and, in turn, helps to optimize the enterprise's business and maximize consumer satisfaction. To conduct an effective assessment of customers, Naive Bayes (NB, also called Simple Bayes), a machine learning model, is utilized. However, the efficacy of the Naive Bayes model relies entirely on the consumer data used, and the presence of uncertain and redundant attributes in the consumer data leads the model to poor predictions because of its assumption about the attributes. In practice, the NB premise does not hold for consumer data, and these redundant attributes cause the model to produce poor prediction results. In this work, an ensemble attribute selection methodology is applied to overcome this problem with consumer data and to pick a stable, uncorrelated attribute set to model with the NB classifier. In ensemble variable selection, two different strategies are applied: one is based on data perturbation (a homogeneous ensemble, in which the same feature selector is applied to different subsamples derived from the same learning set) and the other is based on function perturbation (a heterogeneous ensemble, in which different feature selectors are applied to the same learning set). Furthermore, the feature sets obtained from the two ensemble strategies are applied to NB individually and the outcomes are computed. Finally, the experimental outcomes show that the proposed ensemble strategies perform efficiently in choosing a stable attribute set and in increasing NB classification performance.
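The data perturbation (homogeneous ensemble) strategy described above can be sketched as follows: one feature selector is run on several bootstrap subsamples of the same learning set, and features are kept according to how often they are selected. The selector used here (absolute Pearson correlation with the label), the number of rounds, and the toy data are assumptions for illustration, not details taken from the abstract.

```python
import random
from collections import Counter
from typing import List, Sequence


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(rows: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Base selector (assumed): the k columns most correlated with the label."""
    n_feat = len(rows[0])
    scores = [abs_corr([r[j] for r in rows], labels) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: (-scores[j], j))[:k]


def homogeneous_ensemble_selection(rows, labels, k, n_rounds=25, seed=0) -> List[int]:
    """Run the same selector on bootstrap subsamples and keep the most-voted features."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    n = len(rows)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        votes.update(select_top_k([rows[i] for i in idx], [labels[i] for i in idx], k))
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:k]]


# Toy learning set: column 0 tracks the label, column 1 is weakly related, column 2 is constant.
toy_labels = [0, 1] * 6
toy_rows = [[toy_labels[i], i % 3, 5] for i in range(12)]
stable = homogeneous_ensemble_selection(toy_rows, toy_labels, k=1)
print(stable)
```

A function perturbation (heterogeneous) variant would instead run several different selectors on the full learning set and aggregate their outputs in the same voting step.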
Abstract: Autism Spectrum Disorder (ASD) is a complicated neurodevelopmental disorder that is often identified in toddlers. Microarray data is used as a diagnostic tool to identify the genetics of the disorder. However, microarray data is large and high-dimensional, so it suffers from the curse of dimensionality. In microarray data, the sample size and the variance of gene expression can lead to overfitting and misclassification. Identifying the autism gene (feature) subset from microarray data is an important and challenging research area that must be addressed efficiently to improve gene feature selection and classification. To overcome these challenges, a novel Intelligent Hybrid Ensemble Gene Selection (IHEGS) model is proposed in this paper. The proposed model integrates the intelligence of different feature selection techniques over the data partitions. In this model, the initial gene selection is carried out by data perturbation, and the final autism gene subset is obtained by functional perturbation, which reduces the dimensionality problem in microarray data. The functional perturbation module employs three meta-heuristic swarm intelligence-based techniques for gene selection. The obtained gene subset is validated by a Deep Neural Network (DNN) model. The proposed model is implemented in Python with six National Center for Biotechnology Information (NCBI) gene expression datasets. In a comparative study with other existing state-of-the-art systems, the proposed model provides stable results in terms of feature selection and classification accuracy.
Funding: The work described in this paper was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region [CityU 11200218]; a grant from the Health and Medical Research Fund, the Food and Health Bureau, The Government of the Hong Kong Special Administrative Region [07181426]; and funding from the Hong Kong Institute for Data Science (HKIDS) at City University of Hong Kong. The work was partially supported by two grants from City University of Hong Kong (CityU 11202219, CityU 11203520). This research was substantially sponsored by the research project (Grant No. 32000464) supported by the National Natural Science Foundation of China and was substantially supported by the Shenzhen Research Institute, City University of Hong Kong. The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research with the project number (442/77).
Abstract: Bioactive compounds in plants, which can be synthesized using N-arylation methods such as the Buchwald-Hartwig reaction, are essential in drug discovery for their pharmacological effects. Important descriptors are necessary for the estimation of yields in these reactions. This study explores ten metaheuristic algorithms for descriptor selection and models a voting ensemble for evaluation. The algorithms were evaluated based on computational time and the number of selected descriptors. Analyses show that robust performance is obtained with more descriptors, compared to cases where fewer descriptors are selected. The essential descriptor was deduced based on the frequency of occurrence within the 50 extracted data subsets, and better performance was achieved with the voting ensemble than with other algorithms, with an RMSE of 6.4270 and an R² of 0.9423. The results and deductions from this study can be readily applied in the decision-making process of chemical synthesis by saving the computational cost associated with initial descriptor selection for yield estimation. The ensemble model has also shown robust performance in its yield estimation ability and efficiency.