In classification problems, datasets often contain a large number of features, but not all of them are relevant for accurate classification. In fact, irrelevant features may even hinder classification accuracy. Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate. Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter, but the results obtained depend on the value of that parameter. To eliminate the parameter’s influence, the problem can be reformulated as a multi-objective optimization problem. The Whale Optimization Algorithm (WOA) is widely used in optimization problems because of its simplicity and ease of implementation. In this paper, we propose a multi-strategy assisted multi-objective WOA (MSMOWOA) to address feature selection. To enhance the algorithm’s search ability, we integrate multiple strategies such as Levy flight, Grey Wolf Optimizer, and adaptive mutation into it. Additionally, we use an external repository to store non-dominated solution sets, and grid technology is used to maintain diversity. Results on fourteen University of California Irvine (UCI) datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance. The source code can be accessed at https://github.com/zc0315/MSMOWOA.
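The two-objective formulation and the external repository of non-dominated solutions described above can be illustrated with a minimal sketch. This is not the MSMOWOA implementation: the names `dominates` and `update_repository` are hypothetical, each candidate is reduced to an assumed pair (number of selected features, classification error rate), and the grid mechanism that maintains diversity is omitted.

```python
from typing import List, Tuple

# Assumed representation: (number of selected features, classification error rate).
# Both objectives are minimized.
Objective = Tuple[int, float]


def dominates(a: Objective, b: Objective) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def update_repository(repo: List[Objective], candidate: Objective) -> List[Objective]:
    """Add candidate to the repository, keeping only mutually non-dominated points."""
    # Reject the candidate if an archived point dominates or duplicates it.
    if any(dominates(kept, candidate) or kept == candidate for kept in repo):
        return repo
    # Otherwise drop every archived point the candidate dominates, then add it.
    return [kept for kept in repo if not dominates(candidate, kept)] + [candidate]


repo: List[Objective] = []
for point in [(10, 0.20), (5, 0.25), (12, 0.22), (5, 0.18), (3, 0.30)]:
    repo = update_repository(repo, point)

print(sorted(repo))
```

With these made-up points, only the trade-off solutions survive: a 5-feature subset with the lowest error and a 3-feature subset with a higher error. Neither dominates the other, which is why a multi-objective method returns a set rather than a single answer.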
Radiomics is a non-invasive method for extracting quantitative, higher-dimensional features from medical images for diagnosis. It has received great attention in recent years due to its broad application prospects. The number of features selected by existing radiomics feature selection methods is typically about ten. In this paper, a heuristic feature selection method based on frequency iteration and a multiple supervised training mode is proposed. Based on combinations between features, it decomposes all features layer by layer to select the optimal features for each layer, then fuses the optimal features to form a locally optimal group layer by layer, and finally iterates to the globally optimal combination. Compared with the current method with the best prediction performance on the three datasets, the proposed method reduces the number of features from about ten to about three without losing classification accuracy, and even significantly improves it. The proposed method has better interpretability and generalization ability, which gives it great potential in radiomics feature selection.
Feature Selection (FS) is a key pre-processing step in pattern recognition and data mining tasks, which can effectively avoid the impact of irrelevant and redundant features on the performance of classification models. In recent years, meta-heuristic algorithms have been widely used in FS problems, so a Hybrid Binary Chaotic Salp Swarm Dung Beetle Optimization (HBCSSDBO) algorithm is proposed in this paper to improve the effect of FS. In this hybrid algorithm, the original continuous optimization algorithm is converted into binary form by an S-type transfer function and applied to the FS problem. Using the K-nearest neighbor (KNN) classifier, comparative FS experiments are carried out between the proposed method and four advanced meta-heuristic algorithms on 16 UCI (University of California, Irvine) datasets. Seven evaluation metrics, such as average adaptation, average prediction accuracy, and average running time, are chosen to judge and compare the algorithms. The selected datasets are also discussed by grouping them into three categories: high, medium, and low dimensionality. Experimental results show that the HBCSSDBO feature selection method can obtain a good subset of features while maintaining high classification accuracy, and shows better optimization performance. In addition, the results of statistical tests confirm the significant validity of the method.
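The conversion from a continuous position to a binary feature mask can be sketched as follows. The abstract only says an "S-type transfer function" is used; the logistic sigmoid below is one common choice and is an assumption here, as are the names `s_shaped` and `binarize` and the example position vector.

```python
import math
import random
from typing import List


def s_shaped(x: float) -> float:
    """Logistic sigmoid: maps a real-valued position component into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position: List[float], rng: random.Random) -> List[int]:
    """Select feature i (bit = 1) with probability s_shaped(position[i])."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
position = [-6.0, -0.5, 0.0, 0.5, 6.0]  # hypothetical continuous search-agent position
mask = binarize(position, rng)
selected = [i for i, bit in enumerate(mask) if bit == 1]

print(mask, selected)
```

Strongly negative components are almost never selected and strongly positive ones almost always are, so the continuous optimizer still steers the search while the classifier only ever sees a 0/1 mask.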
Deception detection plays a crucial role in criminal investigation. Videos contain a wealth of information regarding apparent and physiological changes in individuals, and thus can serve as an effective means of deception detection. In this paper, we investigate video-based deception detection considering both apparent visual features such as eye gaze, head pose, and facial action unit (AU), and non-contact heart rate detected by the remote photoplethysmography (rPPG) technique. Multiple wrapper-based feature selection methods combined with the K-nearest neighbor (KNN) and support vector machine (SVM) classifiers are employed to screen the most effective features for deception detection. We evaluate the performance of the proposed method on both a self-collected physiological-assisted visual deception detection (PV3D) dataset and a public bag-of-lies (BOL) dataset. Experimental results demonstrate that the SVM classifier with symbiotic organisms search (SOS) feature selection yields the best overall performance, with an area under the curve (AUC) of 83.27% and accuracy (ACC) of 83.33% for PV3D, and an AUC of 71.18% and ACC of 70.33% for BOL. This demonstrates the stability and effectiveness of the proposed method in video-based deception detection tasks.
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use information gain and the Fisher Score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and that MBEO is appropriate for high-dimensional English SER.
This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify dominant dimensionless quantities in the penetration of rod projectiles into semi-infinite metal targets from experimental measurements. The derived mathematical expressions of dimensionless quantities are simplified by examining the exponent matrix and the coupling relationships between feature variables. As a physics-based dimension reduction methodology, this approach reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities in penetrating cases. Then the relative importance of various dimensionless feature variables on the penetration efficiencies for four impacting conditions is evaluated through feature selection engineering. The results indicate that the critical dimensionless feature variables selected by this synergistic method, without referring to complex theoretical equations or relying on detailed knowledge of penetration mechanics, are in accordance with those reported in the reference. Lastly, the determined dimensionless quantities can be efficiently applied to conduct semi-empirical analysis for the specific penetrating case, and the reliability of the regression functions is validated.
With the rise of remote work and the digital industry, advanced cyberattacks have become more diverse and complex in terms of attack types and characteristics, rendering them difficult to detect with conventional intrusion detection methods. Signature-based intrusion detection methods can be used to detect attacks; however, they cannot detect new malware. Endpoint detection and response (EDR) tools are attracting attention as a means of detecting attacks on endpoints in real time to overcome the limitations of signature-based intrusion detection techniques. However, EDR tools are restricted by the continuous generation of unnecessary logs, resulting in poor detection performance and memory efficiency. Machine learning-based intrusion detection techniques for responding to advanced cyberattacks are memory intensive, using numerous features; they lack optimal feature selection for each attack type. To overcome these limitations, this study proposes a memory-efficient intrusion detection approach incorporating multi-binary classifiers using optimal feature selection. The proposed model detects multiple types of malicious attacks using parallel binary classifiers with optimal features for each attack type. The experimental results showed a 2.95% accuracy improvement and an 88.05% memory reduction using only six features compared to a model with 18 features. Furthermore, compared to a conventional multi-classification model with simple feature selection based on permutation importance, the accuracy improved by 11.67% and the memory usage decreased by 44.87%. The proposed scheme demonstrates that effective intrusion detection is achievable with minimal features, making it suitable for memory-limited mobile and Internet of Things devices.
With the advancement of wireless network technology, vast amounts of traffic have been generated, and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated. While signature-based detection methods, static analysis, and dynamic analysis techniques have previously been explored for malicious traffic detection, they have limitations in identifying diversified malware traffic patterns. Recent research has focused on applying machine learning to detect these patterns. However, applying machine learning to lightweight devices such as IoT devices is challenging because of the high computational demands and complexity involved in the learning process. In this study, we examined methods for effectively using machine learning-based malicious traffic detection approaches on lightweight devices. We introduced the suboptimal feature selection model (SFSM), a feature selection technique designed to reduce complexity while maintaining the effectiveness of malicious traffic detection. Detection performance was evaluated on several traffic classes (benign, exploits, and generic) using the UNSW-NB15 dataset; SFSM sub-optimized the hyperparameters for feature selection and narrowed the search scope to encompass all features. SFSM improved learning performance while minimizing complexity by treating feature selection and exhaustive search as two steps, a problem not considered in conventional models. Our experimental results showed that the detection accuracy improved by approximately 20% compared to the random model, and the reduction in accuracy compared to the greedy model, which performs an exhaustive search on all features, was kept within 6%. Additionally, latency and complexity were reduced by approximately 96% and 99.78%, respectively, compared to the greedy model. This study demonstrates that malicious traffic can be effectively detected even in lightweight device environments. SFSM verified the possibility of detecting various attack traffic on lightweight devices.
The diversity of data sources has created a need for effective data manipulation and dissemination. The challenge that arises from increasing dimensionality has a negative effect on computation performance, efficiency, and stability. One of the most successful optimization algorithms is Particle Swarm Optimization (PSO), which has proved its effectiveness in exploring the most influential features in the search space thanks to its fast convergence and its ability to use a small set of parameters in the search task. This research proposes an effective enhancement of PSO that tackles the challenge of randomness in search, which directly enhances PSO performance. In addition, this research proposes a generic intelligent framework for the early prediction of order delays and the elimination of order backlogs, which could be considered an efficient potential solution for raising supply chain performance. The proposed adapted algorithm has been applied to a supply chain dataset, reducing the feature set from twenty-one features to ten significant features. To confirm the results of the proposed algorithm, the updated data has been examined by eight well-known classification algorithms, which reached a minimum accuracy of 94.3% for random forest and a maximum of 99.0% for Naïve Bayes. Moreover, the proposed algorithm adaptation has been compared with other PSO adaptations from the literature over different datasets. The proposed PSO adaptation reached a higher accuracy than those in the literature, ranging from 97.8% to 99.36%, which also demonstrates the advancement made by the current research.
Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality. This involves eliminating irrelevant, redundant, and noisy features to streamline the classification process. Various methods, from single feature selection techniques to ensemble filter-wrapper methods, have been used in the literature. Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents. Feature selection is inherently multi-objective, balancing the enhancement of feature relevance, accuracy, and the reduction of redundant features. This research presents a two-fold objective for feature selection. The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods: Information Gain (Infogain), Chi-Square (Chi^(2)), and Analysis of Variance (ANOVA). This aims to maximize feature relevance while minimizing redundancy. The second objective involves reducing the number of selected features and increasing accuracy through a hybrid approach combining Artificial Bee Colony (ABC) and Genetic Algorithms (GA). This hybrid method operates in a wrapper framework to identify the most informative subset of text features. Support Vector Machine (SVM) was employed as the performance evaluator for the proposed model, tested on two high-dimensional multiclass datasets. The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection, offering superior performance compared to other existing feature selection algorithms.
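One way an ensemble of filter methods can produce a single top-ranked list is mean-rank aggregation, sketched below. The abstract does not say how the three filters are combined, so the aggregation rule is an assumption, the function names `ranks` and `ensemble_top_k` are hypothetical, and the scores are made-up numbers standing in for Infogain, Chi-Square, and ANOVA outputs.

```python
from typing import Dict, List


def ranks(scores: List[float]) -> List[int]:
    """Return each feature's rank under one filter (rank 0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = [0] * len(scores)
    for position, feature in enumerate(order):
        rank[feature] = position
    return rank


def ensemble_top_k(method_scores: Dict[str, List[float]], k: int) -> List[int]:
    """Average the per-filter ranks and return the k best-ranked feature indices."""
    per_method = [ranks(s) for s in method_scores.values()]
    n_features = len(per_method[0])
    mean_rank = [
        sum(r[i] for r in per_method) / len(per_method) for i in range(n_features)
    ]
    return sorted(range(n_features), key=lambda i: mean_rank[i])[:k]


# Hypothetical filter scores for four features (higher = more relevant).
scores = {
    "infogain": [0.30, 0.05, 0.20, 0.01],
    "chi2": [12.0, 1.5, 15.0, 0.2],
    "anova": [8.0, 0.9, 7.0, 0.1],
}
top = ensemble_top_k(scores, k=2)
print(top)
```

Working on ranks rather than raw scores avoids having to put the three statistics on a common scale; the reduced list would then be handed to the wrapper stage.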
This research focuses on improving the Harris’ Hawks Optimization algorithm (HHO) by tackling several of its shortcomings, including insufficient population diversity, an imbalance between exploration and exploitation, and a lack of thorough exploitation depth. To tackle these shortcomings, it proposes enhancements from three distinct perspectives: an initialization technique for populations grounded in opposition-based learning, a strategy for updating escape energy factors to improve the equilibrium between exploitation and exploration, and a comprehensive exploitation approach that uses variable neighborhood search along with mutation operators. The effectiveness of the Improved Harris Hawks Optimization algorithm (IHHO) is assessed by comparing it to five leading algorithms across 23 benchmark test functions. Experimental findings indicate that the IHHO surpasses several contemporary algorithms in its problem-solving capabilities. Additionally, this paper introduces a feature selection method leveraging the IHHO algorithm (IHHO-FS) to address challenges such as low efficiency in feature selection and high computational costs (time to find the optimal feature combination and model response time) associated with high-dimensional datasets. Comparative analyses between IHHO-FS and six other advanced feature selection methods are conducted across eight datasets. The results demonstrate that IHHO-FS significantly reduces the computational costs associated with classification models by lowering data dimensionality, while also enhancing the efficiency of feature selection. Furthermore, IHHO-FS shows strong competitiveness relative to numerous algorithms.
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly used short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle this issue by systematically picking multiple audio cues, enhancing the classifier model’s efficacy in accurately discerning human emotions. The dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of how the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features, MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are extracted from the spectrogram sequentially. The aim of feature selection is to retain only dominant features by excluding the irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for feature selection across the multiple audio cues. Finally, the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short Term Memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments, comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin Database of Emotional Speech (EMO-DB), and The Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
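Sequential Forward Selection, as named above, is a greedy wrapper: start from an empty set and repeatedly add the single feature that most improves a subset score. The sketch below shows only that loop. The scoring function `toy_score` and its weights are hypothetical stand-ins for a real evaluator (for example, cross-validated classifier accuracy), and nothing here reflects the paper's actual audio features or model.

```python
from typing import Callable, FrozenSet, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    k: int,
) -> List[int]:
    """Greedily grow a subset, adding the feature that maximizes score each round."""
    k = min(k, n_features)  # cannot select more features than exist
    selected: List[int] = []
    while len(selected) < k:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
    return selected


# Hypothetical per-feature usefulness; features 1 and 3 are individually strong
# but redundant with each other, so using both is penalized.
WEIGHTS = [0.1, 0.9, 0.4, 0.9, 0.2]


def toy_score(subset: FrozenSet[int]) -> float:
    total = sum(WEIGHTS[f] for f in subset)
    if 1 in subset and 3 in subset:
        total -= 0.8  # redundancy penalty
    return total


chosen = sequential_forward_selection(len(WEIGHTS), toy_score, k=2)
print(chosen)
```

Because each candidate is scored together with the features already chosen, the second pick skips the redundant strong feature in favor of a complementary one. Sequential Backward Selection mirrors this loop: it starts from the full set and removes the feature whose removal hurts the score least.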
The selection of important factors in machine learning-based susceptibility assessments is crucial to obtaining reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province, China, using machine learning algorithms. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and then a hybrid model was introduced through a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM model hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, which generated the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results. Moreover, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization, but also a useful feature selection algorithm for improving the prediction accuracy of debris flow susceptibility using machine learning algorithms. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flow may occur under intensive human activities and heavy rainfall events.
More businesses are deploying powerful Intrusion Detection Systems (IDS) to secure their data and physical assets. Improved cyber-attack detection and prevention in these systems requires machine learning (ML) approaches. This paper examines a cyber-attack prediction system combining feature selection (FS) and ML. Our technique’s foundation was based on Correlation Analysis (CA), Mutual Information (MI), and recursive feature reduction with cross-validation. To optimize the IDS performance, the security features must be carefully selected from multiple-dimensional datasets, and our hybrid FS technique must be extended to validate our methodology using the improved UNSW-NB15 and TON_IoT datasets. Our technique identified 22 key characteristics in UNSW-NB15 and 8 in TON_IoT. We evaluated prediction using seven ML methods: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Multilayer Perceptron (MLP) classifiers. The DT, RF, NB, and MLP classifiers helped our model surpass the competition on both datasets. Therefore, the investigational outcomes of our hybrid model may help IDSs defend business assets from various cyberattack vectors.
The world produces vast quantities of high-dimensional multi-semantic data. However, extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To tackle these concerns, the paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. Initially, the dynamic redundancy is assessed between the selected feature subset and potential features. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then combined into the heuristic factor as label weights. Experimental results demonstrate that our proposed strategies can effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms involved in the paper.
Medical Internet of Things (IoT) devices are becoming increasingly common in healthcare. This has created a strong need for advanced predictive health modeling strategies that can make good use of the growing amount of multimodal data to find potential health risks early and help individuals in a personalized way. Existing methods, while useful, have limitations in predictive accuracy, delay, personalization, and user interpretability, requiring a more comprehensive and efficient approach to harness modern medical IoT devices. The Multimodal Approach Integrating Pre-emptive Analysis, Personalized Feature Selection, and Explainable AI (MAIPFE) is designed for real-time health monitoring and disease detection. By using AI for early disease detection, personalized health recommendations, and transparency, healthcare will be transformed. The MAIPFE framework, which combines Firefly Optimizer, Recurrent Neural Network (RNN), Fuzzy C Means (FCM), and Explainable AI, improves disease detection precision over existing methods. Comprehensive metrics show the model’s superiority in real-time health analysis. The proposed framework outperformed existing models by 8.3% in disease detection classification precision, 8.5% in accuracy, 5.5% in recall, 2.9% in specificity, 4.5% in AUC (Area Under the Curve), and 4.9% in delay reduction. Disease prediction precision increased by 4.5%, accuracy by 3.9%, recall by 2.5%, specificity by 3.5%, AUC by 1.9%, and delay levels decreased by 9.4%. MAIPFE can revolutionize healthcare with pre-emptive analysis, personalized health insights, and actionable recommendations. The research shows that this innovative approach improves patient outcomes and healthcare efficiency in the real world.
A dandelion algorithm (DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience, which might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA's parameters and simplify DA's structure. Only the normal sowing operator is retained, while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection (CCFD), and the results indicate that it can obtain higher classification and detection performance than the state-of-the-art methods.
High-dimensional datasets present significant challenges for classification tasks. Dimensionality reduction, a crucial aspect of data preprocessing, has gained substantial attention due to its ability to improve classification performance. However, identifying the optimal features within high-dimensional datasets remains a computationally demanding task, necessitating the use of efficient algorithms. This paper introduces the Arithmetic Optimization Algorithm (AOA), a novel approach for finding the optimal feature subset. AOA is specifically modified to address feature selection problems based on a transfer function. Additionally, two enhancements are incorporated into the AOA algorithm to overcome limitations such as limited precision, slow convergence, and susceptibility to local optima. The first enhancement proposes a new method for selecting solutions to be improved during the search process. This method effectively improves the original algorithm’s accuracy and convergence speed. The second enhancement introduces a local search with neighborhood strategies (AOA_NBH) during the AOA exploitation phase. AOA_NBH explores the vast search space, aiding the algorithm in escaping local optima. Our results demonstrate that incorporating neighborhood methods enhances the output and achieves significant improvement over state-of-the-art methods.
In the agriculture sector, machine learning has been widely used by researchers for crop yield prediction. However, it is quite difficult to identify the most critical features from a dataset. Feature selection techniques allow us to remove extraneous and noisy features from the original feature set. They help the model focus only on the important features of the data, thus reducing execution time and improving the efficiency of the model. The aim of this study is to determine relevant feature subsets for achieving high predictive performance by using different families of feature selection techniques: filter methods, wrapper methods, and embedded methods. In this work, a rank-based feature selection technique, a weighted feature selection technique, and a hybrid feature selection technique have been applied to the agricultural data. The optimal feature set returned by each technique is used for yield prediction using Linear Regression, Random Forest, and Decision Tree Regressor. The accuracy of prediction obtained using these three methods has been analyzed by using different evaluation parameters. This study helps in increasing predictive accuracy with the minimum number of features.
In stock market forecasting, the identification of critical features that affect the performance of machine learning (ML) models is crucial to achieve accurate stock price predictions. Several review papers in the literature have focused on various ML, statistical, and deep learning-based methods used in stock market forecasting. However, no survey study has explored feature selection and extraction techniques for stock market forecasting. This survey presents a detailed analysis of 32 research works that use a combination of feature study and ML approaches in various stock market applications. We conduct a systematic search for articles in the Scopus and Web of Science databases for the years 2011–2022. We review a variety of feature selection and feature extraction approaches that have been successfully applied in the stock market analyses presented in the articles. We also describe the combination of feature analysis techniques and ML methods and evaluate their performance. Moreover, we present other survey articles, stock market input and output data, and analyses based on various factors. We find that correlation criteria, random forest, principal component analysis, and autoencoder are the most widely used feature selection and extraction techniques with the best prediction accuracy for various stock market applications.
Funding: Supported in part by the Natural Science Youth Foundation of Hebei Province under Grant F2019403207; in part by the PhD Research Startup Foundation of Hebei GEO University under Grant BQ2019055; in part by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing under Grant KLIGIP-2021A06; in part by the Fundamental Research Funds for the Universities in Hebei Province under Grant QN202220; in part by the Science and Technology Research Project for Universities of Hebei under Grant ZD2020344; and in part by the Guangxi Natural Science Fund General Project under Grant 2021GXNSFAA075029.
Abstract: In classification problems, datasets often contain a large number of features, but not all of them are relevant for accurate classification. In fact, irrelevant features may even hinder classification accuracy. Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate. Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter, but the results obtained depend on the value of the parameter. To eliminate this parameter’s influence, the problem can be reformulated as a multi-objective optimization problem. The Whale Optimization Algorithm (WOA) is widely used in optimization problems because of its simplicity and easy implementation. In this paper, we propose a multi-strategy assisted multi-objective WOA (MSMOWOA) to address feature selection. To enhance the algorithm’s search ability, we integrate multiple strategies such as Levy flight, Grey Wolf Optimizer, and adaptive mutation into it. Additionally, we utilize an external repository to store non-dominated solution sets, and grid technology is used to maintain diversity. Results on fourteen University of California Irvine (UCI) datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance. The source code can be accessed from the website: https://github.com/zc0315/MSMOWOA.
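The two objectives named in the abstract above (number of selected features and classification error rate, both minimized) can be illustrated with a small Pareto-dominance check and an external archive of non-dominated solutions. This is only a minimal sketch under stated assumptions: the names `Solution`, `dominates`, and `update_archive` are hypothetical, the sample objective values are made up, and the grid mechanism the abstract mentions for maintaining diversity is not modeled. It is not taken from the MSMOWOA source code.

```python
from typing import List, Tuple

# Hypothetical representation: (number of selected features, classification error rate).
# Both objectives are minimized.
Solution = Tuple[int, float]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def update_archive(archive: List[Solution], candidate: Solution) -> List[Solution]:
    """Add candidate unless an archive member dominates or equals it; drop members it dominates."""
    if any(dominates(member, candidate) or member == candidate for member in archive):
        return archive
    kept = [member for member in archive if not dominates(candidate, member)]
    kept.append(candidate)
    return kept


# Illustrative (made-up) objective values for five candidate feature subsets.
archive: List[Solution] = []
for sol in [(10, 0.20), (8, 0.25), (12, 0.30), (8, 0.20), (3, 0.40)]:
    archive = update_archive(archive, sol)

print(archive)
```

Because no single weight combines the two objectives, the archive keeps every trade-off that is not beaten on both counts, which is the motivation the abstract gives for the multi-objective formulation.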
Funding: Major Project for New Generation of AI (Grant No. 2018AAA0100400); the Scientific Research Fund of Hunan Provincial Education Department, China (Grant Nos. 21A0350, 21C0439, 22A0408, 22A0414, 2022JJ30231, 22B0559); the National Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ50051).
Abstract: Radiomics is a non-invasive method for extracting quantitative and higher-dimensional features from medical images for diagnosis. It has received great attention in recent years due to its huge application prospects. The number of features selected by existing radiomics feature selection methods is typically about ten. In this paper, a heuristic feature selection method based on frequency iteration and a multiple supervised training mode is proposed. Based on the combinations between features, it decomposes all features layer by layer to select the optimal features for each layer, then fuses the optimal features to form a locally optimal group layer by layer, and finally iterates to the globally optimal combination. Compared with the current method with the best prediction performance on the three datasets, the proposed method can reduce the number of features from about ten to about three without losing, and even while significantly improving, classification accuracy. The proposed method has better interpretability and generalization ability, which gives it great potential in the feature selection of radiomics.
Funding: This research was funded by the Fundamental Scientific Research Project of Liaoning Provincial Department of Education, "Short-Term Electrical Load Forecasting Based on Feature Selection and Optimized LSTM with DBO" (JYTMS20230189), and by the Research Support Plan for Introducing High-Level Talents to Shenyang Ligong University, "Application of Hybrid Grey Wolf Algorithm in Job Shop Scheduling Problem" (No. 1010147001131).
Abstract: Feature Selection (FS) is a key pre-processing step in pattern recognition and data mining tasks, which can effectively avoid the impact of irrelevant and redundant features on the performance of classification models. In recent years, meta-heuristic algorithms have been widely used in FS problems, so a Hybrid Binary Chaotic Salp Swarm Dung Beetle Optimization (HBCSSDBO) algorithm is proposed in this paper to improve the effect of FS. In this hybrid algorithm, the original continuous optimization algorithm is converted into binary form by the S-type transfer function and applied to the FS problem. Using the K-nearest neighbor (KNN) classifier, comparative experiments for FS are carried out between the proposed method and four advanced meta-heuristic algorithms on 16 UCI (University of California, Irvine) datasets. Seven evaluation metrics, such as average adaptation, average prediction accuracy, and average running time, are chosen to judge and compare the algorithms. The selected datasets are also discussed by categorizing them into three groups: high, medium, and low dimensions. Experimental results show that the HBCSSDBO feature selection method is able to obtain a good subset of features while maintaining high classification accuracy, and shows better optimization performance. In addition, the results of statistical tests confirm the significant validity of the method.
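The conversion from a continuous position to a binary feature mask via an S-type transfer function, mentioned in the abstract above, can be sketched as follows. The abstract does not give the exact function, so this sketch assumes the common logistic form; the names `s_shaped` and `binarize` and the sample position vector are hypothetical.

```python
import math
import random
from typing import List


def s_shaped(x: float) -> float:
    """Logistic S-shaped transfer function mapping a real value into (0, 1).

    Written in a numerically stable form so large negative inputs do not overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position: List[float], rng: random.Random) -> List[int]:
    """Select feature j (bit = 1) with probability s_shaped(position[j])."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


# Assumed example: a 5-dimensional continuous position from some search agent.
rng = random.Random(0)
mask = binarize([-6.0, -0.5, 0.0, 0.5, 6.0], rng)
print(mask)
```

In a wrapper setting, the resulting mask would pick the columns passed to the classifier (KNN in the abstract), and the classifier's error would feed back into the optimizer's fitness.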
Funding: National Natural Science Foundation of China (No. 62271186); Anhui Key Project of Research and Development Plan (No. 202104d07020005).
Abstract: Deception detection plays a crucial role in criminal investigation. Videos contain a wealth of information regarding apparent and physiological changes in individuals, and thus can serve as an effective means of deception detection. In this paper, we investigate video-based deception detection considering both apparent visual features, such as eye gaze, head pose, and facial action units (AUs), and non-contact heart rate detected by the remote photoplethysmography (rPPG) technique. Multiple wrapper-based feature selection methods combined with K-nearest neighbor (KNN) and support vector machine (SVM) classifiers are employed to screen the most effective features for deception detection. We evaluate the performance of the proposed method on both a self-collected physiological-assisted visual deception detection (PV3D) dataset and a public Bag-of-Lies (BOL) dataset. Experimental results demonstrate that the SVM classifier with symbiotic organisms search (SOS) feature selection yields the best overall performance, with an area under the curve (AUC) of 83.27% and accuracy (ACC) of 83.33% for PV3D, and an AUC of 71.18% and ACC of 70.33% for BOL. This demonstrates the stability and effectiveness of the proposed method in video-based deception detection tasks.
Abstract: Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher Score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and that MBEO is appropriate for high-dimensional English SER.
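To make the filter stage above more concrete, the following sketch ranks features by Fisher Score, one of the two criteria the abstract names. It assumes the standard definition (between-class scatter of the feature mean divided by the within-class variance, both weighted by class size); the function name `fisher_score` and the toy data are hypothetical, and information gain and the multi-objective ranking step are not shown.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[str]) -> float:
    """Fisher Score of one feature: weighted between-class scatter / weighted within-class variance."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    overall = fmean(values)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return between / within if within > 0 else float("inf")


# Toy data (made up): two features measured on four samples from two classes.
labels = ["a", "a", "b", "b"]
features = [
    [1.0, 2.0, 5.0, 6.0],  # separates the classes well
    [1.0, 5.0, 2.0, 6.0],  # mostly overlaps between classes
]
scores = [fisher_score(col, labels) for col in features]
ranking = sorted(range(len(features)), key=lambda j: -scores[j])
print(scores, ranking)
```

A higher score means the class means are far apart relative to the spread within each class, so such features would sit near the top of the sorted list.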
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12272257, 12102292, 12032006) and the special fund for Science and Technology Innovation Teams of Shanxi Province (No. 202204051002006).
Abstract: This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify, from experimental measurements, the dominant dimensionless quantities in the penetration of rod projectiles into semi-infinite metal targets. The derived mathematical expressions of the dimensionless quantities are simplified by examining the exponent matrix and the coupling relationships between feature variables. As a physics-based dimension reduction methodology, this approach reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities in penetration cases. The relative importance of the various dimensionless feature variables for the penetration efficiencies under four impact conditions is then evaluated through feature selection engineering. The results indicate that the critical dimensionless feature variables selected by this synergistic method, without reference to complex theoretical equations or detailed knowledge of penetration mechanics, are in accordance with those reported in the literature. Lastly, the determined dimensionless quantities can be efficiently applied to conduct semi-empirical analysis for a specific penetration case, and the reliability of the regression functions is validated.
Funding: Supported by MOTIE under Training Industrial Security Specialist for High-Tech Industry (RS-2024-00415520), supervised by the Korea Institute for Advancement of Technology (KIAT), and by MSIT under the ICT Challenge and Advanced Network of HRD (ICAN) Program (No. IITP-2022-RS-2022-00156310), supervised by the Institute of Information & Communication Technology Planning & Evaluation (IITP).
Abstract: With the rise of remote work and the digital industry, advanced cyberattacks have become more diverse and complex in terms of attack types and characteristics, rendering them difficult to detect with conventional intrusion detection methods. Signature-based intrusion detection methods can be used to detect attacks; however, they cannot detect new malware. Endpoint detection and response (EDR) tools are attracting attention as a means of detecting attacks on endpoints in real time to overcome the limitations of signature-based intrusion detection techniques. However, EDR tools are restricted by the continuous generation of unnecessary logs, resulting in poor detection performance and memory efficiency. Machine learning-based intrusion detection techniques for responding to advanced cyberattacks are memory intensive because they use numerous features, and they lack optimal feature selection for each attack type. To overcome these limitations, this study proposes a memory-efficient intrusion detection approach incorporating multi-binary classifiers using optimal feature selection. The proposed model detects multiple types of malicious attacks using parallel binary classifiers with optimal features for each attack type. The experimental results showed a 2.95% accuracy improvement and an 88.05% memory reduction using only six features compared to a model with 18 features. Furthermore, compared to a conventional multi-classification model with simple feature selection based on permutation importance, the accuracy improved by 11.67% and the memory usage decreased by 44.87%. The proposed scheme demonstrates that effective intrusion detection is achievable with minimal features, making it suitable for memory-limited mobile and Internet of Things devices.
Funding: Supported by the Korea Institute for Advancement of Technology (KIAT) Grant funded by the Korean Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialists), and by MSIT under the ICAN (ICT Challenge and Advanced Network of HRD) Program (No. IITP-2022-RS-2022-00156310), supervised by the Institute of Information & Communication Technology Planning and Evaluation (IITP).
Abstract: With the advancement of wireless network technology, vast amounts of traffic have been generated, and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated. While signature-based detection methods, static analysis, and dynamic analysis techniques have been previously explored for malicious traffic detection, they have limitations in identifying diversified malware traffic patterns. Recent research has focused on the application of machine learning to detect these patterns. However, applying machine learning to lightweight devices such as IoT devices is challenging because of the high computational demands and complexity involved in the learning process. In this study, we examined methods for effectively utilizing machine learning-based malicious traffic detection approaches on lightweight devices. We introduced the suboptimal feature selection model (SFSM), a feature selection technique designed to reduce complexity while maintaining the effectiveness of malicious traffic detection. Detection performance was evaluated on several traffic classes (benign, exploits, and generic) using the UNSW-NB15 dataset. SFSM sub-optimized the hyperparameters for feature selection and narrowed the search scope to encompass all features. SFSM improved learning performance while minimizing complexity by treating feature selection and exhaustive search as two steps, a problem not considered in conventional models. Our experimental results showed that the detection accuracy was improved by approximately 20% compared to the random model, and the reduction in accuracy compared to the greedy model, which performs an exhaustive search on all features, was kept within 6%. Additionally, latency and complexity were reduced by approximately 96% and 99.78%, respectively, compared to the greedy model. This study demonstrates that malicious traffic can be effectively detected even in lightweight device environments. SFSM verified the possibility of detecting various attack traffic on lightweight devices.
Funding: Funded by the University of Jeddah, Jeddah, Saudi Arabia, under Grant No. UJ-23-DR-26.
Abstract: The diversity of data sources has created a need for effective data manipulation and dissemination. The challenge that arises from increasing dimensionality has a negative effect on the performance, efficiency, and stability of computing. One of the most successful optimization algorithms is Particle Swarm Optimization (PSO), which has proved its effectiveness in exploring the most influential features in the search space thanks to its fast convergence and its ability to carry out the search with a small set of parameters. This research proposes an effective enhancement of PSO that tackles the challenge of random search, which directly enhances PSO performance. In addition, this research proposes a generic intelligent framework for early prediction of order delays and elimination of order backlogs, which could be considered an efficient potential solution for raising supply chain performance. The proposed adapted algorithm has been applied to a supply chain dataset, reducing the feature set from twenty-one features to ten significant features. To confirm the results of the proposed algorithm, the updated data has been examined with eight well-known classification algorithms, which reached a minimum accuracy of 94.3% for random forest and a maximum of 99.0 for Naïve Bayes. Moreover, the proposed adaptation has been compared with other adaptations of PSO from the literature over different datasets. The proposed PSO adaptation reached a higher accuracy than the literature, ranging from 97.8 to 99.36, which also demonstrates the advancement offered by the current research.
Funding: Supported by Universiti Sains Malaysia (USM) and the School of Computer Sciences, USM.
Abstract: Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality. This involves eliminating irrelevant, redundant, and noisy features to streamline the classification process. Various methods, from single feature selection techniques to ensemble filter-wrapper methods, have been used in the literature. Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents. Feature selection is inherently multi-objective, balancing the enhancement of feature relevance and accuracy against the reduction of redundant features. This research presents a two-fold objective for feature selection. The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods: Information Gain (Infogain), Chi-Square (Chi^(2)), and Analysis of Variance (ANOVA). This aims to maximize feature relevance while minimizing redundancy. The second objective involves reducing the number of selected features and increasing accuracy through a hybrid approach combining Artificial Bee Colony (ABC) and Genetic Algorithms (GA). This hybrid method operates in a wrapper framework to identify the most informative subset of text features. Support Vector Machine (SVM) was employed as the performance evaluator for the proposed model, tested on two high-dimensional multiclass datasets. The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection, offering superior performance compared to other existing feature selection algorithms.
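One way an ensemble of filter scores could be combined into a single top-ranked list is mean-rank aggregation, sketched below. The abstract does not say how the three filters are combined, so mean rank is purely an assumption here, as are the function names `ranks` and `ensemble_top_k` and the made-up score values standing in for Infogain, Chi-Square, and ANOVA outputs.

```python
from typing import Dict, List


def ranks(scores: List[float]) -> List[int]:
    """Convert scores to ranks, where rank 0 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    result = [0] * len(scores)
    for rank, j in enumerate(order):
        result[j] = rank
    return result


def ensemble_top_k(score_lists: Dict[str, List[float]], k: int) -> List[int]:
    """Return the k feature indices with the best (lowest) mean rank across all filters."""
    all_ranks = [ranks(s) for s in score_lists.values()]
    n = len(all_ranks[0])
    mean_rank = [sum(r[j] for r in all_ranks) / len(all_ranks) for j in range(n)]
    # Ties are broken by feature index so the result is deterministic.
    return sorted(range(n), key=lambda j: (mean_rank[j], j))[:k]


# Made-up filter scores for four features; higher means more relevant.
filter_scores = {
    "infogain": [0.30, 0.05, 0.20, 0.10],
    "chi2": [12.0, 1.0, 9.0, 10.0],
    "anova_f": [8.0, 0.5, 7.0, 3.0],
}
top = ensemble_top_k(filter_scores, k=2)
print(top)
```

Working on ranks rather than raw scores avoids having to normalize statistics that live on different scales; the reduced list would then be handed to the wrapper stage (ABC+GA in the abstract).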
Funding: Supported by the National Natural Science Foundation of China (grant number 62073330); this work constituted a segment of a project associated with the School of Computer Science and Information Engineering at Harbin Normal University.
Abstract: This research focuses on improving the Harris’ Hawks Optimization algorithm (HHO) by tackling several of its shortcomings, including insufficient population diversity, an imbalance between exploration and exploitation, and a lack of thorough exploitation depth. To tackle these shortcomings, it proposes enhancements from three distinct perspectives: an initialization technique for populations grounded in opposition-based learning, a strategy for updating escape energy factors to improve the equilibrium between exploitation and exploration, and a comprehensive exploitation approach that utilizes variable neighborhood search along with mutation operators. The effectiveness of the Improved Harris Hawks Optimization algorithm (IHHO) is assessed by comparing it to five leading algorithms across 23 benchmark test functions. Experimental findings indicate that the IHHO surpasses several contemporary algorithms in its problem-solving capabilities. Additionally, this paper introduces a feature selection method leveraging the IHHO algorithm (IHHO-FS) to address challenges such as low efficiency in feature selection and high computational costs (time to find the optimal feature combination and model response time) associated with high-dimensional datasets. Comparative analyses between IHHO-FS and six other advanced feature selection methods are conducted across eight datasets. The results demonstrate that IHHO-FS significantly reduces the computational costs associated with classification models by lowering data dimensionality, while also enhancing the efficiency of feature selection. Furthermore, IHHO-FS shows strong competitiveness relative to numerous algorithms.
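Opposition-based learning initialization, the first of the three enhancements listed above, is commonly described as follows: draw a random population, form the "opposite" of each individual within the bounds, and keep the fittest half of the union. The sketch below follows that common description and may differ from the paper's exact procedure; the function `obl_init`, the `sphere` test function, and all parameter values are assumptions made for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]


def obl_init(
    n: int,
    dim: int,
    lb: float,
    ub: float,
    fitness: Callable[[Vector], float],
    rng: random.Random,
) -> List[Vector]:
    """Return n individuals chosen from a random population and its opposite population.

    The opposite of x in [lb, ub] is lb + ub - x. Fitness is minimized.
    """
    population = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    opposite = [[lb + ub - x for x in ind] for ind in population]
    merged = population + opposite
    merged.sort(key=fitness)
    return merged[:n]


def sphere(x: Vector) -> float:
    """Simple benchmark function with its minimum at the origin."""
    return sum(v * v for v in x)


rng = random.Random(42)
pop = obl_init(n=10, dim=3, lb=-5.0, ub=5.0, fitness=sphere, rng=rng)
print(sphere(pop[0]))
```

Evaluating both a point and its mirror image gives the search a more even starting coverage of the bounds, which is the population-diversity issue the abstract targets.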
Abstract: Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle the aforementioned issue by systematically picking multiple audio cues, enhancing the classifier model’s efficacy in accurately discerning human emotions. The dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which involves applying convolutional operations to spectrograms, as they afford a visual representation of how the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features (MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz) are extracted from the spectrogram sequentially. The aim of feature selection is to retain only the dominant features by excluding the irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for feature selection over the multiple audio cues. Finally, the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short Term Memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments, comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin Database of Emotional Speech (EMO-DB), and The Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
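Sequential Forward Selection, one of the two selection techniques named in the abstract above, greedily adds whichever feature most improves a subset score. The sketch below shows only that greedy loop, under assumptions: `toy_score` is a hypothetical stand-in for the validation accuracy of a real classifier, the weights and redundancy penalty are made up, and Sequential Backward Selection (the mirror-image procedure that starts from all features and removes one at a time) is not shown.

```python
from typing import Callable, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Greedily grow a feature subset of size k, adding the best-scoring feature each round."""
    selected: List[int] = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(subset: List[int]) -> float:
    """Hypothetical subset score: additive usefulness minus a penalty for a redundant pair."""
    weights = [0.10, 0.50, 0.30, 0.45]
    total = sum(weights[j] for j in subset)
    if 1 in subset and 3 in subset:  # features 1 and 3 carry overlapping information
        total -= 0.40
    return total


chosen = sequential_forward_selection(n_features=4, score=toy_score, k=2)
print(chosen)
```

In the toy run, feature 3 has the second-highest individual weight but is passed over because it is redundant with feature 1, which illustrates why subset-level scoring can differ from ranking features one at a time.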
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2019QZKK0904), the Natural Science Foundation of Hebei Province (Grant No. D2022403032), and the S&T Program of Hebei (Grant No. E2021403001).
Abstract: The selection of important factors in machine learning-based susceptibility assessments is crucial to obtain reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province, China, by using machine learning algorithms. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and then a hybrid model was introduced by a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM model hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, which generated the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results. Moreover, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization, but also a useful feature selection algorithm to improve prediction accuracies of debris flow susceptibility by using machine learning algorithms. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flow may occur under intensive human activities and heavy rainfall events.
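The three statistical metrics named in the abstract above follow directly from the binary confusion matrix: accuracy = (TP + TN) / N, recall = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below computes them for a made-up set of labels, where 1 is assumed to mean "debris flow occurred" and 0 "did not occur"; the function name `binary_metrics` is hypothetical, and AUC is omitted because it needs continuous scores rather than hard predictions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, recall, and specificity from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels and predictions for eight locations.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting recall and specificity alongside accuracy shows separately how well the model finds real debris flow sites and how well it avoids false alarms.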
Abstract: More businesses are deploying powerful Intrusion Detection Systems (IDS) to secure their data and physical assets. Improved cyber-attack detection and prevention in these systems requires machine learning (ML) approaches. This paper examines a cyber-attack prediction system combining feature selection (FS) and ML. Our technique is based on Correlation Analysis (CA), Mutual Information (MI), and recursive feature reduction with cross-validation. To optimize IDS performance, the security features must be carefully selected from multi-dimensional datasets, and our hybrid FS technique must be extended to validate our methodology using the improved UNSW-NB15 and TON_IoT datasets. Our technique identified 22 key characteristics in UNSW-NB15 and 8 in TON_IoT. We evaluated prediction using seven ML methods: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Multilayer Perceptron (MLP) classifiers. The DT, RF, NB, and MLP classifiers helped our model surpass the competition on both datasets. Therefore, the experimental outcomes of our hybrid model may help IDSs defend business assets from various cyberattack vectors.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62376089, 62302153, 62302154, 62202147) and the Key Research and Development Program of Hubei Province, China (Grant No. 2023BEB024).
Abstract: The world produces vast quantities of high-dimensional multi-semantic data. However, extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To tackle these concerns, the paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. Initially, the dynamic redundancy is assessed between the selected feature subset and potential features. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then incorporated into the heuristic factor as label weights. Experimental results demonstrate that the proposed strategies can effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms compared in the paper.
Abstract: Medical Internet of Things (IoT) devices are becoming more and more common in healthcare. This has created a huge need for advanced predictive health modeling strategies that can make good use of the growing amount of multimodal data to find potential health risks early and help individuals in a personalized way. Existing methods, while useful, have limitations in predictive accuracy, delay, personalization, and user interpretability, requiring a more comprehensive and efficient approach to harness modern medical IoT devices. MAIPFE is a multimodal approach integrating pre-emptive analysis, personalized feature selection, and explainable AI for real-time health monitoring and disease detection. By using AI for early disease detection, personalized health recommendations, and transparency, healthcare will be transformed. The Multimodal Approach Integrating Pre-emptive Analysis, Personalized Feature Selection, and Explainable AI (MAIPFE) framework, which combines Firefly Optimizer, Recurrent Neural Network (RNN), Fuzzy C Means (FCM), and Explainable AI, improves disease detection precision over existing methods. Comprehensive metrics show the model’s superiority in real-time health analysis. The proposed framework outperformed existing models by 8.3% in disease detection classification precision, 8.5% in accuracy, 5.5% in recall, 2.9% in specificity, 4.5% in AUC (Area Under the Curve), and 4.9% in delay reduction. Disease prediction precision increased by 4.5%, accuracy by 3.9%, recall by 2.5%, specificity by 3.5%, and AUC by 1.9%, and delay levels decreased by 9.4%. MAIPFE can revolutionize healthcare with preemptive analysis, personalized health insights, and actionable recommendations. The research shows that this innovative approach improves patient outcomes and healthcare efficiency in the real world.
Funding: Supported by the Institutional Fund Projects (IFPIP-1481-611-1443); the Key Projects of Natural Science Research in Anhui Higher Education Institutions (2022AH051909); the Provincial Quality Project of Colleges and Universities in Anhui Province (2022sdxx020, 2022xqhz044); and the Bengbu University 2021 High-Level Scientific Research and Cultivation Project (2021pyxm04).
Abstract: The dandelion algorithm (DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of DA's parameters must be set empirically, and such settings might not be appropriate for all optimization problems. This work proposes a self-adapting and efficient dandelion algorithm that reduces the number of DA's parameters and simplifies its structure. Only the normal sowing operator is retained, while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions, with less time consumption, than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection (CCFD), and the results indicate that it obtains higher classification and detection performance than state-of-the-art methods.
Abstract: High-dimensional datasets present significant challenges for classification tasks. Dimensionality reduction, a crucial aspect of data preprocessing, has gained substantial attention due to its ability to improve classification performance. However, identifying the optimal features within high-dimensional datasets remains a computationally demanding task, necessitating the use of efficient algorithms. This paper introduces the Arithmetic Optimization Algorithm (AOA), a novel approach for finding the optimal feature subset. AOA is modified to address feature selection problems by means of a transfer function. Additionally, two enhancements are incorporated into the AOA algorithm to overcome limitations such as limited precision, slow convergence, and susceptibility to local optima. The first enhancement proposes a new method for selecting the solutions to be improved during the search process. This method improves the original algorithm's accuracy and convergence speed. The second enhancement introduces a local search with neighborhood strategies (AOA_NBH) during the AOA exploitation phase. AOA_NBH explores the vast search space, helping the algorithm escape local optima. The results demonstrate that incorporating neighborhood methods enhances the output and achieves significant improvement over state-of-the-art methods.
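The abstract above says that AOA is adapted to feature selection through a transfer function but does not say which one. The sketch below assumes the common S-shaped (sigmoid) transfer function and shows how a continuous position vector could be converted into a 0/1 feature mask. The names `sigmoid_transfer`, `binarize`, and `selected_indices`, and the example position, are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence


def sigmoid_transfer(x: float) -> float:
    """S-shaped transfer function (an assumption; the abstract does not name one).

    Maps a real-valued position component to a probability in [0, 1].
    """
    # Clamp the input so math.exp cannot overflow for extreme positions.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Convert a continuous position vector into a 0/1 feature mask.

    Feature j is selected (1) when a uniform random draw falls below
    the transfer-function probability of position[j].
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def selected_indices(mask: Sequence[int]) -> List[int]:
    """Return the indices of the features that the mask keeps."""
    return [j for j, bit in enumerate(mask) if bit == 1]


# Hypothetical continuous solution produced by an optimizer such as AOA.
rng = random.Random(0)
position = [-60.0, 60.0, 0.3, -1.2]
mask = binarize(position, rng)
print(mask, selected_indices(mask))
```

A strongly negative component is almost never selected and a strongly positive one almost always is, while components near zero are selected about half the time, which keeps some randomness in the search.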
Abstract: In the agriculture sector, machine learning has been widely used by researchers for crop yield prediction. However, it is difficult to identify the most critical features in a dataset. Feature selection techniques remove extraneous and noisy features from the original feature set, which lets the model focus only on the important features of the data and so reduces execution time and improves efficiency. The aim of this study is to determine the relevant feature subsets for achieving high predictive performance by using different feature selection approaches, namely filter, wrapper, and embedded methods. In this work, a rank-based feature selection technique, a weighted feature selection technique, and a hybrid feature selection technique are applied to agricultural data. The optimal feature set returned by each technique is used for yield prediction with Linear Regression, Random Forest, and Decision Tree Regressor. The prediction accuracy obtained with these three methods is analyzed using different evaluation parameters. This study helps increase predictive accuracy with the minimum number of features.
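As an illustration of the rank-based (filter) selection mentioned in the abstract above, the sketch below scores each feature by the absolute Pearson correlation with the target and keeps the top k. The abstract does not state which ranking criterion the study uses, so Pearson correlation is an assumption here, and the function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns: Sequence[Sequence[float]],
                  target: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k features most correlated with the target."""
    scores = [abs(pearson(col, target)) for col in columns]
    # Sort feature indices by score, highest first.
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data (hypothetical): three candidate features and a yield target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with the target
    [3.0, 3.0, 3.0, 3.0, 3.0],   # constant, carries no information
    [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly (negatively) correlated
]
top = rank_features(columns, target, k=2)
print(top)
```

The retained columns would then be passed to a regressor such as the ones named in the abstract. A filter of this kind scores each feature independently, so it does not account for redundancy between features.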
Funding: Funded by the University of Groningen and the Prospect Burma organization.
Abstract: In stock market forecasting, identifying the critical features that affect the performance of machine learning (ML) models is crucial for accurate stock price prediction. Several review papers in the literature have focused on the various ML, statistical, and deep learning-based methods used in stock market forecasting. However, no survey study has explored feature selection and extraction techniques for stock market forecasting. This survey presents a detailed analysis of 32 research works that use a combination of feature study and ML approaches in various stock market applications. We conduct a systematic search for articles in the Scopus and Web of Science databases for the years 2011–2022. We review a variety of feature selection and feature extraction approaches that have been successfully applied in the stock market analyses presented in the articles. We also describe the combinations of feature analysis techniques and ML methods and evaluate their performance. Moreover, we present other survey articles, stock market input and output data, and analyses based on various factors. We find that correlation criteria, random forest, principal component analysis, and autoencoder are the most widely used feature selection and extraction techniques, with the best prediction accuracy for various stock market applications.