In classification problems,datasets often contain a large amount of features,but not all of them are relevant for accurate classification.In fact,irrelevant features may even hinder classification accuracy.Feature sel...In classification problems,datasets often contain a large amount of features,but not all of them are relevant for accurate classification.In fact,irrelevant features may even hinder classification accuracy.Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate.Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter,but the results obtained depend on the value of the parameter.To eliminate this parameter’s influence,the problem can be reformulated as a multi-objective optimization problem.The Whale Optimization Algorithm(WOA)is widely used in optimization problems because of its simplicity and easy implementation.In this paper,we propose a multi-strategy assisted multi-objective WOA(MSMOWOA)to address feature selection.To enhance the algorithm’s search ability,we integrate multiple strategies such as Levy flight,Grey Wolf Optimizer,and adaptive mutation into it.Additionally,we utilize an external repository to store non-dominant solution sets and grid technology is used to maintain diversity.Results on fourteen University of California Irvine(UCI)datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance.The source code can be accessed from the website:https://github.com/zc0315/MSMOWOA.展开更多
Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is ext...Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is extremely high,so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system.The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy.First,we use the information gain and Fisher Score to sort the features extracted from signals.Then,we employ a multi-objective ranking method to evaluate these features and assign different importance to them.Features with high rankings have a large probability of being selected.Finally,we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection,which can improve the diversity of solutions and avoid falling into local traps.Using random forest and K-nearest neighbor classifiers,four English speech emotion datasets are employed to test the proposed algorithm(MBEO)as well as other multi-objective emotion identification techniques.The results illustrate that it performs well in inverted generational distance,hypervolume,Pareto solutions,and execution time,and MBEO is appropriate for high-dimensional English SER.展开更多
With the advancement of wireless network technology,vast amounts of traffic have been generated,and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated.While signatu...With the advancement of wireless network technology,vast amounts of traffic have been generated,and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated.While signature-based detection methods,static analysis,and dynamic analysis techniques have been previously explored for malicious traffic detection,they have limitations in identifying diversified malware traffic patterns.Recent research has been focused on the application of machine learning to detect these patterns.However,applying machine learning to lightweight devices like IoT devices is challenging because of the high computational demands and complexity involved in the learning process.In this study,we examined methods for effectively utilizing machine learning-based malicious traffic detection approaches for lightweight devices.We introduced the suboptimal feature selection model(SFSM),a feature selection technique designed to reduce complexity while maintaining the effectiveness of malicious traffic detection.Detection performance was evaluated on various malicious traffic,benign,exploits,and generic,using the UNSW-NB15 dataset and SFSM sub-optimized hyperparameters for feature selection and narrowed the search scope to encompass all features.SFSM improved learning performance while minimizing complexity by considering feature selection and exhaustive search as two steps,a problem not considered in conventional models.Our experimental results showed that the detection accuracy was improved by approximately 20%compared to the random model,and the reduction in accuracy compared to the greedy model,which performs an exhaustive search on all features,was kept within 6%.Additionally,latency and complexity were reduced by approximately 96%and 99.78%,respectively,compared to the greedy model.This study demonstrates that malicious traffic can be effectively detected even in lightweight device environments.SFSM verified the possibility of detecting various attack traffic on lightweight devices.展开更多
Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotiona...Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotional states of speakers holds significant importance in a range of real-time applications,including but not limited to virtual reality,human-robot interaction,emergency centers,and human behavior assessment.Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs.Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients(MFCCs)due to their ability to capture the periodic nature of audio signals effectively.Although these traits may improve their ability to perceive and interpret emotional depictions appropriately,MFCCS has some limitations.So this study aims to tackle the aforementioned issue by systematically picking multiple audio cues,enhancing the classifier model’s efficacy in accurately discerning human emotions.The utilized dataset is taken from the EMO-DB database,preprocessing input speech is done using a 2D Convolution Neural Network(CNN)involves applying convolutional operations to spectrograms as they afford a visual representation of the way the audio signal frequency content changes over time.The next step is the spectrogram data normalization which is crucial for Neural Network(NN)training as it aids in faster convergence.Then the five auditory features MFCCs,Chroma,Mel-Spectrogram,Contrast,and Tonnetz are extracted from the spectrogram sequentially.The attitude of feature selection is to retain only dominant features by excluding the irrelevant ones.In this paper,the Sequential Forward Selection(SFS)and Sequential Backward Selection(SBS)techniques were employed for multiple audio cues features selection.Finally,the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short Term Memory(Bi-LSTM)network to discern emotions.Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling,it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content existent in speech signals.The effectiveness and resilience of the proposed SER model were evaluated by experiments,comparing it to state-of-the-art SER techniques.The results indicated that the model achieved accuracy rates of 90.92%,93%,and 92%over the Ryerson Audio-Visual Database of Emotional Speech and Song(RAVDESS),Berlin Database of Emotional Speech(EMO-DB),and The Interactive Emotional Dyadic Motion Capture(IEMOCAP)datasets,respectively.These findings signify a prominent enhancement in the ability to emotional depictions identification in speech,showcasing the potential of the proposed model in advancing the SER field.展开更多
The selection of important factors in machine learning-based susceptibility assessments is crucial to obtain reliable susceptibility results.In this study,metaheuristic optimization and feature selection techniques we...The selection of important factors in machine learning-based susceptibility assessments is crucial to obtain reliable susceptibility results.In this study,metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province,China,by using machine learning algorithms.In total,133 historical debris flow records and 16 related factors were selected.The support vector machine(SVM)was first used as the base classifier,and then a hybrid model was introduced by a two-step process.First,the particle swarm optimization(PSO)algorithm was employed to select the SVM model hyperparameters.Second,two feature selection algorithms,namely principal component analysis(PCA)and PSO,were integrated into the PSO-based SVM model,which generated the PCA-PSO-SVM and FS-PSO-SVM models,respectively.Three statistical metrics(accuracy,recall,and specificity)and the area under the receiver operating characteristic curve(AUC)were employed to evaluate and validate the performance of the models.The results indicated that the feature selection-based models exhibited the best performance,followed by the PSO-based SVM and SVM models.Moreover,the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model,showing the highest AUC,accuracy,recall,and specificity values in both the training and testing processes.It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results.Moreover,the PSO algorithm was found to be not only an effective tool for hyperparameter optimization,but also a useful feature selection algorithm to improve prediction accuracies of debris flow susceptibility by using machine learning algorithms.The high and very high debris flow susceptibility zone appropriately covers 38.01%of the study area,where debris flow may occur under intensive human activities and heavy rainfall events.展开更多
A dandelion algorithm(DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience in DA,which might not be appropriate for all...A dandelion algorithm(DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience in DA,which might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA's parameters and simplify DA's structure. Only the normal sowing operator is retained;while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection(CCFD), and the results indicate that it can obtain higher classification and detection performance than the-state-of-the-art methods.展开更多
In stock market forecasting,the identification of critical features that affect the performance of machine learning(ML)models is crucial to achieve accurate stock price predictions.Several review papers in the literat...In stock market forecasting,the identification of critical features that affect the performance of machine learning(ML)models is crucial to achieve accurate stock price predictions.Several review papers in the literature have focused on various ML,statistical,and deep learning-based methods used in stock market forecasting.However,no survey study has explored feature selection and extraction techniques for stock market forecasting.This survey presents a detailed analysis of 32 research works that use a combination of feature study and ML approaches in various stock market applications.We conduct a systematic search for articles in the Scopus and Web of Science databases for the years 2011–2022.We review a variety of feature selection and feature extraction approaches that have been successfully applied in the stock market analyses presented in the articles.We also describe the combination of feature analysis techniques and ML methods and evaluate their performance.Moreover,we present other survey articles,stock market input and output data,and analyses based on various factors.We find that correlation criteria,random forest,principal component analysis,and autoencoder are the most widely used feature selection and extraction techniques with the best prediction accuracy for various stock market applications.展开更多
Machine learning(ML)practices such as classification have played a very important role in classifying diseases in medical science.Since medical science is a sensitive field,the pre-processing of medical data requires ...Machine learning(ML)practices such as classification have played a very important role in classifying diseases in medical science.Since medical science is a sensitive field,the pre-processing of medical data requires careful handling to make quality clinical decisions.Generally,medical data is considered high-dimensional and complex data that contains many irrelevant and redundant features.These factors indirectly upset the disease prediction and classification accuracy of any ML model.To address this issue,various data pre-processing methods called Feature Selection(FS)techniques have been presented in the literature.However,the majority of such techniques frequently suffer from local minima issues due to large solution space.Thus,this study has proposed a novel wrapper-based Sand Cat SwarmOptimization(SCSO)technique as an FS approach to find optimum features from ten benchmark medical datasets.The SCSO algorithm replicates the hunting and searching strategies of the sand cat while having the advantage of avoiding local optima and finding the ideal solution with minimal control variables.Moreover,K-Nearest Neighbor(KNN)classifier was used to evaluate the effectiveness of the features identified by the proposed SCSO algorithm.The performance of the proposed SCSO algorithm was compared with six state-of-the-art and recent wrapper-based optimization algorithms using the validation metrics of classification accuracy,optimum feature size,and computational cost in seconds.The simulation results on the benchmark medical datasets revealed that the proposed SCSO-KNN approach has outperformed comparative algorithms with an average classification accuracy of 93.96%by selecting 14.2 features within 1.91 s.Additionally,the Wilcoxon rank test was used to perform the significance analysis between the proposed SCSOKNN method and six other algorithms for a p-value less than 5.00E-02.The findings revealed that the proposed algorithm produces better outcomes with an average p-value of 1.82E-02.Moreover,potential future directions are also suggested as a result of the study’s promising findings.展开更多
Selecting the most relevant subset of features from a dataset is a vital step in data mining and machine learning.Each feature in a dataset has 2n possible subsets,making it challenging to select the optimum collectio...Selecting the most relevant subset of features from a dataset is a vital step in data mining and machine learning.Each feature in a dataset has 2n possible subsets,making it challenging to select the optimum collection of features using typical methods.As a result,a new metaheuristicsbased feature selection method based on the dipper-throated and grey-wolf optimization(DTO-GW)algorithms has been developed in this research.Instability can result when the selection of features is subject to metaheuristics,which can lead to a wide range of results.Thus,we adopted hybrid optimization in our method of optimizing,which allowed us to better balance exploration and harvesting chores more equitably.We propose utilizing the binary DTO-GW search approach we previously devised for selecting the optimal subset of attributes.In the proposed method,the number of features selected is minimized,while classification accuracy is increased.To test the proposed method’s performance against eleven other state-of-theart approaches,eight datasets from the UCI repository were used,such as binary grey wolf search(bGWO),binary hybrid grey wolf,and particle swarm optimization(bGWO-PSO),bPSO,binary stochastic fractal search(bSFS),binary whale optimization algorithm(bWOA),binary modified grey wolf optimization(bMGWO),binary multiverse optimization(bMVO),binary bowerbird optimization(bSBO),binary hysteresis optimization(bHy),and binary hysteresis optimization(bHWO).The suggested method is superior 4532 CMC,2023,vol.74,no.2 and successful in handling the problem of feature selection,according to the results of the experiments.展开更多
In real life,a large amount of data describing the same learning task may be stored in different institutions(called participants),and these data cannot be shared among par-ticipants due to privacy protection.The case...In real life,a large amount of data describing the same learning task may be stored in different institutions(called participants),and these data cannot be shared among par-ticipants due to privacy protection.The case that different attributes/features of the same instance are stored in different institutions is called vertically distributed data.The pur-pose of vertical‐federated feature selection(FS)is to reduce the feature dimension of vertical distributed data jointly without sharing local original data so that the feature subset obtained has the same or better performance as the original feature set.To solve this problem,in the paper,an embedded vertical‐federated FS algorithm based on particle swarm optimisation(PSO‐EVFFS)is proposed by incorporating evolutionary FS into the SecureBoost framework for the first time.By optimising both hyper‐parameters of the XGBoost model and feature subsets,PSO‐EVFFS can obtain a feature subset,which makes the XGBoost model more accurate.At the same time,since different participants only share insensitive parameters such as model loss function,PSO‐EVFFS can effec-tively ensure the privacy of participants'data.Moreover,an ensemble ranking strategy of feature importance based on the XGBoost tree model is developed to effectively remove irrelevant features on each participant.Finally,the proposed algorithm is applied to 10 test datasets and compared with three typical vertical‐federated learning frameworks and two variants of the proposed algorithm with different initialisation strategies.Experi-mental results show that the proposed algorithm can significantly improve the classifi-cation performance of selected feature subsets while fully protecting the data privacy of all participants.展开更多
Various feature selection algorithms are usually employed to improve classification models’overall performance.Optimization algorithms typically accompany such algorithms to select the optimal set of features.Among t...Various feature selection algorithms are usually employed to improve classification models’overall performance.Optimization algorithms typically accompany such algorithms to select the optimal set of features.Among the most currently attractive trends within optimization algorithms are hybrid metaheuristics.The present paper presents two Stages of Local Search models for feature selection based on WOA(Whale Optimization Algorithm)and Great Deluge(GD).GD Algorithm is integrated with the WOA algorithm to improve exploitation by identifying the most promising regions during the search.Another version is employed using the best solution found by the WOA algorithm and exploited by the GD algorithm.In addition,disruptive selection(DS)is employed to select the solutions from the population for local search.DS is chosen to maintain the diversity of the population via enhancing low and high-quality solutions.Fifteen(15)standard benchmark datasets provided by the University of California Irvine(UCI)repository were used in evaluating the proposed approaches’performance.Next,a comparison was made with four population-based algorithms as wrapper feature selection methods from the literature.The proposed techniques have proved their efficiency in enhancing classification accuracy compared to other wrapper methods.Hence,the WOA can search effectively in the feature space and choose the most relevant attributes for classification tasks.展开更多
Financial crisis prediction(FCP)received significant attention in the financial sector for decision-making.Proper forecasting of the number of firms possible to fail is important to determine the growth index and stre...Financial crisis prediction(FCP)received significant attention in the financial sector for decision-making.Proper forecasting of the number of firms possible to fail is important to determine the growth index and strength of a nation’s economy.Conventionally,numerous approaches have been developed in the design of accurate FCP processes.At the same time,classifier efficacy and predictive accuracy are inadequate for real-time applications.In addition,several established techniques carry out well to any of the specific datasets but are not adjustable to distinct datasets.Thus,there is a necessity for developing an effectual prediction technique for optimum classifier performance and adjustable to various datasets.This paper presents a novel multi-vs.optimization(MVO)based feature selection(FS)with an optimal variational auto encoder(OVAE)model for FCP.The proposed multi-vs.optimization based feature selection with optimal variational auto encoder(MVOFS-OVAE)model mainly aims to accomplish forecasting the financial crisis.For achieving this,the proposed MVOFS-OVAE model primarily pre-processes the financial data using min-max normalization.In addition,the MVOFS-OVAE model designs a feature subset selection process using the MVOFS approach.Followed by,the variational auto encoder(VAE)model is applied for the categorization of financial data into financial crisis or non-financial crisis.Finally,the differential evolution(DE)algorithm is utilized for the parameter tuning of the VAE model.A series of simulations on the benchmark dataset reported the betterment of the MVOFS-OVAE approach over the recent state of art approaches.展开更多
An intrusion detection system(IDS)becomes an important tool for ensuring security in the network.In recent times,machine learning(ML)and deep learning(DL)models can be applied for the identification of intrusions over...An intrusion detection system(IDS)becomes an important tool for ensuring security in the network.In recent times,machine learning(ML)and deep learning(DL)models can be applied for the identification of intrusions over the network effectively.To resolve the security issues,this paper presents a new Binary Butterfly Optimization algorithm based on Feature Selection with DRL technique,called BBOFS-DRL for intrusion detection.The proposed BBOFSDRL model mainly accomplishes the recognition of intrusions in the network.To attain this,the BBOFS-DRL model initially designs the BBOFS algorithm based on the traditional butterfly optimization algorithm(BOA)to elect feature subsets.Besides,DRL model is employed for the proper identification and classification of intrusions that exist in the network.Furthermore,beetle antenna search(BAS)technique is applied to tune the DRL parameters for enhanced intrusion detection efficiency.For ensuring the superior intrusion detection outcomes of the BBOFS-DRL model,a wide-ranging experimental analysis is performed against benchmark dataset.The simulation results reported the supremacy of the BBOFS-DRL model over its recent state of art approaches.展开更多
The massive influx of traffic on the Internet has made the composition of web traffic increasingly complex.Traditional port-based or protocol-based network traffic identification methods are no longer suitable for to...The massive influx of traffic on the Internet has made the composition of web traffic increasingly complex.Traditional port-based or protocol-based network traffic identification methods are no longer suitable for today’s complex and changing networks.Recently,machine learning has beenwidely applied to network traffic recognition.Still,high-dimensional features and redundant data in network traffic can lead to slow convergence problems and low identification accuracy of network traffic recognition algorithms.Taking advantage of the faster optimizationseeking capability of the jumping spider optimization algorithm(JSOA),this paper proposes a jumping spider optimization algorithmthat incorporates the harris hawk optimization(HHO)and small hole imaging(HHJSOA).We use it in network traffic identification feature selection.First,the method incorporates the HHO escape energy factor and the hard siege strategy to forma newsearch strategy for HHJSOA.This location update strategy enhances the search range of the optimal solution of HHJSOA.We use small hole imaging to update the inferior individual.Next,the feature selection problem is coded to propose a jumping spiders individual coding scheme.Multiple iterations of the HHJSOA algorithmfind the optimal individual used as the selected feature for KNN classification.Finally,we validate the classification accuracy and performance of the HHJSOA algorithm using the UNSW-NB15 dataset and KDD99 dataset.Experimental results show that compared with other algorithms for the UNSW-NB15 dataset,the improvement is at least 0.0705,0.00147,and 1 on the accuracy,fitness value,and the number of features.In addition,compared with other feature selectionmethods for the same datasets,the proposed algorithmhas faster convergence,better merit-seeking,and robustness.Therefore,HHJSOAcan improve the classification accuracy and solve the problem that the network traffic recognition algorithm needs to be faster to converge and easily fall into local optimum due to high-dimensional features.展开更多
This study focuses on meeting the challenges of big data visualization by using of data reduction methods based the feature selection methods.To reduce the volume of big data and minimize model training time(Tt)while ...This study focuses on meeting the challenges of big data visualization by using of data reduction methods based the feature selection methods.To reduce the volume of big data and minimize model training time(Tt)while maintaining data quality.We contributed to meeting the challenges of big data visualization using the embedded method based“Select from model(SFM)”method by using“Random forest Importance algorithm(RFI)”and comparing it with the filter method by using“Select percentile(SP)”method based chi square“Chi2”tool for selecting the most important features,which are then fed into a classification process using the logistic regression(LR)algorithm and the k-nearest neighbor(KNN)algorithm.Thus,the classification accuracy(AC)performance of LRis also compared to theKNN approach in python on eight data sets to see which method produces the best rating when feature selection methods are applied.Consequently,the study concluded that the feature selection methods have a significant impact on the analysis and visualization of the data after removing the repetitive data and the data that do not affect the goal.After making several comparisons,the study suggests(SFMLR)using SFM based on RFI algorithm for feature selection,with LR algorithm for data classify.The proposal proved its efficacy by comparing its results with recent literature.展开更多
Currently,the use of intelligent systems for the automatic recognition of targets in the fields of defence and military has increased significantly.The primary advantage of these systems is that they do not need human...Currently,the use of intelligent systems for the automatic recognition of targets in the fields of defence and military has increased significantly.The primary advantage of these systems is that they do not need human participation in target recognition processes.This paper uses the particle swarm optimization(PSO)algorithm to select the optimal features in the micro-Doppler signature of sonar targets.The microDoppler effect is referred to amplitude/phase modulation on the received signal by rotating parts of a target such as propellers.Since different targets'geometric and physical properties are not the same,their micro-Doppler signature is different.This Inconsistency can be considered a practical issue(especially in the frequency domain)for sonar target recognition.Despite using 128-point fast Fourier transform(FFT)for the feature extraction step,not all extracted features contain helpful information.As a result,PSO selects the most optimum and valuable features.To evaluate the micro-Doppler signature of sonar targets and the effect of feature selection on sonar target recognition,the simplest and most popular machine learning algorithm,k-nearest neighbor(k-NN),is used,which is called k-PSO in this paper because of the use of PSO for feature selection.The parameters measured are the correct recognition rate,reliability rate,and processing time.The simulation results show that k-PSO achieved a 100%correct recognition rate and reliability rate at 19.35 s when using simulated data at a 15 dB signal-tonoise ratio(SNR)angle of 40°.Also,for the experimental dataset obtained from the cavitation tunnel,the correct recognition rate is 98.26%,and the reliability rate is 99.69%at 18.46s.Therefore,the k-PSO has an encouraging performance in automatically recognizing sonar targets when using experimental datasets and for real-world use.展开更多
Gait recognition is an active research area that uses a walking theme to identify the subject correctly.Human Gait Recognition(HGR)is performed without any cooperation from the individual.However,in practice,it remain...Gait recognition is an active research area that uses a walking theme to identify the subject correctly.Human Gait Recognition(HGR)is performed without any cooperation from the individual.However,in practice,it remains a challenging task under diverse walking sequences due to the covariant factors such as normal walking and walking with wearing a coat.Researchers,over the years,have worked on successfully identifying subjects using different techniques,but there is still room for improvement in accuracy due to these covariant factors.This paper proposes an automated model-free framework for human gait recognition in this article.There are a few critical steps in the proposed method.Firstly,optical flow-based motion region esti-mation and dynamic coordinates-based cropping are performed.The second step involves training a fine-tuned pre-trained MobileNetV2 model on both original and optical flow cropped frames;the training has been conducted using static hyperparameters.The third step proposed a fusion technique known as normal distribution serially fusion.In the fourth step,a better optimization algorithm is applied to select the best features,which are then classified using a Bi-Layered neural network.Three publicly available datasets,CASIA A,CASIA B,and CASIA C,were used in the experimental process and obtained average accuracies of 99.6%,91.6%,and 95.02%,respectively.The proposed framework has achieved improved accuracy compared to the other methods.展开更多
Shield machines are currently the main tool for underground tunnel construction. Due to the complexity and variability of the underground construction environment, it is necessary to accurately identify the ground in ...Shield machines are currently the main tool for underground tunnel construction. Due to the complexity and variability of the underground construction environment, it is necessary to accurately identify the ground in real-time during the tunnel construction process to match and adjust the tunnel parameters according to the geological conditions to ensure construction safety. Compared with the traditional method of stratum identifcation based on staged drilling sampling, the real-time stratum identifcation method based on construction data has the advantages of low cost and high precision. Due to the huge amount of sensor data of the ultra-large diameter mud-water balance shield machine, in order to balance the identifcation time and recognition accuracy of the formation, it is necessary to screen the multivariate data features collected by hundreds of sensors. In response to this problem, this paper proposes a voting-based feature extraction method (VFS), which integrates multiple feature extraction algorithms FSM, and the frequency of each feature in all feature extraction algorithms is the basis for voting. At the same time, in order to verify the wide applicability of the method, several commonly used classifcation models are used to train and test the obtained efective feature data, and the model accuracy and recognition time are used as evaluation indicators, and the classifcation with the best combination with VFS is obtained. The experimental results of shield machine data of 6 diferent geological structures show that the average accuracy of 13 features obtained by VFS combined with diferent classifcation algorithms is 91%;among them, the random forest model takes less time and has the highest recognition accuracy, reaching 93%, showing best compatibility with VFS. Therefore, the VFS algorithm proposed in this paper has high reliability and wide applicability for stratum identifcation in the process of tunnel construction, and can be matched with a variety of classifer algorithms. By combining 13 features selected from shield machine data features with random forest, the identifcation of the construction stratum environment of shield tunnels can be well realized, and further theoretical guidance for underground engineering construction can be provided.展开更多
As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected featu...As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features.Evolutionary computing(EC)is promising for FS owing to its powerful search capability.However,in traditional EC-based methods,feature subsets are represented via a length-fixed individual encoding.It is ineffective for high-dimensional data,because it results in a huge search space and prohibitive training time.This work proposes a length-adaptive non-dominated sorting genetic algorithm(LA-NSGA)with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective highdimensional FS.In LA-NSGA,an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths,and a Pareto dominance-based length change operator is introduced to guide individuals to explore in promising search space adaptively.Moreover,a dominance-based local search method is employed for further improvement.The experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.展开更多
Graphs help to define the relationships between entities in the data.These relationships,represented by edges,often provide additional context information which can be utilised to discover patterns in the data.Graph N...Graphs help to define the relationships between entities in the data.These relationships,represented by edges,often provide additional context information which can be utilised to discover patterns in the data.Graph Neural Networks(GNNs)employ the inductive bias of the graph structure to learn and predict on various tasks.The primary operation of graph neural networks is the feature aggregation step performed over neighbours of the node based on the structure of the graph.In addition to its own features,for each hop,the node gets additional combined features from its neighbours.These aggregated features help define the similarity or dissimilarity of the nodes with respect to the labels and are useful for tasks like node classification.However,in real-world data,features of neighbours at different hops may not correlate with the node's features.Thus,any indiscriminate feature aggregation by GNN might cause the addition of noisy features leading to degradation in model's performance.In this work,we show that selective aggregation of node features from various hops leads to better performance than default aggregation on the node classification task.Furthermore,we propose a Dual-Net GNN architecture with a classifier model and a selector model.The classifier model trains over a subset of input node features to predict node labels while the selector model learns to provide optimal input subset to the classifier for the best performance.These two models are trained jointly to learn the best subset of features that give higher accuracy in node label predictions.With extensive experiments,we show that our proposed model outperforms both feature selection methods and state-of-the-art GNN models with remarkable improvements up to 27.8%.展开更多
基金supported in part by the Natural Science Youth Foundation of Hebei Province under Grant F2019403207in part by the PhD Research Startup Foundation of Hebei GEO University under Grant BQ2019055+3 种基金in part by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing under Grant KLIGIP-2021A06in part by the Fundamental Research Funds for the Universities in Hebei Province under Grant QN202220in part by the Science and Technology Research Project for Universities of Hebei under Grant ZD2020344in part by the Guangxi Natural Science Fund General Project under Grant 2021GXNSFAA075029.
文摘In classification problems,datasets often contain a large amount of features,but not all of them are relevant for accurate classification.In fact,irrelevant features may even hinder classification accuracy.Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate.Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter,but the results obtained depend on the value of the parameter.To eliminate this parameter’s influence,the problem can be reformulated as a multi-objective optimization problem.The Whale Optimization Algorithm(WOA)is widely used in optimization problems because of its simplicity and easy implementation.In this paper,we propose a multi-strategy assisted multi-objective WOA(MSMOWOA)to address feature selection.To enhance the algorithm’s search ability,we integrate multiple strategies such as Levy flight,Grey Wolf Optimizer,and adaptive mutation into it.Additionally,we utilize an external repository to store non-dominant solution sets and grid technology is used to maintain diversity.Results on fourteen University of California Irvine(UCI)datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance.The source code can be accessed from the website:https://github.com/zc0315/MSMOWOA.
文摘Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is extremely high,so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system.The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy.First,we use the information gain and Fisher Score to sort the features extracted from signals.Then,we employ a multi-objective ranking method to evaluate these features and assign different importance to them.Features with high rankings have a large probability of being selected.Finally,we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection,which can improve the diversity of solutions and avoid falling into local traps.Using random forest and K-nearest neighbor classifiers,four English speech emotion datasets are employed to test the proposed algorithm(MBEO)as well as other multi-objective emotion identification techniques.The results illustrate that it performs well in inverted generational distance,hypervolume,Pareto solutions,and execution time,and MBEO is appropriate for high-dimensional English SER.
基金supported by the Korea Institute for Advancement of Technology(KIAT)Grant funded by theKorean Government(MOTIE)(P0008703,The Competency Development Program for Industry Specialists)MSIT under the ICAN(ICT Challenge and Advanced Network of HRD)Program(No.IITP-2022-RS-2022-00156310)supervised by the Institute of Information&Communication Technology Planning and Evaluation(IITP).
文摘With the advancement of wireless network technology,vast amounts of traffic have been generated,and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated.While signature-based detection methods,static analysis,and dynamic analysis techniques have been previously explored for malicious traffic detection,they have limitations in identifying diversified malware traffic patterns.Recent research has been focused on the application of machine learning to detect these patterns.However,applying machine learning to lightweight devices like IoT devices is challenging because of the high computational demands and complexity involved in the learning process.In this study,we examined methods for effectively utilizing machine learning-based malicious traffic detection approaches for lightweight devices.We introduced the suboptimal feature selection model(SFSM),a feature selection technique designed to reduce complexity while maintaining the effectiveness of malicious traffic detection.Detection performance was evaluated on various malicious traffic,benign,exploits,and generic,using the UNSW-NB15 dataset and SFSM sub-optimized hyperparameters for feature selection and narrowed the search scope to encompass all features.SFSM improved learning performance while minimizing complexity by considering feature selection and exhaustive search as two steps,a problem not considered in conventional models.Our experimental results showed that the detection accuracy was improved by approximately 20%compared to the random model,and the reduction in accuracy compared to the greedy model,which performs an exhaustive search on all features,was kept within 6%.Additionally,latency and complexity were reduced by approximately 96%and 99.78%,respectively,compared to the greedy model.This study demonstrates that malicious traffic can be effectively detected even in lightweight device environments.SFSM verified the possibility of detecting various attack traffic on lightweight devices.
文摘Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotional states of speakers holds significant importance in a range of real-time applications,including but not limited to virtual reality,human-robot interaction,emergency centers,and human behavior assessment.Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs.Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients(MFCCs)due to their ability to capture the periodic nature of audio signals effectively.Although these traits may improve their ability to perceive and interpret emotional depictions appropriately,MFCCS has some limitations.So this study aims to tackle the aforementioned issue by systematically picking multiple audio cues,enhancing the classifier model’s efficacy in accurately discerning human emotions.The utilized dataset is taken from the EMO-DB database,preprocessing input speech is done using a 2D Convolution Neural Network(CNN)involves applying convolutional operations to spectrograms as they afford a visual representation of the way the audio signal frequency content changes over time.The next step is the spectrogram data normalization which is crucial for Neural Network(NN)training as it aids in faster convergence.Then the five auditory features MFCCs,Chroma,Mel-Spectrogram,Contrast,and Tonnetz are extracted from the spectrogram sequentially.The attitude of feature selection is to retain only dominant features by excluding the irrelevant ones.In this paper,the Sequential Forward Selection(SFS)and Sequential Backward Selection(SBS)techniques were employed for multiple audio cues features selection.Finally,the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short Term Memory(Bi-LSTM)network to discern emotions.Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling,it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content existent in speech signals.The effectiveness and resilience of the proposed SER model were evaluated by experiments,comparing it to state-of-the-art SER techniques.The results indicated that the model achieved accuracy rates of 90.92%,93%,and 92%over the Ryerson Audio-Visual Database of Emotional Speech and Song(RAVDESS),Berlin Database of Emotional Speech(EMO-DB),and The Interactive Emotional Dyadic Motion Capture(IEMOCAP)datasets,respectively.These findings signify a prominent enhancement in the ability to emotional depictions identification in speech,showcasing the potential of the proposed model in advancing the SER field.
基金supported by the Second Tibetan Plateau Scientific Expedition and Research Program(Grant no.2019QZKK0904)Natural Science Foundation of Hebei Province(Grant no.D2022403032)S&T Program of Hebei(Grant no.E2021403001).
文摘The selection of important factors in machine learning-based susceptibility assessments is crucial to obtain reliable susceptibility results.In this study,metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province,China,by using machine learning algorithms.In total,133 historical debris flow records and 16 related factors were selected.The support vector machine(SVM)was first used as the base classifier,and then a hybrid model was introduced by a two-step process.First,the particle swarm optimization(PSO)algorithm was employed to select the SVM model hyperparameters.Second,two feature selection algorithms,namely principal component analysis(PCA)and PSO,were integrated into the PSO-based SVM model,which generated the PCA-PSO-SVM and FS-PSO-SVM models,respectively.Three statistical metrics(accuracy,recall,and specificity)and the area under the receiver operating characteristic curve(AUC)were employed to evaluate and validate the performance of the models.The results indicated that the feature selection-based models exhibited the best performance,followed by the PSO-based SVM and SVM models.Moreover,the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model,showing the highest AUC,accuracy,recall,and specificity values in both the training and testing processes.It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results.Moreover,the PSO algorithm was found to be not only an effective tool for hyperparameter optimization,but also a useful feature selection algorithm to improve prediction accuracies of debris flow susceptibility by using machine learning algorithms.The high and very high debris flow susceptibility zone appropriately covers 38.01%of the study area,where debris flow may occur under intensive human activities and heavy rainfall events.
基金supported by the Institutional Fund Projects(IFPIP-1481-611-1443)the Key Projects of Natural Science Research in Anhui Higher Education Institutions(2022AH051909)+1 种基金the Provincial Quality Project of Colleges and Universities in Anhui Province(2022sdxx020,2022xqhz044)Bengbu University 2021 High-Level Scientific Research and Cultivation Project(2021pyxm04)。
文摘A dandelion algorithm(DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience in DA,which might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA's parameters and simplify DA's structure. Only the normal sowing operator is retained;while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection(CCFD), and the results indicate that it can obtain higher classification and detection performance than the-state-of-the-art methods.
基金funded by The University of Groningen and Prospect Burma organization.
文摘In stock market forecasting,the identification of critical features that affect the performance of machine learning(ML)models is crucial to achieve accurate stock price predictions.Several review papers in the literature have focused on various ML,statistical,and deep learning-based methods used in stock market forecasting.However,no survey study has explored feature selection and extraction techniques for stock market forecasting.This survey presents a detailed analysis of 32 research works that use a combination of feature study and ML approaches in various stock market applications.We conduct a systematic search for articles in the Scopus and Web of Science databases for the years 2011–2022.We review a variety of feature selection and feature extraction approaches that have been successfully applied in the stock market analyses presented in the articles.We also describe the combination of feature analysis techniques and ML methods and evaluate their performance.Moreover,we present other survey articles,stock market input and output data,and analyses based on various factors.We find that correlation criteria,random forest,principal component analysis,and autoencoder are the most widely used feature selection and extraction techniques with the best prediction accuracy for various stock market applications.
基金This research was supported by a Researchers Supporting Project Number(RSP2021/309)King Saud University,Riyadh,Saudi Arabia.The authors wish to acknowledge Yayasan Universiti Teknologi Petronas for supporting this work through the research grant(015LC0-308).
文摘Machine learning(ML)practices such as classification have played a very important role in classifying diseases in medical science.Since medical science is a sensitive field,the pre-processing of medical data requires careful handling to make quality clinical decisions.Generally,medical data is considered high-dimensional and complex data that contains many irrelevant and redundant features.These factors indirectly upset the disease prediction and classification accuracy of any ML model.To address this issue,various data pre-processing methods called Feature Selection(FS)techniques have been presented in the literature.However,the majority of such techniques frequently suffer from local minima issues due to large solution space.Thus,this study has proposed a novel wrapper-based Sand Cat SwarmOptimization(SCSO)technique as an FS approach to find optimum features from ten benchmark medical datasets.The SCSO algorithm replicates the hunting and searching strategies of the sand cat while having the advantage of avoiding local optima and finding the ideal solution with minimal control variables.Moreover,K-Nearest Neighbor(KNN)classifier was used to evaluate the effectiveness of the features identified by the proposed SCSO algorithm.The performance of the proposed SCSO algorithm was compared with six state-of-the-art and recent wrapper-based optimization algorithms using the validation metrics of classification accuracy,optimum feature size,and computational cost in seconds.The simulation results on the benchmark medical datasets revealed that the proposed SCSO-KNN approach has outperformed comparative algorithms with an average classification accuracy of 93.96%by selecting 14.2 features within 1.91 s.Additionally,the Wilcoxon rank test was used to perform the significance analysis between the proposed SCSOKNN method and six other algorithms for a p-value less than 5.00E-02.The findings revealed that the proposed algorithm produces better outcomes with an average p-value of 1.82E-02.Moreover,potential future directions are also suggested as a result of the study’s promising findings.
文摘Selecting the most relevant subset of features from a dataset is a vital step in data mining and machine learning.Each feature in a dataset has 2n possible subsets,making it challenging to select the optimum collection of features using typical methods.As a result,a new metaheuristicsbased feature selection method based on the dipper-throated and grey-wolf optimization(DTO-GW)algorithms has been developed in this research.Instability can result when the selection of features is subject to metaheuristics,which can lead to a wide range of results.Thus,we adopted hybrid optimization in our method of optimizing,which allowed us to better balance exploration and harvesting chores more equitably.We propose utilizing the binary DTO-GW search approach we previously devised for selecting the optimal subset of attributes.In the proposed method,the number of features selected is minimized,while classification accuracy is increased.To test the proposed method’s performance against eleven other state-of-theart approaches,eight datasets from the UCI repository were used,such as binary grey wolf search(bGWO),binary hybrid grey wolf,and particle swarm optimization(bGWO-PSO),bPSO,binary stochastic fractal search(bSFS),binary whale optimization algorithm(bWOA),binary modified grey wolf optimization(bMGWO),binary multiverse optimization(bMVO),binary bowerbird optimization(bSBO),binary hysteresis optimization(bHy),and binary hysteresis optimization(bHWO).The suggested method is superior 4532 CMC,2023,vol.74,no.2 and successful in handling the problem of feature selection,according to the results of the experiments.
基金supported by the two funding sources:Scientific Innovation 2030 Major Project for New Generation of AI,Ministry of Science and Technology of the Peoples Republic of China(2020AAA0107300)National Natural Science Foundation of China(62133015).
文摘In real life,a large amount of data describing the same learning task may be stored in different institutions(called participants),and these data cannot be shared among par-ticipants due to privacy protection.The case that different attributes/features of the same instance are stored in different institutions is called vertically distributed data.The pur-pose of vertical‐federated feature selection(FS)is to reduce the feature dimension of vertical distributed data jointly without sharing local original data so that the feature subset obtained has the same or better performance as the original feature set.To solve this problem,in the paper,an embedded vertical‐federated FS algorithm based on particle swarm optimisation(PSO‐EVFFS)is proposed by incorporating evolutionary FS into the SecureBoost framework for the first time.By optimising both hyper‐parameters of the XGBoost model and feature subsets,PSO‐EVFFS can obtain a feature subset,which makes the XGBoost model more accurate.At the same time,since different participants only share insensitive parameters such as model loss function,PSO‐EVFFS can effec-tively ensure the privacy of participants'data.Moreover,an ensemble ranking strategy of feature importance based on the XGBoost tree model is developed to effectively remove irrelevant features on each participant.Finally,the proposed algorithm is applied to 10 test datasets and compared with three typical vertical‐federated learning frameworks and two variants of the proposed algorithm with different initialisation strategies.Experi-mental results show that the proposed algorithm can significantly improve the classifi-cation performance of selected feature subsets while fully protecting the data privacy of all participants.
基金This research is part of a project funded by Imam Abdulrahman Bin Faisal University,under Grant Number 2020-083-BASRC.
文摘Various feature selection algorithms are usually employed to improve classification models’overall performance.Optimization algorithms typically accompany such algorithms to select the optimal set of features.Among the most currently attractive trends within optimization algorithms are hybrid metaheuristics.The present paper presents two Stages of Local Search models for feature selection based on WOA(Whale Optimization Algorithm)and Great Deluge(GD).GD Algorithm is integrated with the WOA algorithm to improve exploitation by identifying the most promising regions during the search.Another version is employed using the best solution found by the WOA algorithm and exploited by the GD algorithm.In addition,disruptive selection(DS)is employed to select the solutions from the population for local search.DS is chosen to maintain the diversity of the population via enhancing low and high-quality solutions.Fifteen(15)standard benchmark datasets provided by the University of California Irvine(UCI)repository were used in evaluating the proposed approaches’performance.Next,a comparison was made with four population-based algorithms as wrapper feature selection methods from the literature.The proposed techniques have proved their efficiency in enhancing classification accuracy compared to other wrapper methods.Hence,the WOA can search effectively in the feature space and choose the most relevant attributes for classification tasks.
文摘Financial crisis prediction(FCP)received significant attention in the financial sector for decision-making.Proper forecasting of the number of firms possible to fail is important to determine the growth index and strength of a nation’s economy.Conventionally,numerous approaches have been developed in the design of accurate FCP processes.At the same time,classifier efficacy and predictive accuracy are inadequate for real-time applications.In addition,several established techniques carry out well to any of the specific datasets but are not adjustable to distinct datasets.Thus,there is a necessity for developing an effectual prediction technique for optimum classifier performance and adjustable to various datasets.This paper presents a novel multi-vs.optimization(MVO)based feature selection(FS)with an optimal variational auto encoder(OVAE)model for FCP.The proposed multi-vs.optimization based feature selection with optimal variational auto encoder(MVOFS-OVAE)model mainly aims to accomplish forecasting the financial crisis.For achieving this,the proposed MVOFS-OVAE model primarily pre-processes the financial data using min-max normalization.In addition,the MVOFS-OVAE model designs a feature subset selection process using the MVOFS approach.Followed by,the variational auto encoder(VAE)model is applied for the categorization of financial data into financial crisis or non-financial crisis.Finally,the differential evolution(DE)algorithm is utilized for the parameter tuning of the VAE model.A series of simulations on the benchmark dataset reported the betterment of the MVOFS-OVAE approach over the recent state of art approaches.
文摘An intrusion detection system(IDS)becomes an important tool for ensuring security in the network.In recent times,machine learning(ML)and deep learning(DL)models can be applied for the identification of intrusions over the network effectively.To resolve the security issues,this paper presents a new Binary Butterfly Optimization algorithm based on Feature Selection with DRL technique,called BBOFS-DRL for intrusion detection.The proposed BBOFSDRL model mainly accomplishes the recognition of intrusions in the network.To attain this,the BBOFS-DRL model initially designs the BBOFS algorithm based on the traditional butterfly optimization algorithm(BOA)to elect feature subsets.Besides,DRL model is employed for the proper identification and classification of intrusions that exist in the network.Furthermore,beetle antenna search(BAS)technique is applied to tune the DRL parameters for enhanced intrusion detection efficiency.For ensuring the superior intrusion detection outcomes of the BBOFS-DRL model,a wide-ranging experimental analysis is performed against benchmark dataset.The simulation results reported the supremacy of the BBOFS-DRL model over its recent state of art approaches.
基金funded by the National Natural Science Foundation of China under Grant No.61602162.
文摘The massive influx of traffic on the Internet has made the composition of web traffic increasingly complex.Traditional port-based or protocol-based network traffic identification methods are no longer suitable for today’s complex and changing networks.Recently,machine learning has beenwidely applied to network traffic recognition.Still,high-dimensional features and redundant data in network traffic can lead to slow convergence problems and low identification accuracy of network traffic recognition algorithms.Taking advantage of the faster optimizationseeking capability of the jumping spider optimization algorithm(JSOA),this paper proposes a jumping spider optimization algorithmthat incorporates the harris hawk optimization(HHO)and small hole imaging(HHJSOA).We use it in network traffic identification feature selection.First,the method incorporates the HHO escape energy factor and the hard siege strategy to forma newsearch strategy for HHJSOA.This location update strategy enhances the search range of the optimal solution of HHJSOA.We use small hole imaging to update the inferior individual.Next,the feature selection problem is coded to propose a jumping spiders individual coding scheme.Multiple iterations of the HHJSOA algorithmfind the optimal individual used as the selected feature for KNN classification.Finally,we validate the classification accuracy and performance of the HHJSOA algorithm using the UNSW-NB15 dataset and KDD99 dataset.Experimental results show that compared with other algorithms for the UNSW-NB15 dataset,the improvement is at least 0.0705,0.00147,and 1 on the accuracy,fitness value,and the number of features.In addition,compared with other feature selectionmethods for the same datasets,the proposed algorithmhas faster convergence,better merit-seeking,and robustness.Therefore,HHJSOAcan improve the classification accuracy and solve the problem that the network traffic recognition algorithm needs to be faster to converge and easily fall into local optimum due to high-dimensional features.
文摘This study focuses on meeting the challenges of big data visualization by using of data reduction methods based the feature selection methods.To reduce the volume of big data and minimize model training time(Tt)while maintaining data quality.We contributed to meeting the challenges of big data visualization using the embedded method based“Select from model(SFM)”method by using“Random forest Importance algorithm(RFI)”and comparing it with the filter method by using“Select percentile(SP)”method based chi square“Chi2”tool for selecting the most important features,which are then fed into a classification process using the logistic regression(LR)algorithm and the k-nearest neighbor(KNN)algorithm.Thus,the classification accuracy(AC)performance of LRis also compared to theKNN approach in python on eight data sets to see which method produces the best rating when feature selection methods are applied.Consequently,the study concluded that the feature selection methods have a significant impact on the analysis and visualization of the data after removing the repetitive data and the data that do not affect the goal.After making several comparisons,the study suggests(SFMLR)using SFM based on RFI algorithm for feature selection,with LR algorithm for data classify.The proposal proved its efficacy by comparing its results with recent literature.
文摘Currently,the use of intelligent systems for the automatic recognition of targets in the fields of defence and military has increased significantly.The primary advantage of these systems is that they do not need human participation in target recognition processes.This paper uses the particle swarm optimization(PSO)algorithm to select the optimal features in the micro-Doppler signature of sonar targets.The microDoppler effect is referred to amplitude/phase modulation on the received signal by rotating parts of a target such as propellers.Since different targets'geometric and physical properties are not the same,their micro-Doppler signature is different.This Inconsistency can be considered a practical issue(especially in the frequency domain)for sonar target recognition.Despite using 128-point fast Fourier transform(FFT)for the feature extraction step,not all extracted features contain helpful information.As a result,PSO selects the most optimum and valuable features.To evaluate the micro-Doppler signature of sonar targets and the effect of feature selection on sonar target recognition,the simplest and most popular machine learning algorithm,k-nearest neighbor(k-NN),is used,which is called k-PSO in this paper because of the use of PSO for feature selection.The parameters measured are the correct recognition rate,reliability rate,and processing time.The simulation results show that k-PSO achieved a 100%correct recognition rate and reliability rate at 19.35 s when using simulated data at a 15 dB signal-tonoise ratio(SNR)angle of 40°.Also,for the experimental dataset obtained from the cavitation tunnel,the correct recognition rate is 98.26%,and the reliability rate is 99.69%at 18.46s.Therefore,the k-PSO has an encouraging performance in automatically recognizing sonar targets when using experimental datasets and for real-world use.
基金supported by“Human Resources Program in Energy Technology”of the Korea Institute of Energy Technology Evaluation and Planning(KETEP)granted financial resources from the Ministry of Trade,Industry&Energy,Republic of Korea.(No.20204010600090).
文摘Gait recognition is an active research area that uses a walking theme to identify the subject correctly.Human Gait Recognition(HGR)is performed without any cooperation from the individual.However,in practice,it remains a challenging task under diverse walking sequences due to the covariant factors such as normal walking and walking with wearing a coat.Researchers,over the years,have worked on successfully identifying subjects using different techniques,but there is still room for improvement in accuracy due to these covariant factors.This paper proposes an automated model-free framework for human gait recognition in this article.There are a few critical steps in the proposed method.Firstly,optical flow-based motion region esti-mation and dynamic coordinates-based cropping are performed.The second step involves training a fine-tuned pre-trained MobileNetV2 model on both original and optical flow cropped frames;the training has been conducted using static hyperparameters.The third step proposed a fusion technique known as normal distribution serially fusion.In the fourth step,a better optimization algorithm is applied to select the best features,which are then classified using a Bi-Layered neural network.Three publicly available datasets,CASIA A,CASIA B,and CASIA C,were used in the experimental process and obtained average accuracies of 99.6%,91.6%,and 95.02%,respectively.The proposed framework has achieved improved accuracy compared to the other methods.
基金Supported by National Natural Science Foundation of China and Shanxi Coalbased Low Carbon Joint Fund(Grant No.U1910211)National Natural Science Foundation of China(Grant Nos.51975024 and 52105044)National Key Research and Development Project(Grant No.2019YFC0121700).
文摘Shield machines are currently the main tool for underground tunnel construction. Due to the complexity and variability of the underground construction environment, it is necessary to accurately identify the ground in real-time during the tunnel construction process to match and adjust the tunnel parameters according to the geological conditions to ensure construction safety. Compared with the traditional method of stratum identifcation based on staged drilling sampling, the real-time stratum identifcation method based on construction data has the advantages of low cost and high precision. Due to the huge amount of sensor data of the ultra-large diameter mud-water balance shield machine, in order to balance the identifcation time and recognition accuracy of the formation, it is necessary to screen the multivariate data features collected by hundreds of sensors. In response to this problem, this paper proposes a voting-based feature extraction method (VFS), which integrates multiple feature extraction algorithms FSM, and the frequency of each feature in all feature extraction algorithms is the basis for voting. At the same time, in order to verify the wide applicability of the method, several commonly used classifcation models are used to train and test the obtained efective feature data, and the model accuracy and recognition time are used as evaluation indicators, and the classifcation with the best combination with VFS is obtained. The experimental results of shield machine data of 6 diferent geological structures show that the average accuracy of 13 features obtained by VFS combined with diferent classifcation algorithms is 91%;among them, the random forest model takes less time and has the highest recognition accuracy, reaching 93%, showing best compatibility with VFS. Therefore, the VFS algorithm proposed in this paper has high reliability and wide applicability for stratum identifcation in the process of tunnel construction, and can be matched with a variety of classifer algorithms. By combining 13 features selected from shield machine data features with random forest, the identifcation of the construction stratum environment of shield tunnels can be well realized, and further theoretical guidance for underground engineering construction can be provided.
基金supported in part by the National Natural Science Foundation of China(62172065,62072060)。
文摘As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features.Evolutionary computing(EC)is promising for FS owing to its powerful search capability.However,in traditional EC-based methods,feature subsets are represented via a length-fixed individual encoding.It is ineffective for high-dimensional data,because it results in a huge search space and prohibitive training time.This work proposes a length-adaptive non-dominated sorting genetic algorithm(LA-NSGA)with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective highdimensional FS.In LA-NSGA,an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths,and a Pareto dominance-based length change operator is introduced to guide individuals to explore in promising search space adaptively.Moreover,a dominance-based local search method is employed for further improvement.The experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
基金New Energy and Industrial Technology Development Organization,Grant/Award Number:JPNP20006JSPS Grant-in-Aid for Scientific Research,Grant/Award Numbers:21K12042,17H01785。
文摘Graphs help to define the relationships between entities in the data.These relationships,represented by edges,often provide additional context information which can be utilised to discover patterns in the data.Graph Neural Networks(GNNs)employ the inductive bias of the graph structure to learn and predict on various tasks.The primary operation of graph neural networks is the feature aggregation step performed over neighbours of the node based on the structure of the graph.In addition to its own features,for each hop,the node gets additional combined features from its neighbours.These aggregated features help define the similarity or dissimilarity of the nodes with respect to the labels and are useful for tasks like node classification.However,in real-world data,features of neighbours at different hops may not correlate with the node's features.Thus,any indiscriminate feature aggregation by GNN might cause the addition of noisy features leading to degradation in model's performance.In this work,we show that selective aggregation of node features from various hops leads to better performance than default aggregation on the node classification task.Furthermore,we propose a Dual-Net GNN architecture with a classifier model and a selector model.The classifier model trains over a subset of input node features to predict node labels while the selector model learns to provide optimal input subset to the classifier for the best performance.These two models are trained jointly to learn the best subset of features that give higher accuracy in node label predictions.With extensive experiments,we show that our proposed model outperforms both feature selection methods and state-of-the-art GNN models with remarkable improvements up to 27.8%.