In recent years, deep learning-based signal recognition technology has gained attention and emerged as an important approach for safeguarding the electromagnetic environment. However, training deep learning-based classifiers on large signal datasets with redundant samples requires significant memory and incurs high costs. This paper proposes a support data-based core-set selection method (SD) for signal recognition, aiming to screen a representative subset that approximates the large signal dataset. Specifically, this subset can be identified by employing the labeled information during the early stages of model training, as some training samples are frequently labeled as supporting data. This support data is crucial for model training and can be found using a border sample selector. Simulation results demonstrate that the SD method minimizes the impact on model recognition performance while reducing the dataset size, and outperforms five other state-of-the-art core-set selection methods when the fraction of training samples kept is less than or equal to 0.3 on the RML2016.04C dataset or 0.5 on the RML22 dataset. The SD method is particularly helpful for signal recognition tasks with limited memory and computing resources.
The rapid expansion of artificial intelligence (AI) applications has raised significant concerns about user privacy, prompting the development of privacy-preserving machine learning (ML) paradigms such as federated learning (FL). FL enables the distributed training of ML models, keeping data on local devices and thus addressing the privacy concerns of users. However, challenges arise from the heterogeneous nature of mobile client devices, partial engagement in training, and non-independent and identically distributed (non-IID) data, leading to performance degradation and optimization objective bias in FL training. With the development of 5G/6G networks and the integration of cloud and edge computing resources, globally distributed cloud computing resources can be effectively utilized to optimize the FL process. Choosing the parameter server through a selection mechanism not only avoids increasing the monetary cost and reduces the network latency overhead, but also balances the objectives of communication optimization and low-engagement mitigation, which cannot be achieved simultaneously in the single-server frameworks of existing works. In this paper, we propose the FedAdaSS algorithm, an adaptive parameter server selection mechanism designed to optimize the training efficiency in each round of FL training by selecting the most appropriate server as the parameter server. Our approach leverages the flexibility of cloud computing resources and allows organizers to strategically select servers for data broadcasting and aggregation, thus improving training performance while maintaining cost efficiency. The FedAdaSS algorithm estimates the utility of client systems and servers and incorporates an adaptive random reshuffling strategy that selects the optimal server in each round of the training process. Theoretical analysis confirms the convergence of FedAdaSS under strong convexity and L-smoothness assumptions, and comparative experiments within the FLSim framework demonstrate a reduction in training round-to-accuracy of 12%–20% compared to Federated Averaging (FedAvg) with random reshuffling under a single server. Furthermore, FedAdaSS effectively mitigates the performance loss caused by low client engagement, reducing the loss indicator by 50%.
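For orientation, the FedAvg baseline mentioned in the abstract above aggregates client models by a sample-size-weighted average. The sketch below shows only that aggregation step, with flat lists of floats standing in for model parameters; the function name `fedavg` and the data layout are illustrative assumptions, and the server selection logic of FedAdaSS is not reproduced because the abstract does not specify it.

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg aggregation).

    client_weights: list of equal-length lists of floats, one per client (assumed layout).
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two clients; the second holds three times as much data and so counts three times as much.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)
```

Clients that do not participate in a round are simply absent from both lists, which is one way low engagement biases the aggregate.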
The world produces vast quantities of high-dimensional multi-semantic data. However, extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To tackle these concerns, the paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. Initially, the dynamic redundancy is assessed between the selected feature subset and potential features. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then combined into the heuristic factor as label weights. Experimental results demonstrate that our proposed strategies can effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms involved in the paper.
This study focuses on meeting the challenges of big data visualization by using data reduction methods based on feature selection. The aim is to reduce the volume of big data and minimize model training time (Tt) while maintaining data quality. We address these challenges using the embedded "Select from model (SFM)" method with the "Random forest importance (RFI)" algorithm and compare it with the filter method "Select percentile (SP)" based on the chi-square ("Chi2") test for selecting the most important features, which are then fed into a classification process using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared with that of KNN in Python on eight data sets to see which method produces the best result when feature selection methods are applied. The study concludes that feature selection methods have a significant impact on the analysis and visualization of the data after removing repetitive data and data that do not affect the goal. After several comparisons, the study suggests SFMLR, which uses SFM based on the RFI algorithm for feature selection together with the LR algorithm for data classification. The proposal proved its efficacy when its results were compared with recent literature.
Using multi-source data such as remote sensing images, resource point coordinates, road networks and land type, a suitability assessment system for red study greenway route selection is constructed with the red study resource layer, traffic condition layer, ecological condition layer, and service radius layer as the selection elements. The Analytic Hierarchy Process (AHP) and the Delphi method are used to determine the selection factors and weight allocation of each element, and single-factor evaluation and multi-factor overlay analysis are used to accurately identify suitable selection corridors. The potential position is determined based on the lowest cost path. Finally, the final red study greenway is obtained through manual optimization based on the current situation of the road network. The analysis results show that the areas with the highest suitability are those with the richest distribution of first-, second-, and third-level red study resources. An effective connection of red resource points and various elements helps to enhance the competitiveness of red study tours in Ji'an County, providing a realistic path for selecting study greenways for cities rich in red resources.
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
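The Apriori step mentioned in the abstract above can be illustrated with a minimal sketch of level-wise frequent itemset mining plus rule confidence. This is the textbook algorithm in plain Python, not the MLARMC-HDMS implementation; the function names and the toy symptom records are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in sets) / n

    # Level 1: frequent single items.
    current = {frozenset([i]) for t in sets for i in t}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from stored supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy records, one set of observed attributes per patient.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"cough"},
    {"fever", "fatigue"},
]
freq = apriori(records, min_support=0.5)
print(confidence(freq, {"fatigue"}, {"fever"}))
```

The prune step is what keeps the search tractable: an itemset is only counted if all of its smaller subsets were already found frequent.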
Due to advancements in information technologies, massive quantities of data are being produced by social media, smartphones, and sensor devices. The investigation of data streams using machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and feature selection (FS) processes become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, streaming data is pre-processed in two ways, namely categorical encoding and null value removal. In addition, Local Correlation Integral (LOCI) is used, which is significant in the detection and removal of outliers. Besides, a red deer algorithm (RDA) based FS approach is employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS constitutes the novelty of the work. To assess the classification outcomes of the ODFST-SDC technique, a series of simulations were performed using three benchmark datasets. The experimental results reported promising outcomes of the ODFST-SDC technique over recent approaches.
Big Data applications face different types of complexities in classification. Cleaning and purifying data by eliminating irrelevant or redundant data for big data applications becomes a complex operation when attempting to maintain discriminative features in the processed data. The existing schemes have many disadvantages, including continuity in training, more samples and training time in feature selection, and increased classification execution times. Recently, ensemble methods have made a mark in classification tasks as they combine multiple results into a single representation. Compared to a single model, this technique offers improved prediction. Ensemble-based feature selection parallels multiple experts' judgments on a single topic. The major goal of this research is to suggest HEFSM (Heterogeneous Ensemble Feature Selection Model), a hybrid approach that combines multiple algorithms. Further, individual outputs produced by methods producing subsets of features, rankings, or votes are also combined in this work. A KNN (K-Nearest Neighbor) classifier is used to classify the big dataset obtained from the ensemble learning approach. The results of the study are good, demonstrating the proposed model's efficiency in classification in terms of performance metrics such as precision, recall, F-measure and accuracy.
Machine learning (ML) practices such as classification have played a very important role in classifying diseases in medical science. Since medical science is a sensitive field, the pre-processing of medical data requires careful handling to make quality clinical decisions. Generally, medical data is considered high-dimensional and complex data that contains many irrelevant and redundant features. These factors indirectly upset the disease prediction and classification accuracy of any ML model. To address this issue, various data pre-processing methods called Feature Selection (FS) techniques have been presented in the literature. However, the majority of such techniques frequently suffer from local minima issues due to the large solution space. Thus, this study proposes a novel wrapper-based Sand Cat Swarm Optimization (SCSO) technique as an FS approach to find optimum features from ten benchmark medical datasets. The SCSO algorithm replicates the hunting and searching strategies of the sand cat while having the advantage of avoiding local optima and finding the ideal solution with minimal control variables. Moreover, a K-Nearest Neighbor (KNN) classifier was used to evaluate the effectiveness of the features identified by the proposed SCSO algorithm. The performance of the proposed SCSO algorithm was compared with six state-of-the-art and recent wrapper-based optimization algorithms using the validation metrics of classification accuracy, optimum feature size, and computational cost in seconds. The simulation results on the benchmark medical datasets revealed that the proposed SCSO-KNN approach outperformed the comparative algorithms with an average classification accuracy of 93.96% by selecting 14.2 features within 1.91 s. Additionally, the Wilcoxon rank test was used to perform the significance analysis between the proposed SCSO-KNN method and six other algorithms for a p-value less than 5.00E-02. The findings revealed that the proposed algorithm produces better outcomes with an average p-value of 1.82E-02. Moreover, potential future directions are also suggested as a result of the study's promising findings.
Financial crisis prediction (FCP) has received significant attention in the financial sector for decision-making. Proper forecasting of the number of firms likely to fail is important to determine the growth index and strength of a nation's economy. Conventionally, numerous approaches have been developed in the design of accurate FCP processes. At the same time, classifier efficacy and predictive accuracy are inadequate for real-time applications. In addition, several established techniques perform well on specific datasets but are not adaptable to different datasets. Thus, there is a need to develop an effective prediction technique that offers optimum classifier performance and is adaptable to various datasets. This paper presents a novel multi-verse optimization (MVO) based feature selection (FS) with an optimal variational auto encoder (OVAE) model for FCP. The proposed multi-verse optimization based feature selection with optimal variational auto encoder (MVOFS-OVAE) model mainly aims to forecast the financial crisis. To achieve this, the proposed MVOFS-OVAE model first pre-processes the financial data using min-max normalization. In addition, the MVOFS-OVAE model designs a feature subset selection process using the MVOFS approach. Next, the variational auto encoder (VAE) model is applied to categorize the financial data into financial crisis or non-financial crisis. Finally, the differential evolution (DE) algorithm is utilized for the parameter tuning of the VAE model. A series of simulations on the benchmark dataset showed the improvement of the MVOFS-OVAE approach over recent state-of-the-art approaches.
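The min-max normalization used as the pre-processing step in the abstract above rescales each feature to the range [0, 1] via x' = (x - min) / (max - min). A minimal sketch follows; the function name and the choice to map a constant column to zeros are assumptions made for illustration, not details taken from the paper.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


print(min_max_normalize([2, 4, 6]))
```

In practice the minimum and maximum would be computed on the training split only and reused for the test split, so that test data does not leak into the scaling.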
This paper proposes the Parallelized Linear Time-Variant Acceleration Coefficients and Inertial Weight of Particle Swarm Optimization algorithm (PLTVACIW-PSO). Its design introduces the benefits of parallel computing into the combined power of TVAC (Time-Variant Acceleration Coefficients) and IW (Inertial Weight). The proposed algorithm has been tested against linear, non-linear, traditional, and multi-swarm-based optimization algorithms. An experimental study is performed in two stages to assess the proposed PLTVACIW-PSO. Phase I uses 12 recognized standard benchmark functions to evaluate the comparative performance of the proposed PLTVACIW-PSO vs. IW-based Particle Swarm Optimization (PSO) algorithms, TVAC-based PSO algorithms, traditional PSO, Genetic Algorithms (GA), Differential Evolution (DE), and, finally, Flower Pollination (FP) algorithms. In phase II, the proposed PLTVACIW-PSO uses the same 12 known benchmark functions to test its performance against the BAT (BA) and Multi-Swarm BAT algorithms. In phase III, the proposed PLTVACIW-PSO is employed to augment the feature selection problem for medical datasets. This experimental study shows that the proposed PLTVACIW-PSO outperforms other comparable algorithms. Outcomes from the experiments show that the PLTVACIW-PSO is capable of identifying a feature subset that enhances the classification efficiency and gives the minimal subset of core features.
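To make the combination of time-variant acceleration coefficients and inertia weight concrete, here is a minimal serial PSO sketch in which all three coefficients change linearly over the iterations. The schedule endpoints (w from 0.9 to 0.4, c1 from 2.5 to 0.5, c2 from 0.5 to 2.5) are commonly used values assumed here for illustration; the abstract does not state the paper's settings, and the parallelization that PLTVACIW-PSO adds is not reproduced.

```python
import random


def linear(start, end, t, t_max):
    """Linearly interpolate from start (at t = 0) to end (at t = t_max)."""
    return start + (end - start) * t / t_max


def pso_tvac(f, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0), seed=0):
    """Minimize f over a box with PSO using linearly time-varying w, c1 and c2."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    t_max = max(1, iters - 1)
    for t in range(iters):
        w = linear(0.9, 0.4, t, t_max)   # inertia weight decreases over time
        c1 = linear(2.5, 0.5, t, t_max)  # cognitive coefficient decreases
        c2 = linear(0.5, 2.5, t, t_max)  # social coefficient increases
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Standard benchmark function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_tvac(sphere, dim=3)
print(best_val)
```

Early iterations favor each particle's own best position (large c1, large w), which encourages exploration; later iterations shift weight to the swarm's best (large c2, small w), which encourages convergence.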
Unstable angina (UA) is the most dangerous type of coronary heart disease (CHD), causing increasing mortality and morbidity worldwide. Identification of biomarkers for UA at the level of proteomics and metabolomics is a better avenue to understand its inner mechanism. Feature selection-based data mining methods are well suited to identifying biomarkers of UA. In this study, we carried out a clinical epidemiology study to collect plasma from UA in-patients and controls. Proteomics and metabolomics data were obtained via two-dimensional difference gel electrophoresis and gas chromatography techniques. We present a novel computational strategy to select as few biomarkers as possible for UA in the two groups of data. Firstly, a decision tree was used to select biomarkers for UA, and 3-fold cross validation was used to evaluate computational performances for the three methods. Alternatively, we combined the independent t test and a classification-based data mining method with a backward elimination technique to select as few protein and metabolite biomarkers as possible with the best classification performances. With this method, we selected 6 proteins and 5 metabolites for UA. The novel method presented here provides better insight into the pathology of a disease.
This paper proposes a method of feature selection using Bayes' theorem. The purpose of the proposed method is to reduce the computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two (binary) attributes is determined based on the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), then the two attributes are considered independent of each other; otherwise they are considered dependent, and one of them can be removed, thus reducing the number of attributes. The process must be repeated on all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms over 8 datasets from the University of California, Irvine (UCI) machine learning databases. The proposed method shows better results in terms of number of selected features, classification accuracy, and running time than most existing algorithms.
Imbalanced data is one type of dataset frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of dataset, improving the accuracy of identifying the minority class is a critically important issue. Feature selection is one method to address this issue. An effective feature selection method can choose a subset of features that favor accurate determination of the minority class. A decision tree is a classifier that can be built using different splitting criteria. Its advantage is the ease of detecting which feature is used as a splitting node. Thus, it is possible to use a decision tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method using our proposed weighted Gini index (WGI) is proposed. Comparison with the Chi2, F-statistic and Gini index feature selection methods shows that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, our proposed method has the highest probability of achieving the best performance. The area under a receiver operating characteristic curve (ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high even if only a few features are selected and used, and changes only slightly as more and more features are selected. However, F-measure achieves excellent performance only if 20% or more of the features are chosen. The results are helpful for practitioners in selecting a proper feature selection method when facing a practical problem.
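As background for the splitting-criterion idea in the abstract above, the sketch below scores a numeric feature by the largest reduction in the standard (unweighted) Gini index over all single-threshold splits. The weighting scheme of the proposed WGI is not given in the abstract, so it is deliberately not reproduced; all function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(values, labels, threshold):
    """Impurity reduction from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_gini_gain(values, labels):
    """Feature score: the best gain over all candidate thresholds."""
    return max(gini_gain(values, labels, t) for t in set(values))


# Toy imbalanced labels (three majority samples, one minority sample).
labels = [0, 0, 0, 1]
feature_a = [1, 2, 3, 10]   # separates the minority sample cleanly
feature_b = [1, 10, 1, 10]  # only partly informative
print(best_gini_gain(feature_a, labels), best_gini_gain(feature_b, labels))
```

Ranking features by this score and keeping the top ones gives a simple Gini-based selector of the kind the paper uses as a comparison baseline.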
In the area of pattern recognition and machine learning, features play a key role in prediction. Well-known applications of features include medical imaging and image classification, to name a few. With the exponential growth of information investments in medical data repositories and health service provision, medical institutions are collecting large volumes of data. These data repositories contain detailed information essential to support medical diagnostic decisions and to improve patient care quality. On the other hand, this growth has also made it difficult to comprehend and utilize data for various purposes. The results of imaging data can become biased because of extraneous features present in larger datasets. Feature selection offers a way to decrease the number of components in such large datasets. Selection techniques discard the unimportant features and select a subset of components that produces superior classification accuracy. Choosing good attributes produces a precise classification model, which enhances learning speed and predictive control. This paper presents a review of feature selection techniques and attribute selection measures for medical imaging. This review is meant to describe feature selection techniques in the medical domain with their pros and cons and to highlight their application in imaging data and data mining algorithms. The review reveals the shortcomings of the existing feature and attribute selection techniques when applied to multi-sourced data. Moreover, this review discusses the importance of feature selection for correct classification of medical infections. In the end, critical analysis and future directions are provided.
Ratoon rice, which refers to a second harvest of rice obtained from the regenerated tillers originating from the stubble of the first harvested crop, plays an important role in both food security and agroecology while requiring minimal agricultural inputs. However, accurately identifying ratoon rice crops is challenging due to the similarity of its spectral features with other rice cropping systems (e.g., double rice). Moreover, images with a high spatiotemporal resolution are essential since ratoon rice is generally cultivated in fragmented croplands within regions that frequently exhibit cloudy and rainy weather. In this study, taking Qichun County in Hubei Province, China as an example, we developed a new phenology-based ratoon rice vegetation index (PRVI) for the purpose of ratoon rice mapping at a 30 m spatial resolution using a robust time series generated from Harmonized Landsat and Sentinel-2 (HLS) images. The PRVI, which incorporates the red, near-infrared, and shortwave infrared 1 bands, was developed based on the analysis of spectro-phenological separability and feature selection. Based on actual field samples, the performance of the PRVI for ratoon rice mapping was carefully evaluated by comparing it to several vegetation indices, including the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI) and land surface water index (LSWI). The results suggested that the PRVI could sufficiently capture the specific characteristics of ratoon rice, leading to a favorable separability between ratoon rice and other land cover types. Furthermore, the PRVI showed the best performance for identifying ratoon rice in the phenological phases characterized by grain filling and harvesting to tillering of the ratoon crop (GHS-TS2), indicating that only several images are required to obtain an accurate ratoon rice map. Finally, the PRVI performed better than NDVI, EVI, LSWI and their combination at the GHS-TS2 stages, with producer's accuracy and user's accuracy of 92.22% and 89.30%, respectively. These results demonstrate that the proposed PRVI based on HLS data can effectively identify ratoon rice in fragmented croplands at crucial phenological stages, which is promising for identifying the earliest timing of ratoon rice planting and can provide a fundamental dataset for crop management activities.
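For reference, the three comparison indices named in the abstract above are usually defined as follows in the remote-sensing literature, where $\rho$ denotes surface reflectance in the blue, red, near-infrared (NIR) and shortwave infrared 1 (SWIR1) bands. The EVI coefficients shown are the commonly used standard values and are an assumption about what the study applied. The formula of the PRVI itself is not given in the abstract and is therefore not reproduced.

```latex
\begin{align}
\mathrm{NDVI} &= \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{red}}} \\
\mathrm{EVI}  &= 2.5 \cdot \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{NIR}} + 6\,\rho_{\mathrm{red}} - 7.5\,\rho_{\mathrm{blue}} + 1} \\
\mathrm{LSWI} &= \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{SWIR1}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{SWIR1}}}
\end{align}
```

NDVI and EVI respond mainly to canopy greenness, while LSWI responds to canopy and soil water content, which is why rice mapping studies often combine them around flooding and transplanting dates.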
Data mining in the educational field can be used to optimize teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be utilized to mine the data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine the educational data and derive decisions based on the feature selection and classification process. The ISOFS-OSSAE model involves the design of the ISOFS technique to choose an optimal subset of features. Moreover, the swallow swarm optimization (SSO) with the SSAE model is derived to perform the classification process. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments were conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results pointed out the improved classification performance of the ISOFS-OSSAE model over recent state-of-the-art approaches in terms of different performance measures.
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a length-fixed individual encoding. It is ineffective for high-dimensional data, because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising search space adaptively. Moreover, a dominance-based local search method is employed for further improvement. The experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
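The Pareto dominance relation underlying the bi-objective formulation above can be sketched in a few lines. Each candidate feature subset is summarized here as a pair (classification error, number of selected features), both to be minimized; this pair representation and the function names are assumptions for illustration, and the length-adaptive operators of LA-NSGA are not reproduced.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (the first non-dominated front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (classification error, number of selected features).
candidates = [(0.10, 50), (0.12, 20), (0.10, 60), (0.20, 20), (0.30, 5)]
print(pareto_front(candidates))
```

A full non-dominated sort repeats this step on the remaining points to obtain the second front, the third front, and so on.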
Principal component analysis (PCA) combined with artificial neural networks was used to classify the spectra of 27 steel samples acquired using laser-induced breakdown spectroscopy. Three methods of spectral data se...Principal component analysis (PCA) combined with artificial neural networks was used to classify the spectra of 27 steel samples acquired using laser-induced breakdown spectroscopy. Three methods of spectral data selection, selecting all the peak lines of the spectra, selecting intensive spectral partitions and the whole spectra, were utilized to compare the infiuence of different inputs of PCA on the classification of steels. Three intensive partitions were selected based on experience and prior knowledge to compare the classification, as the partitions can obtain the best results compared to all peak lines and the whole spectra. We also used two test data sets, mean spectra after being averaged and raw spectra without any pretreatment, to verify the results of the classification. The results of this comprehensive comparison show that a back propagation network trained using the principal components of appropriate, carefully selecred spectral partitions can obtain the best results accuracy can be achieved using the intensive spectral A perfect result with 100% classification partitions ranging of 357-367 nm.展开更多
Due to the exponential increase in smart, resource-limited devices and high-speed communication technologies, the Internet of Things (IoT) has received significant attention in different application areas. However, the IoT environment is highly susceptible to cyber-attacks because of memory, processing, and communication restrictions. Since traditional models are not adequate for accomplishing security in the IoT environment, recent developments in deep learning (DL) models are beneficial. This study introduces a novel hybrid metaheuristics feature selection with stacked deep learning enabled cyber-attack detection (HMFS-SDLCAD) model. The major intention of the HMFS-SDLCAD model is to recognize the occurrence of cyberattacks in the IoT environment. At the preliminary stage, data pre-processing is carried out to transform the input data into a useful format. In addition, a salp swarm optimization based on particle swarm optimization (SSOPSO) algorithm is used for the feature selection process. Besides, a stacked bidirectional gated recurrent unit (SBiGRU) model is utilized for the identification and classification of cyberattacks. Finally, the whale optimization algorithm (WOA) is employed for the hyperparameter optimization process. The experimental analysis of the HMFS-SDLCAD model is validated using a benchmark dataset, and the results are assessed under several aspects. The simulation outcomes pointed out the improvements of the HMFS-SDLCAD model over recent approaches.
Funding: Supported by the National Natural Science Foundation of China (62371098), the Natural Science Foundation of Sichuan Province (2023NSFSC1422), the National Key Research and Development Program of China (2021YFB2900404), and Central Universities of Southwest Minzu University (ZYN2022032).
Abstract: In recent years, deep learning-based signal recognition technology has gained attention and emerged as an important approach for safeguarding the electromagnetic environment. However, training deep learning-based classifiers on large signal datasets with redundant samples requires significant memory and incurs high costs. This paper proposes a support data-based core-set selection method (SD) for signal recognition, aiming to screen a representative subset that approximates the large signal dataset. Specifically, this subset can be identified by employing the labeled information during the early stages of model training, as some training samples are frequently labeled as support data. This support data is crucial for model training and can be found using a border sample selector. Simulation results demonstrate that the SD method minimizes the impact on model recognition performance while reducing the dataset size, and outperforms five other state-of-the-art core-set selection methods when the fraction of training samples kept is less than or equal to 0.3 on the RML2016.04C dataset or 0.5 on the RML22 dataset. The SD method is particularly helpful for signal recognition tasks with limited memory and computing resources.
Funding: Supported in part by the National Natural Science Foundation of China under Grant U22B2005 and Grant 62372462.
Abstract: The rapid expansion of artificial intelligence (AI) applications has raised significant concerns about user privacy, prompting the development of privacy-preserving machine learning (ML) paradigms such as federated learning (FL). FL enables the distributed training of ML models, keeping data on local devices and thus addressing the privacy concerns of users. However, challenges arise from the heterogeneous nature of mobile client devices, partial engagement in training, and non-independent and identically distributed (non-IID) data, leading to performance degradation and optimization objective bias in FL training. With the development of 5G/6G networks and the integration of cloud and edge computing resources, globally distributed cloud computing resources can be effectively utilized to optimize the FL process. Choosing the parameter server through a selection mechanism does not increase the monetary cost and reduces the network latency overhead, and it also balances the objectives of communication optimization and low-engagement mitigation, which cannot be achieved simultaneously in the single-server frameworks of existing works. In this paper, we propose the FedAdaSS algorithm, an adaptive parameter server selection mechanism designed to optimize the training efficiency in each round of FL training by selecting the most appropriate server as the parameter server. Our approach leverages the flexibility of cloud computing resources and allows organizers to strategically select servers for data broadcasting and aggregation, thus improving training performance while maintaining cost efficiency. The FedAdaSS algorithm estimates the utility of client systems and servers and incorporates an adaptive random reshuffling strategy that selects the optimal server in each round of the training process. Theoretical analysis confirms the convergence of FedAdaSS under strong convexity and L-smoothness assumptions, and comparative experiments within the FLSim framework demonstrate a reduction in training round-to-accuracy of 12% to 20% compared to the Federated Averaging (FedAvg) with random reshuffling method under a single server. Furthermore, FedAdaSS effectively mitigates performance loss caused by low client engagement, reducing the loss indicator by 50%.
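For orientation, the aggregation step of the FedAvg baseline named in this abstract can be sketched as below. This is a minimal illustration of standard sample-count-weighted averaging only; it is not the FedAdaSS server selection mechanism, and the function name `fedavg` and the toy client values are assumptions made for the example.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its
    number of local training samples (standard FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical toy round: two clients with 1 and 3 local samples.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_model = fedavg(clients, sizes)
print(global_model)
```

The client holding more samples pulls the global parameters closer to its own, which is why low or uneven client engagement can bias the aggregated model.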
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62376089, 62302153, 62302154, 62202147) and the Key Research and Development Program of Hubei Province, China (Grant No. 2023BEB024).
Abstract: The world produces vast quantities of high-dimensional multi-semantic data. However, extracting valuable information from such a large amount of high-dimensional and multi-label data is arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To tackle these concerns, the paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. Initially, the dynamic redundancy is assessed between the selected feature subset and potential features. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then combined into the heuristic factor as label weights. Experimental results demonstrate that the proposed strategies can effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms involved in the paper.
Abstract: This study focuses on meeting the challenges of big data visualization by using data reduction methods based on feature selection. The aim is to reduce the volume of big data and minimize model training time (Tt) while maintaining data quality. We address these challenges using the embedded "Select from model (SFM)" method with the "Random forest Importance algorithm (RFI)" and compare it with the filter "Select percentile (SP)" method based on the chi-square ("Chi2") tool for selecting the most important features, which are then fed into a classification process using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is also compared to that of the KNN approach in Python on eight data sets to see which method produces the best rating when feature selection methods are applied. The study concluded that feature selection methods have a significant impact on the analysis and visualization of the data after removing repetitive data and data that do not affect the goal. After making several comparisons, the study suggests SFMLR: SFM based on the RFI algorithm for feature selection, with the LR algorithm for data classification. The proposal proved its efficacy by comparing its results with recent literature.
Abstract: Using multi-source data such as remote sensing images, resource point coordinates, road networks, and land type, a suitability assessment system for red study greenway route selection is constructed, with the red study resource layer, traffic condition layer, ecological condition layer, and service radius layer as the selection elements. The Analytic Hierarchy Process (AHP) and the Delphi method are used to determine the selection factors and weight allocation of each element, and single-factor evaluation and multi-factor overlay analysis are used to accurately identify suitable selection corridors. The potential position is determined based on the lowest-cost path. Finally, the final red study greenway is obtained through manual optimization based on the current situation of the road network. The analysis results show that the areas with the highest suitability were those with the richest distribution of first-, second-, and third-level red study resources. An effective connection of red resource points and various elements helps to enhance the competitiveness of red study tours in Ji'an County, providing a realistic path for selecting study greenways for cities rich in red resources.
Funding: The Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, funded this research work through Project Number RI-44-0444.
Abstract: Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
Abstract: Due to advancements in information technologies, a massive quantity of data is being produced by social media, smartphones, and sensor devices. The investigation of data streams using machine learning (ML) approaches to address regression, prediction, and classification problems has received considerable interest. At the same time, the detection of anomalies or outliers and feature selection (FS) processes become important. This study develops an outlier detection with feature selection technique for streaming data classification, named the ODFST-SDC technique. Initially, streaming data is pre-processed in two ways, namely categorical encoding and null value removal. In addition, the Local Correlation Integral (LOCI) is used, which is significant in the detection and removal of outliers. Besides, a red deer algorithm (RDA) based FS approach is employed to derive an optimal subset of features. Finally, a kernel extreme learning machine (KELM) classifier is used for streaming data classification. The design of LOCI-based outlier detection and RDA-based FS shows the novelty of the work. In order to assess the classification outcomes of the ODFST-SDC technique, a series of simulations were performed using three benchmark datasets. The experimental results reported the promising outcomes of the ODFST-SDC technique over recent approaches.
Abstract: Big Data applications face different types of complexities in classification. Cleaning and purifying data by eliminating irrelevant or redundant data for big data applications becomes a complex operation while attempting to maintain discriminative features in the processed data. Existing schemes have many disadvantages, including continuity in training, more samples and training time in feature selection, and increased classification execution times. Recently, ensemble methods have made a mark in classification tasks, as they combine multiple results into a single representation. Compared to a single model, this technique offers improved prediction. Ensemble-based feature selection parallels multiple experts' judgments on a single topic. The major goal of this research is to suggest HEFSM (Heterogeneous Ensemble Feature Selection Model), a hybrid approach that combines multiple algorithms. Further, individual outputs produced by methods producing subsets of features, rankings, or votes are also combined in this work. A KNN (K-Nearest Neighbor) classifier is used to classify the big dataset obtained from the ensemble learning approach. The results of the study have been good, proving the proposed model's efficiency in classification in terms of performance metrics such as precision, recall, F-measure, and accuracy.
Funding: This research was supported by Researchers Supporting Project Number (RSP2021/309), King Saud University, Riyadh, Saudi Arabia. The authors wish to acknowledge Yayasan Universiti Teknologi Petronas for supporting this work through the research grant (015LC0-308).
Abstract: Machine learning (ML) practices such as classification have played a very important role in classifying diseases in medical science. Since medical science is a sensitive field, the pre-processing of medical data requires careful handling to make quality clinical decisions. Generally, medical data is considered high-dimensional and complex data that contains many irrelevant and redundant features. These factors indirectly upset the disease prediction and classification accuracy of any ML model. To address this issue, various data pre-processing methods called Feature Selection (FS) techniques have been presented in the literature. However, the majority of such techniques frequently suffer from local minima issues due to the large solution space. Thus, this study has proposed a novel wrapper-based Sand Cat Swarm Optimization (SCSO) technique as an FS approach to find optimum features from ten benchmark medical datasets. The SCSO algorithm replicates the hunting and searching strategies of the sand cat while having the advantage of avoiding local optima and finding the ideal solution with minimal control variables. Moreover, a K-Nearest Neighbor (KNN) classifier was used to evaluate the effectiveness of the features identified by the proposed SCSO algorithm. The performance of the proposed SCSO algorithm was compared with six state-of-the-art and recent wrapper-based optimization algorithms using the validation metrics of classification accuracy, optimum feature size, and computational cost in seconds. The simulation results on the benchmark medical datasets revealed that the proposed SCSO-KNN approach outperformed the comparative algorithms, with an average classification accuracy of 93.96% by selecting 14.2 features within 1.91 s. Additionally, the Wilcoxon rank test was used to perform the significance analysis between the proposed SCSO-KNN method and six other algorithms for a p-value less than 5.00E-02. The findings revealed that the proposed algorithm produces better outcomes, with an average p-value of 1.82E-02. Moreover, potential future directions are also suggested as a result of the study's promising findings.
Abstract: Financial crisis prediction (FCP) has received significant attention in the financial sector for decision-making. Proper forecasting of the number of firms likely to fail is important to determine the growth index and strength of a nation's economy. Conventionally, numerous approaches have been developed in the design of accurate FCP processes. At the same time, classifier efficacy and predictive accuracy are inadequate for real-time applications. In addition, several established techniques perform well on specific datasets but are not adjustable to distinct datasets. Thus, there is a need to develop an effective prediction technique that offers optimum classifier performance and is adjustable to various datasets. This paper presents a novel multi-verse optimization (MVO) based feature selection (FS) with an optimal variational auto encoder (OVAE) model for FCP. The proposed multi-verse optimization based feature selection with optimal variational auto encoder (MVOFS-OVAE) model mainly aims to forecast the financial crisis. To achieve this, the proposed MVOFS-OVAE model first pre-processes the financial data using min-max normalization. In addition, the MVOFS-OVAE model designs a feature subset selection process using the MVOFS approach. Next, the variational auto encoder (VAE) model is applied to categorize the financial data into financial crisis or non-financial crisis. Finally, the differential evolution (DE) algorithm is utilized for the parameter tuning of the VAE model. A series of simulations on the benchmark dataset reported the improvement of the MVOFS-OVAE approach over recent state-of-the-art approaches.
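The min-max normalization step mentioned in this abstract can be sketched as follows. This is a generic illustration of the standard formula x' = (x - min) / (max - min), not the authors' code; the function name `min_max_normalize`, the handling of constant columns, and the sample values are assumptions.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical financial ratio values for three firms.
print(min_max_normalize([10, 20, 30]))
```

Scaling every feature to a common range keeps features with large raw magnitudes from dominating later feature selection and classification stages.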
Funding: Funded by Prince Sultan University, Riyadh, Saudi Arabia.
Abstract: This paper proposes the Parallelized Linear Time-Variant Acceleration Coefficients and Inertial Weight of Particle Swarm Optimization algorithm (PLTVACIW-PSO). Its design introduces the benefits of parallel computing into the combined power of TVAC (Time-Variant Acceleration Coefficients) and IW (Inertial Weight). The proposed algorithm has been tested against linear, non-linear, traditional, and multi-swarm based optimization algorithms. An experimental study is performed in three phases to assess the proposed PLTVACIW-PSO. Phase I uses 12 recognized standard benchmark functions to evaluate the comparative performance of the proposed PLTVACIW-PSO vs. IW-based Particle Swarm Optimization (PSO) algorithms, TVAC-based PSO algorithms, traditional PSO, Genetic algorithms (GA), Differential evolution (DE), and, finally, Flower Pollination (FP) algorithms. In phase II, the proposed PLTVACIW-PSO uses the same 12 known benchmark functions to test its performance against the BAT (BA) and Multi-Swarm BAT algorithms. In phase III, the proposed PLTVACIW-PSO is employed to augment the feature selection problem for medical datasets. This experimental study shows that the proposed PLTVACIW-PSO outpaces the performance of other comparable algorithms. Outcomes from the experiments show that PLTVACIW-PSO is capable of outlining a feature subset that enhances the classification efficiency and gives the minimal subset of the core features.
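A rough, serial sketch of PSO with linearly time-varying inertia weight and acceleration coefficients is given below. It is not the PLTVACIW-PSO algorithm itself (there is no parallelization here), and the schedule endpoints (w from 0.9 to 0.4, c1 from 2.5 to 0.5, c2 from 0.5 to 2.5) are values commonly used in the PSO literature, assumed here rather than taken from the paper. The names `linear_schedule`, `sphere`, and `ltv_pso` are hypothetical.

```python
import random


def linear_schedule(start, end, t, t_max):
    """Linearly interpolate from start (t = 0) to end (t = t_max)."""
    return start + (end - start) * t / t_max


def sphere(x):
    """Simple benchmark function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def ltv_pso(objective, dim=2, n_particles=20, iters=100, bound=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    initial_best = gbest_val
    for t in range(iters):
        # Assumed schedules: exploration early (high w, high c1),
        # exploitation late (low w, high c2).
        w = linear_schedule(0.9, 0.4, t, iters - 1)
        c1 = linear_schedule(2.5, 0.5, t, iters - 1)
        c2 = linear_schedule(0.5, 2.5, t, iters - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search box.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, initial_best


best_pos, best_val, initial_best = ltv_pso(sphere)
print(best_val)
```

For feature selection, the same loop would typically be run over a binary or thresholded position vector with a classifier-based objective; that mapping is left out of this sketch.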
Funding: Supported by the National Basic Research Program of China (No. 2011CB505106), the National Natural Science Foundation of China (No. 30902020), the Foundation of National Department of Public Benefit Research of China (No. 200807007), the Creation Fund for Significant New Drugs of China (No. 2009ZX09502-018), and the Foundation of International Science and Technology Cooperation of China (No. 2008DFA30610).
Abstract: Unstable angina (UA) is the most dangerous type of Coronary Heart Disease (CHD), causing increasing mortality and morbidity worldwide. Identification of biomarkers for UA at the level of proteomics and metabolomics is a better avenue to understand its inner mechanism. A feature selection based data mining method is better suited to identify biomarkers of UA. In this study, we carried out clinical epidemiology to collect plasma from UA in-patients and controls. Proteomics and metabolomics data were obtained via two-dimensional difference gel electrophoresis and gas chromatography techniques. We presented a novel computational strategy to select as few biomarkers as possible for UA in the two groups of data. Firstly, a decision tree was used to select biomarkers for UA, and 3-fold cross validation was used to evaluate computational performances for the three methods. Alternatively, we combined the independent t test and a classification based data mining method, as well as a backward elimination technique, to select as few protein and metabolite biomarkers as possible with the best classification performances. By this method, we selected 6 proteins and 5 metabolites for UA. The novel method presented here provides a better insight into the pathology of a disease.
Abstract: This paper proposes a method of feature selection using Bayes' theorem. The purpose of the proposed method is to reduce the computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two (binary) attributes is determined based on the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), then the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, thus reducing the number of attributes. The process must be repeated on all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms over 8 datasets from the University of California, Irvine (UCI) machine learning databases. The proposed method shows better results in terms of number of selected features, classification accuracy, and running time than most existing algorithms.
Funding: Supported in part by the National Science Foundation of USA (CMMI-1162482).
Abstract: Imbalanced data is a type of dataset frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of dataset, improving the accuracy of identifying the minority class is a critically important issue. Feature selection is one method to address this issue. An effective feature selection method can choose a subset of features that favor the accurate determination of the minority class. A decision tree is a classifier that can be built by using different splitting criteria. Its advantage is the ease of detecting which feature is used as a splitting node. Thus, it is possible to use a decision tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method using our proposed weighted Gini index (WGI) is proposed. Its comparison results with Chi2, F-statistic, and Gini index feature selection methods show that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, our proposed method has the highest probability of achieving the best performance. The area under a receiver operating characteristic curve (ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high even if only a few features are selected and used, and changes only slightly as more and more features are selected. However, F-measure achieves excellent performance only if 20% or more of the features are chosen. The results are helpful for practitioners to select a proper feature selection method when facing a practical problem.
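To make the idea of using a decision tree splitting criterion as a feature score concrete, the sketch below ranks categorical features by the decrease in the ordinary (unweighted) Gini impurity. The paper's weighted Gini index (WGI) is not specified in this abstract, so it is not reproduced here; the function names and the toy data are assumptions.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(feature_values, labels):
    """Impurity decrease obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - weighted


# Hypothetical toy data: one informative and one uninformative feature.
labels = [1, 1, 0, 0]
features = {
    "informative": ["a", "a", "b", "b"],
    "noise": ["a", "b", "a", "b"],
}
ranking = sorted(features, key=lambda f: gini_gain(features[f], labels),
                 reverse=True)
print(ranking)
```

A weighted variant would presumably adjust the class terms to favor the minority class, but the exact weighting is the paper's contribution and is left unspecified here.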
Abstract: In the area of pattern recognition and machine learning, features play a key role in prediction. Well-known applications of features include medical imaging and image classification, to name a few. With the exponential growth of information investments in medical data repositories and health service provision, medical institutions are collecting large volumes of data. These data repositories contain detailed information essential to support medical diagnostic decisions and to improve patient care quality. On the other hand, this growth has also made it difficult to comprehend and utilize data for various purposes. The results of imaging data can become biased because of extraneous features present in larger datasets. Feature selection gives a chance to decrease the number of components in such large datasets. Selection techniques remove the unimportant features and select a subset of components that yields better classification accuracy. The correct choice of good attributes produces a precise classification model, which enhances learning speed and predictive control. This paper presents a review of feature selection techniques and attribute selection measures for medical imaging. This review is meant to describe feature selection techniques in the medical domain with their pros and cons and to signify their application in imaging data and data mining algorithms. The review reveals the shortcomings of the existing feature and attribute selection techniques when applied to multi-sourced data. Moreover, this review highlights the importance of feature selection for correct classification of medical infections. In the end, critical analysis and future directions are provided.
Funding: Supported by the National Natural Science Foundation of China (42271360 and 42271399), the Young Elite Scientists Sponsorship Program by China Association for Science and Technology (CAST) (2020QNRC001), and the Fundamental Research Funds for the Central Universities, China (2662021JC013, CCNU22QN018).
Abstract: Ratoon rice, which refers to a second harvest of rice obtained from the regenerated tillers originating from the stubble of the first harvested crop, plays an important role in both food security and agroecology while requiring minimal agricultural inputs. However, accurately identifying ratoon rice crops is challenging due to the similarity of its spectral features with other rice cropping systems (e.g., double rice). Moreover, images with a high spatiotemporal resolution are essential since ratoon rice is generally cultivated in fragmented croplands within regions that frequently exhibit cloudy and rainy weather. In this study, taking Qichun County in Hubei Province, China as an example, we developed a new phenology-based ratoon rice vegetation index (PRVI) for the purpose of ratoon rice mapping at a 30 m spatial resolution using a robust time series generated from Harmonized Landsat and Sentinel-2 (HLS) images. The PRVI, which incorporated the red, near-infrared, and shortwave infrared 1 bands, was developed based on the analysis of spectro-phenological separability and feature selection. Based on actual field samples, the performance of the PRVI for ratoon rice mapping was carefully evaluated by comparing it to several vegetation indices, including the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and land surface water index (LSWI). The results suggested that the PRVI could sufficiently capture the specific characteristics of ratoon rice, leading to a favorable separability between ratoon rice and other land cover types. Furthermore, the PRVI showed the best performance for identifying ratoon rice in the phenological phases characterized by grain filling and harvesting to tillering of the ratoon crop (GHS-TS2), indicating that only several images are required to obtain an accurate ratoon rice map. Finally, the PRVI performed better than NDVI, EVI, LSWI, and their combination at the GHS-TS2 stages, with producer's accuracy and user's accuracy of 92.22 and 89.30%, respectively. These results demonstrate that the proposed PRVI based on HLS data can effectively identify ratoon rice in fragmented croplands at crucial phenological stages, which is promising for identifying the earliest timing of ratoon rice planting and can provide a fundamental dataset for crop management activities.
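The three baseline indices named in this abstract have well-known band formulas, sketched below for single-pixel reflectance values. The PRVI formula itself is not given in the abstract and is therefore not reproduced. The EVI coefficients are the commonly used ones (gain 2.5, 6 and 7.5 for the red and blue terms, canopy adjustment 1), and the sample reflectances are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def lswi(nir, swir1):
    """Land surface water index from NIR and shortwave infrared 1."""
    return (nir - swir1) / (nir + swir1)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical surface reflectances for one pixel.
nir, red, blue, swir1 = 0.5, 0.1, 0.05, 0.25
print(ndvi(nir, red), lswi(nir, swir1), evi(nir, red, blue))
```

Computing such indices per acquisition date yields the time series from which phenological phases like grain filling or tillering can be compared across crop types.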
Abstract: Data mining in the educational field can be used to optimize the teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be utilized to mine the data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine the educational data and derive decisions based on the feature selection and classification process. The ISOFS-OSSAE model involves the design of the ISOFS technique to choose an optimal subset of features. Moreover, swallow swarm optimization (SSO) with the SSAE model is derived to perform the classification process. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments were conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results pointed out the improved classification performance of the ISOFS-OSSAE model over recent state-of-the-art approaches in terms of different performance measures.
Funding: Supported in part by the National Natural Science Foundation of China (62172065, 62072060).
Abstract: As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a fixed-length individual encoding. This is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising regions of the search space adaptively. Moreover, a dominance-based local search method is employed for further improvement. The experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
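The bi-objective view of feature selection in this abstract can be illustrated with a small Pareto-dominance sketch. This is not the LA-NSGA algorithm: it only shows how candidate feature subsets of different lengths are compared on the two objectives (classification error and subset size, both minimized). Representing an individual as a variable-length tuple of feature indices is an assumption about what a length-variable encoding might look like, and the error values below are made up for illustration.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def make_individual(feature_indices, error):
    """Hypothetical variable-length individual: selected feature indices plus its two objectives."""
    features = tuple(feature_indices)
    return {"features": features, "objectives": (error, len(features))}


def pareto_front(population):
    """Return the individuals that no other individual dominates, in their original order."""
    return [
        ind
        for ind in population
        if not any(
            dominates(other["objectives"], ind["objectives"])
            for other in population
            if other is not ind
        )
    ]


# Made-up candidates: (selected feature indices, classification error).
population = [
    make_individual([0, 3, 7], 0.10),
    make_individual([0, 3], 0.12),
    make_individual([1, 2, 4, 5], 0.10),  # same error as the first but more features
    make_individual([2], 0.30),
]
front = pareto_front(population)
front_features = [ind["features"] for ind in front]
```

In this toy population the four-feature subset is dominated by the three-feature subset with the same error, so the front keeps the remaining three candidates, each representing a different trade-off between accuracy and subset size.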
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2012AA040608), the National Natural Science Foundation of China (Nos. 61473279, 61004131), and the Development of Scientific Research Equipment Program of Chinese Academy of Sciences (No. YZ201247).
Abstract: Principal component analysis (PCA) combined with artificial neural networks was used to classify the spectra of 27 steel samples acquired using laser-induced breakdown spectroscopy. Three methods of spectral data selection (selecting all the peak lines of the spectra, selecting intensive spectral partitions, and using the whole spectra) were utilized to compare the influence of different PCA inputs on the classification of steels. Three intensive partitions were selected based on experience and prior knowledge to compare the classification, as the partitions can obtain the best results compared to all peak lines and the whole spectra. We also used two test data sets, mean spectra after being averaged and raw spectra without any pretreatment, to verify the results of the classification. The results of this comprehensive comparison show that a back propagation network trained using the principal components of appropriate, carefully selected spectral partitions can obtain the best results. A perfect result with 100% classification accuracy can be achieved using the intensive spectral partition in the range of 357-367 nm.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups Project under grant number (45/43); Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R140), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4310373DSR16).
Abstract: Due to the exponential increase in smart resource-limited devices and high-speed communication technologies, the Internet of Things (IoT) has received significant attention in different application areas. However, the IoT environment is highly susceptible to cyber-attacks because of memory, processing, and communication restrictions. Since traditional models are not adequate for achieving security in the IoT environment, recent developments in deep learning (DL) models prove beneficial. This study introduces a novel hybrid metaheuristics feature selection with stacked deep learning enabled cyber-attack detection (HMFS-SDLCAD) model. The major intention of the HMFS-SDLCAD model is to recognize the occurrence of cyber-attacks in the IoT environment. At the preliminary stage, data pre-processing is carried out to transform the input data into a useful format. In addition, a salp swarm optimization based on particle swarm optimization (SSOPSO) algorithm is used for the feature selection process. Besides, a stacked bidirectional gated recurrent unit (SBiGRU) model is utilized for the identification and classification of cyber-attacks. Finally, the whale optimization algorithm (WOA) is employed for the hyperparameter optimization process. The experimental analysis of the HMFS-SDLCAD model is validated using a benchmark dataset, and the results are assessed under several aspects. The simulation outcomes pointed out the improvements of the HMFS-SDLCAD model over recent approaches.