Cross entropy is a measure used in machine learning and deep learning to assess the difference between predicted and actual probability distributions. In this study, we propose cross entropy as a performance evaluation metric for image classifier models and apply it to the classification of lung cancer in CT images. A convolutional neural network is employed as the deep neural network (DNN) image classifier, with the residual network (ResNet) 50 chosen as the DNN architecture. The image data comprise a lung CT image set. Two classification models are built from datasets of different sizes, and lung cancer is categorized into four classes using 10-fold cross-validation. Furthermore, we employ t-distributed stochastic neighbor embedding (t-SNE) to visually explain the data distribution after classification. Experimental results demonstrate that cross entropy is a highly useful metric for evaluating the reliability of image classifier models. Nevertheless, a more comprehensive evaluation of model performance requires combining it with other evaluation metrics.
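As a minimal sketch, the mean categorical cross entropy over a test set can be computed from the softmax outputs of any classifier, such as the ResNet-50 model described above. The function name `cross_entropy`, the clipping constant `eps`, and the probability values are hypothetical illustrations, not taken from the study.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross entropy (natural log).

    y_true: integer class labels 0..K-1, one per image
    y_prob: predicted probability vectors (e.g. softmax outputs), one per image
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so a zero probability on the true class does not give log(0).
        p = min(max(probs[label], eps), 1.0)
        total += -math.log(p)
    return total / len(y_true)


# Four classes, matching the four-class lung CT setting; values are made up.
labels = [0, 2, 1, 3]
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.25, 0.25, 0.25, 0.25],  # uninformative prediction for this image
    [0.10, 0.10, 0.10, 0.70],
]
print(round(cross_entropy(labels, probs), 4))
```

A perfect, fully confident classifier scores 0, while a uniform guess over K classes scores ln K (about 1.386 for four classes). Unlike accuracy, the value also reflects how confident the model is on each image, which is why it can say something about reliability.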
Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. However, while numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers specifically for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge this gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives are to: investigate the performance of various classifiers, such as Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; evaluate the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; identify the strengths and limitations of the different classifiers to provide insights for cybersecurity practitioners and researchers; and offer recommendations, based on empirical evidence, for selecting the most effective classifier for Windows malware detection. The study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training uses various supervised classifiers, whose performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. The outcome is a comparative analysis of supervised machine learning classifiers for Windows malware detection. The results reveal the effectiveness and efficiency of each classifier in detecting different types of malware, and the insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses. Overall, this research contributes to advancing malware detection techniques and strengthening the security posture of Windows systems against evolving cyber threats.
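The evaluation metrics listed above can be computed from the confusion counts of a binary malware/benign task. The sketch below uses plain Python rather than any particular library; the function name `binary_metrics`, the label convention (1 = malware, 0 = benign), and the example predictions are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary malware/benign task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no positive predictions or labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical predictions from two classifiers on the same eight files.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = {
    "classifier_a": [1, 0, 1, 0, 1, 0, 1, 0],
    "classifier_b": [1, 1, 1, 1, 1, 0, 1, 0],
}
for name, y_pred in predictions.items():
    print(name, binary_metrics(y_true, y_pred))
```

Both hypothetical classifiers reach the same accuracy, yet the second trades precision for perfect recall, which is why comparing several metrics matters when choosing a malware detector.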
As the risks associated with air turbulence are intensified by climate change and the growth of the aviation industry, it has become imperative to monitor and mitigate these threats to ensure civil aviation safety. The eddy dissipation rate (EDR) has been established as the standard metric for quantifying turbulence in civil aviation. This study explores a universally applicable symbolic classification approach based on genetic programming to detect turbulence anomalies using quick access recorder (QAR) data, treating the detection of atmospheric turbulence as an anomaly detection problem. Comparative evaluations demonstrate that this approach performs on par with direct EDR calculation methods in identifying turbulence events. Moreover, comparisons with alternative machine learning techniques indicate that the proposed technique performs best among the methods compared. In summary, symbolic classification via genetic programming enables accurate turbulence detection from QAR data, comparable to established EDR approaches and better than the machine learning algorithms evaluated. This finding highlights the potential of integrating symbolic classifiers into turbulence monitoring systems to enhance civil aviation safety amid rising environmental and operational hazards.
Biometric recognition refers to the identification of individuals through their unique physiological or behavioral characteristics (e.g., fingerprint, face, and iris). Identifying people requires distinguishing characteristics such as fingerprints, which are widely regarded as the most reliable means of identification. Fingerprint recognition has become a standard procedure in forensics, and different techniques are available for this purpose. Most current techniques pay little attention to image enhancement and rely on high-dimensional features to build classification models. Because criminals and hackers routinely alter their fingerprints to create fake ones, we propose an effective method that classifies a fingerprint image as authentic or altered. To improve classification accuracy, the proposed method uses the most effective texture features and classifiers: Discriminant Analysis (DCA) and Gaussian Discriminant Analysis (GDA) are employed as classifiers, with Histogram of Oriented Gradients (HOG) and Segmentation-based Fractal Texture Analysis (SFTA) feature vectors as inputs. The performance of the classifiers is assessed over a range of feature sets to find the most accurate combination. The proposed method is tested on the Sokoto Coventry Fingerprint Dataset (SOCOFing), which contains 6,000 fingerprint images collected from 600 African subjects, ten fingerprints per subject. Synthetically altered replicas of the genuine fingerprints were generated with three types of alteration (obliteration, central rotation, and z-cut), each at three different levels. The proposed method achieved a classification accuracy of up to 99%. The experimental results indicate that the proposed method for fingerprint classification is feasible and effective, and that the SFTA-based GDA method outperforms state-of-the-art approaches in both feature dimension and classification accuracy.
Malicious software is one of the most common threats to the digital world, and it is of great importance to detect and prevent existing and new malware before it damages information assets. Machine learning approaches are used effectively for this purpose. In this study, we present a model that uses supervised and unsupervised learning algorithms together, with clustering used to enhance the prediction performance of the supervised classifiers. The aim of the proposed model is to make predictions in the shortest possible time with high accuracy and F1 score. In the first stage of the model, the data are clustered with the k-means algorithm. In the second stage, each prediction is made by the classifier combination that performs best for the corresponding cluster. To choose the best classifiers for each cluster, triple combinations of ten machine learning algorithms (kernel support vector machine, k-nearest neighbor, naive Bayes, decision tree, random forest, extreme gradient boosting, categorical boosting, adaptive boosting, extra trees, and gradient boosting) are evaluated. The selected triple classifier combination is positioned in two tiers, and the prediction time of the model is improved by placing the classifier with the longest prediction time in the second tier. Experiments on the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), the Elastic Malware Benchmark for Empowering Researchers (EMBER) 2018, and the Kaggle malware detection datasets show that clustering before classification improves prediction performance. The model achieves 99.74% accuracy and a 99.77% F1 score on BODMAS, 99.04% accuracy and a 98.63% F1 score on the Kaggle malware detection dataset, and 96.77% accuracy and a 96.77% F1 score on EMBER 2018. In addition, the tiered positioning of classifiers shortened the average prediction time by 76.13% for BODMAS and 95.95% for EMBER 2018. The prediction performance of the proposed method is better than that of the other studies in the literature that use the BODMAS and EMBER 2018 datasets.
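The abstract does not spell out how the three classifiers are combined across the two tiers. The sketch below assumes a majority vote in which the two faster classifiers form the first tier: if they agree, their label is already the majority of three, so the slow classifier is only called on disagreement. The k-means clustering stage is omitted, and the threshold "classifiers" (`fast_1`, `fast_2`, `slow_model`) are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(labels):
    # Most frequent label; equal counts keep first-encountered order.
    return Counter(labels).most_common(1)[0][0]


def tiered_predict(x, fast_a, fast_b, slow):
    """Two-tier evaluation that gives the same result as a full 3-way vote."""
    a, b = fast_a(x), fast_b(x)
    if a == b:
        # Two of three already agree: the slow model cannot change the vote.
        return a
    # Disagreement: only now pay for the slow classifier.
    return majority_vote([a, b, slow(x)])


slow_calls = 0  # counts how often the expensive tier is used


def slow_model(x):
    global slow_calls
    slow_calls += 1
    return 1 if x > 5 else 0


def fast_1(x):
    return 1 if x > 4 else 0


def fast_2(x):
    return 1 if x > 6 else 0


samples = [1, 3, 5, 6, 8, 9]
preds = [tiered_predict(x, fast_1, fast_2, slow_model) for x in samples]
print(preds, "slow model calls:", slow_calls)
```

Here the slow model runs on only two of six samples, which illustrates the kind of saving that placing the slowest classifier in the second tier can give.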
The key objective of intrusion detection systems (IDSs) is to protect a particular host or network by analyzing network traffic and predicting whether it is an attack or normal. IDSs use many machine learning (ML) methods to learn from past attack experience (i.e., known signatures) and identify new attacks. Although these methods are effective, they suffer from large computational costs because they consider all traffic features together. Moreover, as emerging technologies such as the Internet of Things (IoT) and big data advance, network traffic is growing rapidly, so the issue of computational cost needs to be addressed properly. In this research, therefore, ML methods are combined with a feature selection technique (FST) that reduces the number of features by selecting only the important ones from the NSL-KDD, CICIDS2017, and CIC-DDoS2019 datasets. This helped build IDSs with lower cost but higher performance, suitable for large-scale networks. The experimental results demonstrate that the proposed model, a decision tree (DT) with recursive feature elimination (RFE), outperforms the other classifiers combined with RFE in terms of accuracy, specificity, precision, sensitivity, F1-score, and G-mean on the investigated datasets.
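Sensitivity, specificity, and G-mean, three of the metrics named above, follow directly from the binary confusion counts, with G-mean being the geometric mean of sensitivity and specificity. The function name `ids_metrics` and the counts below are hypothetical.

```python
import math


def ids_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and G-mean from binary IDS confusion counts.

    Positive class = attack traffic, negative class = normal traffic.
    """
    sensitivity = tp / (tp + fn)  # fraction of attacks detected
    specificity = tn / (tn + fp)  # fraction of normal traffic passed
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean


# Hypothetical counts for a model evaluated on 200 flows.
print(ids_metrics(tp=90, tn=80, fp=20, fn=10))

# A degenerate model that labels everything "normal" on imbalanced data:
# 95% accuracy, but zero sensitivity and therefore zero G-mean.
print(ids_metrics(tp=0, tn=950, fp=0, fn=50))
```

Because G-mean collapses to zero when either class is ignored, it is a useful complement to accuracy on imbalanced intrusion datasets.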
The Internet of Things (IoT) is a growing technology that allows devices to share data with other devices across wireless networks. Because of their openness, IoT systems are particularly vulnerable to cyberattacks. The proposed work implements a new security framework for detecting the most specific and harmful intrusions in IoT networks. In this framework, a Covariance Linear Learning Embedding Selection (CL2ES) methodology is first used to extract the features most strongly associated with IoT intrusions. Then, a Kernel Distributed Bayes Classifier (KDBC) is built to predict attacks precisely from probability distribution values. In addition, a unique Mongolian Gazellas Optimization (MGO) algorithm is used to optimize the weight values for training the classifier. The effectiveness of the proposed CL2ES-KDBC framework has been assessed on several IoT cyberattack datasets, and the results are compared with current classification methods in terms of accuracy (97%), precision (96.5%), and other metrics. A computational analysis of the CL2ES-KDBC system on IoT intrusion datasets provides insight into its performance, efficiency, and suitability for securing IoT networks.
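To illustrate what predicting attacks "from probability distribution values" can look like, the sketch below implements a generic kernel density (Parzen-window) Bayes classifier: each class density is estimated with a product Gaussian kernel and combined with the class prior. This is an assumed stand-in, not the authors' KDBC; the CL2ES feature selection and MGO weight optimization are not shown, and the class name, bandwidth, and toy traffic features are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


class KernelBayesClassifier:
    """Bayes classifier with kernel density estimates of p(x | class)."""

    def __init__(self, bandwidth=1.0):
        self.h = bandwidth  # kernel width; would need tuning in practice

    def fit(self, X, y):
        self.samples = defaultdict(list)
        for xi, yi in zip(X, y):
            self.samples[yi].append(xi)
        n = len(y)
        self.priors = {c: len(pts) / n for c, pts in self.samples.items()}
        return self

    def _density(self, x, pts):
        # Average of product Gaussian kernels centred on the training points.
        total = 0.0
        for p in pts:
            k = 1.0
            for xj, pj in zip(x, p):
                k *= gaussian_kernel((xj - pj) / self.h) / self.h
            total += k
        return total / len(pts)

    def predict_proba(self, x):
        scores = {c: self.priors[c] * self._density(x, pts)
                  for c, pts in self.samples.items()}
        z = sum(scores.values()) or 1.0  # avoid dividing by zero
        return {c: s / z for c, s in scores.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


# Toy two-feature traffic records (hypothetical values).
X_train = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
           (5.0, 5.1), (4.8, 5.2), (5.2, 4.9)]
y_train = ["normal", "normal", "normal", "attack", "attack", "attack"]
clf = KernelBayesClassifier(bandwidth=0.5).fit(X_train, y_train)
print(clf.predict((0.1, 0.0)), clf.predict((5.1, 5.0)))
```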
The number of blogs and other forms of opinionated online content has increased dramatically in recent years. Many fields, including academia and national security, place an emphasis on automatically detecting the orientation of political articles. Political articles (especially in the Arab world) differ from other articles because of their subjectivity: the author's beliefs and political affiliation may have a significant influence on the article. With categories representing the main political ideologies, this problem can be viewed as a special case of text categorization (classification). In general, the performance of machine learning models for text classification is sensitive to hyperparameter settings. Furthermore, the feature vector used to represent a document must capture, to some extent, the complex semantics of natural language. To this end, this paper presents an intelligent system that detects the orientation of political Arabic articles by adapting the categorical boosting (CatBoost) method combined with a multi-level feature concept. Extracting features at multiple levels can enhance the model's ability to discriminate between classes or patterns, since each level may capture different aspects of the input data, contributing to a more comprehensive representation. CatBoost, a robust and efficient gradient-boosting algorithm, is used to learn and predict the complex relationships between these features and the political orientation labels of the articles. The proposed technique is assessed on a dataset of political Arabic texts, including postings and articles, collected from diverse sources and labeled with three orientations: conservative, reform, and revolutionary. The results show that the CatBoost method with multi-level features outperforms other frequently used machine learning models for text classification, achieving an accuracy of 98.14%.
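The abstract does not define which levels are used, so the sketch below assumes three common ones: word unigrams, word bigrams, and character n-grams with word-boundary markers. The output is a sparse count dictionary that would still need to be vectorized before training a gradient-boosting model such as CatBoost; no CatBoost API is shown. The function name and the sample text are hypothetical.

```python
from collections import Counter


def multi_level_features(text, char_n=3):
    """Count features at three levels: word unigrams, word bigrams and
    character n-grams (with word-boundary markers)."""
    words = text.split()
    feats = Counter()
    # Word level: individual tokens.
    feats.update("w1:" + w for w in words)
    # Phrase level: adjacent word pairs keep some local context.
    feats.update("w2:" + a + " " + b for a, b in zip(words, words[1:]))
    # Sub-word level: character n-grams, which can match shared stems
    # when prefixes and suffixes differ (relevant for Arabic morphology).
    for w in words:
        padded = "<" + w + ">"
        feats.update(
            "c%d:%s" % (char_n, padded[i:i + char_n])
            for i in range(len(padded) - char_n + 1)
        )
    return feats


# Latin-script placeholder text; real input would be Arabic.
sample = "reform now reform parties"
print(multi_level_features(sample).most_common(5))
```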
Breast cancer is a deadly disease, and radiologists recommend mammography to detect it at an early stage. This paper presents two types of HanmanNets that use the information set concept to derive deep information set features: Type-1 HanmanNets, obtained by modifying the kernel functions of ResNet, and Type-2 HanmanNets, obtained by changing the feature maps of AlexNet, GoogLeNet, and VGG-16. Both types of HanmanNets exploit the final feature maps of these architectures to generate deep information set features from mammograms, which are then classified with the Hanman Transform Classifier. In this work, the characteristics of the abnormalities present in the mammograms are captured by the above network architectures, which help derive the HanmanNet features based on the information set concept, and their performance is compared via classification accuracy. The highest accuracy of 100% is achieved for multi-class classification on the mini-MIAS database, surpassing the results in the literature. The results were validated by expert radiologists to show their clinical relevance.
In ultra-high-dimensional data, the response variable is often multi-class. This paper therefore proposes a model-free feature screening method for multi-class response variables that uses the Jensen-Shannon divergence to measure the importance of covariates. The method computes the Jensen-Shannon divergence between the conditional probability distribution of a covariate given each value of the response and the unconditional probability distribution of the covariate, and then weights these divergences by the probabilities of the response classes to obtain a weighted Jensen-Shannon divergence; a larger weighted divergence indicates a more important covariate. We also investigate an adapted version for covariates with different numbers of categories, which measures the relationship between a covariate and the response using the weighted Jensen-Shannon divergence adjusted by a logarithmic factor of the number of categories. Theoretical analysis and simulation experiments demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, simulation and real-dataset experiments show that, compared with an existing method, the proposed methods are robust and computationally faster in feature screening.
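A minimal sketch of the screening index for categorical covariates, using empirical frequencies: for a covariate X_k and response classes r with probabilities p_r, the score is the sum over r of p_r · JS(P(X_k | Y = r), P(X_k)). The adjusted variant here divides by log J_k, where J_k is the number of categories of X_k; that exact form of the logarithmic adjustment is an assumption, as are the function names and the toy data. Continuous covariates would first need to be discretized.

```python
import math
from collections import Counter


def kl(p, q):
    # Kullback-Leibler divergence; terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def weighted_js(x, y, adjust=False):
    """Weighted JS divergence between P(x | y = r) and P(x), weights P(y = r)."""
    cats = sorted(set(x))
    n = len(y)
    marginal_counts = Counter(x)
    marginal = [marginal_counts[c] / n for c in cats]
    score = 0.0
    for r, n_r in Counter(y).items():
        cond_counts = Counter(xi for xi, yi in zip(x, y) if yi == r)
        cond = [cond_counts[c] / n_r for c in cats]
        score += (n_r / n) * js(cond, marginal)
    if adjust and len(cats) > 1:
        score /= math.log(len(cats))  # assumed form of the log adjustment
    return score


def screen(covariates, y, top_d, adjust=False):
    """Rank covariates by the screening index and keep the top_d names."""
    scores = {name: weighted_js(x, y, adjust) for name, x in covariates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_d]


y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
covariates = {
    "x1": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],  # fully informative
    "x2": ["a", "b", "c", "a", "b", "c", "a", "b", "c"],  # independent of y
    "x3": ["a", "a", "b", "b", "b", "a", "a", "b", "b"],  # weakly related
}
print(screen(covariates, y, top_d=3))
```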
To improve the performance of multiple classifier systems, a new feature-decision level fusion method based on knowledge discovery is proposed. In the new method, the base classifiers operate on different feature spaces, and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and DST. Experiments on the classification of hyperspectral remote sensing images show that the proposed method improves classification performance compared with plurality voting (PV).
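The fusion step rests on Dempster's rule of combination. In the method above the mass functions are induced from decision tables; in this sketch they are hand-picked hypothetical numbers over a hypothetical three-class land-cover frame, and only the combination rule is shown (the generalized rough set model is not).

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


THETA = frozenset({"water", "soil", "veg"})  # frame of discernment
# Hypothetical outputs of two base classifiers for one pixel.
m1 = {frozenset({"veg"}): 0.6, frozenset({"veg", "soil"}): 0.3, THETA: 0.1}
m2 = {frozenset({"soil"}): 0.5, frozenset({"veg"}): 0.3, THETA: 0.2}

result = dempster_combine(m1, m2)
for focal, mass in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 4))
```

In this example 30% of the joint mass is conflicting, and after renormalization "veg" receives a combined mass of 0.6, ahead of "soil".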
The turbo air classifier is a widely used powder classification device in a variety of fields, and the characteristics of its flow field are an important basis for improving its structural design. The flow field characteristics of the rotor cage in turbo air classifiers were investigated under different operating conditions using a laser Doppler velocimeter (LDV), and a measure to reduce the axial velocity is proposed. The results show that, at the same measurement point, the tangential velocity of the airflow inside the rotor cage differs from the rotary speed of the rotor cage because of the combined influence of the negative pressure at the exit and the rotation of the rotor cage. When the rotor cage rotates at low speed, the tangential velocity of the airflow decreases as the radius decreases; in contrast, at high rotary speed, the tangential velocity increases as the radius decreases. Meanwhile, the vortex inside the rotor cage occurs near the pressure side of the blade when the rotary speed of the rotor cage is lower than the tangential velocity of the airflow, and near the suction side of the blade when the rotary speed is higher. Inside the rotor cage, the axial velocity cannot be neglected and is largely determined by the distance between the measurement point and the exit.
Predicting stock price movements is a challenging task for academics and practitioners. Forecasting price movements in emerging markets seems to be even more elusive, because these markets are usually more volatile, often have thin trading volumes, and are more susceptible to manipulation than mature markets. Technical analysis of stocks and commodities has become a science in its own right; many practitioners have applied quantitative methods and techniques to forecast price movements. Lagging and sometimes leading technical indicators provide rich quantitative tools for traders and investors seeking an advantage in investment or trading decisions. Artificial neural networks (ANNs) have been widely used to predict stock prices because they can capture the non-linearity that often exists in price movements. Recently, polynomial classifiers (PCs) have been applied to various recognition and classification applications and have shown favorable results in terms of recognition rates and computational complexity compared with ANNs. In this paper, we present two models for predicting securities' prices. The first model was developed using back-propagation feed-forward neural networks. The second model was developed using polynomial classifiers, the first application of PCs to stock price prediction. Both models received identical inputs and were trained and tested on the same data. The study was conducted on the Dubai Financial Market as an emerging market and applied to two of the market's leading stocks. In general, both models achieved very good results in terms of mean absolute percentage error, with an average error of around 1.5% when predicting the next day's price, 2.5% for the second day's price, and 4% for the third day's price.
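Assuming the "average error" figures above are mean absolute percentage errors (MAPE), they can be computed as follows. The prices are made up, and the function name is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical closing prices and next-day predictions.
actual = [10.0, 20.0, 40.0]
predicted = [10.2, 19.5, 40.8]
print(round(mape(actual, predicted), 3))
```

Computing the same statistic separately for one-, two-, and three-day-ahead predictions gives the per-horizon errors reported in the abstract.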
This study investigated the efficiency of learning Chinese numeral classifiers by L2 Chinese learners through an alignment-oriented task. Participants were 96 intermediate learners of L2 Chinese, randomly assigned to two experimental groups and one control group of 32 participants each. The continuation task used in this study consisted of a picture-based Chinese text depicting a room with an array of objects, which necessitates the use of classifiers. Both experimental groups were required first to read the text and then to write a description of their own rooms in comparison with the one in the text. One group was instructed to use the classifiers from the text as much as possible in their writing, whereas the other was not. Participants in the control group were first shown the picture without the text and then asked to describe their own rooms. The results showed that the continuation task significantly enhanced participants' retention of Chinese numeral classifiers, suggesting that the alignment-based approach is an effective way to learn difficult linguistic categories such as Chinese classifiers.
A basic approach to multi-class classification is decomposition, which breaks a multi-class classification task into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method combining K-means and multi-task relationship learning (MTRL). The method first uses the One-vs.-Rest (OvR) split to decompose the multi-class task into binary classification tasks. K-means is then used to downsample the dataset of each task, which helps prevent overfitting while reducing training costs. Finally, the sampled datasets are fed to MTRL, and the binary classifiers are trained jointly. With MTRL, the method exploits the relationships between tasks during training, improving the classification accuracy of each binary classifier. Experimental results on the Iris, Wine, Multiple Features, Wireless Indoor Localization, and Avila datasets demonstrate the effectiveness of the proposed approach.
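The first two steps, One-vs.-Rest decomposition and K-means downsampling, can be sketched in plain Python. The abstract does not say exactly how K-means reduces each task's data; this sketch assumes each side of a binary task is replaced by at most k cluster centroids. MTRL training is not shown, and all function names and data are hypothetical.

```python
import random


def ovr_tasks(y, classes):
    # One binary label vector per class: 1 for "this class", 0 for "rest".
    return {c: [1 if yi == c else 0 for yi in y] for c in classes}


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        # Move each centroid to its group mean; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def downsample_task(X, labels, k):
    """Replace each side of a binary task by at most k k-means centroids."""
    reduced_X, reduced_y = [], []
    for side in (0, 1):
        pts = [x for x, lab in zip(X, labels) if lab == side]
        for c in kmeans(pts, min(k, len(pts))):
            reduced_X.append(c)
            reduced_y.append(side)
    return reduced_X, reduced_y


X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = [0, 0, 1, 1, 2, 2]
tasks = ovr_tasks(y, classes=[0, 1, 2])
rx, ry = downsample_task(X, tasks[0], k=2)
print(tasks[0], rx, ry)
```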
The classification performance of model coal mill classifiers with different bottom incoming flow inlets was studied experimentally and numerically. The flow field between two neighboring impeller blades was measured using particle image velocimetry (PIV). The results showed that this flow field differed significantly between the swirling and non-swirling inlets. With the swirling inlet, a vortex was located between the two neighboring blades, whereas with the non-swirling inlet the vortex was attached to the blade tip. The vorticity of the vortex with the non-swirling inlet was much lower than that with the swirling inlet. When the impeller was stationary (~0 r·min⁻¹), the classifier with the non-swirling inlet had a larger cut size than that with the swirling inlet. As the impeller rotational speed increased, the cut size decreased for both inlets, more dramatically for the non-swirling inlet, and the cut sizes of the two classifiers were close to each other at high impeller rotational speeds (≥120 r·min⁻¹). The overall separation efficiency of the classifier with the non-swirling inlet was lower than that with the swirling inlet and increased monotonically with impeller rotational speed. With the swirling inlet, the overall separation efficiency first increased with impeller rotational speed and then decreased above 120 r·min⁻¹, and its variation was more moderate. As the initial particle concentration increased, the cut sizes for both the swirling and non-swirling inlets first decreased and then barely changed. At a low initial particle concentration (< 0.04 kg·m⁻³), the classifier with the swirling inlet had a larger cut size than that with the non-swirling inlet.
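The cut size is commonly defined as d50, the particle size at which the grade (partial separation) efficiency curve reaches 50%. Assuming that definition, a minimal sketch interpolates it linearly from measured points. The function name and the curve values below are hypothetical.

```python
def cut_size(sizes, grade_eff, target=0.5):
    """Particle size at which the grade-efficiency curve crosses `target`,
    found by linear interpolation between neighboring measured points."""
    pts = list(zip(sizes, grade_eff))
    for (d0, g0), (d1, g1) in zip(pts, pts[1:]):
        if (g0 - target) * (g1 - target) <= 0 and g0 != g1:
            return d0 + (target - g0) * (d1 - d0) / (g1 - g0)
    raise ValueError("grade-efficiency curve does not cross the target")


# Hypothetical grade-efficiency curve: particle sizes in micrometres and the
# fraction of each size reporting to the coarse (rejected) stream.
sizes_um = [10, 20, 30, 40, 50]
efficiency = [0.05, 0.20, 0.40, 0.70, 0.90]
print(round(cut_size(sizes_um, efficiency), 2))
```

A smaller d50 at higher impeller speed, as reported above, corresponds to the curve shifting toward finer sizes.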
Based on the framework of support vector machines (SVMs) with the one-against-one (OAO) strategy, a new multi-class kernel method based on a directed acyclic graph (DAG) and probabilistic distance is proposed to raise multi-class classification accuracy. The topology of the DAG is constructed by rearranging the sequence of nodes in the graph. Evaluating the DAG is equivalent to operating the SVMs on a list of classes, so the classification performance depends on the node sequence. The Jeffries-Matusita distance (JMD) is introduced to estimate the separability of each class, and the implementation list is initialized with all classes arranged in a certain sequence. To verify the effectiveness of the proposed method, numerical analysis is conducted on UCI data and hyperspectral data. Comparative studies with the standard OAO and DAG classification methods are also conducted, and the results show better performance and higher accuracy for the proposed JMD-DAG method.
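The sketch below shows the two ingredients under stated assumptions. Classes are modeled as Gaussians with diagonal covariance, and the Jeffries-Matusita distance is computed as JM = 2(1 - e^(-B)) from the Bhattacharyya distance B (some texts use the square root of this form). The ordering rule, sorting classes by mean JM distance to the others, is a hypothetical heuristic since the abstract does not give the exact rule, and a nearest-mean rule stands in for each binary SVM.

```python
import math


def jm_distance(stats_a, stats_b):
    """Jeffries-Matusita distance between two Gaussian classes with
    diagonal covariance; stats = (means, variances). Range [0, 2]."""
    (mu_a, var_a), (mu_b, var_b) = stats_a, stats_b
    b = 0.0  # Bhattacharyya distance, summed over independent features
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        v = (va + vb) / 2.0
        b += (ma - mb) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(va * vb))
    return 2.0 * (1.0 - math.exp(-b))


def jm_order(class_stats):
    """Hypothetical ordering: most separable classes (mean JM) first."""
    labels = list(class_stats)

    def mean_jm(c):
        others = [o for o in labels if o != c]
        return sum(jm_distance(class_stats[c], class_stats[o])
                   for o in others) / len(others)

    return sorted(labels, key=mean_jm, reverse=True)


def dag_predict(x, order, pairwise):
    """Evaluate a DAG of pairwise classifiers as a shrinking list:
    test the first class against the last and drop the loser (K-1 tests)."""
    candidates = list(order)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if pairwise(first, last, x) == first:
            candidates.pop()   # last class eliminated
        else:
            candidates.pop(0)  # first class eliminated
    return candidates[0]


stats = {
    "A": ([0.0, 0.0], [1.0, 1.0]),
    "B": ([3.0, 0.0], [1.0, 1.0]),
    "C": ([0.0, 6.0], [1.0, 1.0]),
}


def nearest_mean(i, j, x):
    # Stand-in for the binary SVM trained on classes i and j.
    def dist(c):
        return sum((xv - mv) ** 2 for xv, mv in zip(x, stats[c][0]))
    return i if dist(i) <= dist(j) else j


order = jm_order(stats)
print(order, dag_predict((2.8, 0.1), order, nearest_mean))
```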
The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index, and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: low-level features extracted by computers fail to match the high-level concepts interpreted by humans. In this paper, we present a generic scheme for detecting video semantic concepts based on machine learning over multiple visual features. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method enables the concept detection system to exploit the potential of these features. We then combine the different features and subsystems through both classifier-level and kernel-level fusion, yielding a more robust system. The proposed system is tested on the TRECVID dataset. The resulting mean average precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine provides a generic model and performs well on both object-type and scene-type concepts.
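As a minimal sketch of the reported metric, non-interpolated average precision (AP) for one concept averages the precision at each rank where a relevant shot appears, and MAP averages AP over concepts. Here AP is normalized by the number of relevant shots in the ranked list unless a total is supplied. TRECVID evaluations have used variants such as inferred AP estimated from sampled judgments, which this sketch does not implement; the rankings below are hypothetical.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Non-interpolated AP for one concept; ranked_relevance holds 1/0 flags
    for the returned shots in rank order."""
    hits, precision_sum = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at this relevant rank
    r = total_relevant if total_relevant is not None else hits
    return precision_sum / r if r else 0.0


def mean_average_precision(runs):
    """MAP: the mean of per-concept AP values."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical ranked results for two concepts.
outdoor = [1, 0, 1, 0, 0, 1]
car = [0, 1, 1, 0]
print(round(average_precision(outdoor), 4),
      round(mean_average_precision([outdoor, car]), 4))
```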
Funding: This research work is supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R411), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. While numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers specifically for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge this gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives are: (1) to investigate the performance of various classifiers, such as Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; (2) to evaluate the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; (3) to identify the strengths and limitations of different classifiers in order to provide insights for cybersecurity practitioners and researchers; and (4) to offer evidence-based recommendations for selecting the most effective classifier for Windows malware detection. The study employs a structured methodology consisting of four phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training uses various supervised classifiers, and their performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. The study's outcome is a comparative analysis of supervised machine learning classifiers for Windows malware detection. The results reveal the effectiveness and efficiency of each classifier in detecting different types of malware, and the insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses. Overall, this research contributes to advancing malware detection techniques and strengthening the security posture of Windows systems against evolving cyber threats.
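The four evaluation metrics named above can all be derived from the binary confusion counts. The sketch below assumes label 1 means "malware" and returns 0.0 when a ratio is undefined (a common convention, chosen here as an assumption); the function name and the labels in the example are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (e.g. 1 = malware)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # malware correctly flagged
        elif p == positive:
            fp += 1          # benign file flagged as malware
        elif t == positive:
            fn += 1          # malware missed
        else:
            tn += 1          # benign file correctly passed
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions from one classifier
print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0]))
```

Running the same function on each classifier's predictions over a shared test split gives directly comparable numbers.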
Funding: Supported by the Meteorological Soft Science Project (Grant No. 2023ZZXM29), the Natural Science Fund Project of Tianjin, China (Grant No. 21JCYBJC00740), and the Key Research and Development-Social Development Program of Jiangsu Province, China (Grant No. BE2021685).
Abstract: As the risks associated with air turbulence are intensified by climate change and the growth of the aviation industry, monitoring and mitigating these threats has become imperative for civil aviation safety. The eddy dissipation rate (EDR) has been established as the standard metric for quantifying turbulence in civil aviation. This study explores a universally applicable symbolic classification approach based on genetic programming to detect turbulence anomalies from quick access recorder (QAR) data, treating the detection of atmospheric turbulence as an anomaly detection problem. Comparative evaluations demonstrate that this approach performs on par with direct EDR calculation methods in identifying turbulence events. Moreover, comparisons with alternative machine learning techniques indicate that the proposed technique outperforms the other methods evaluated. In summary, symbolic classification via genetic programming enables accurate turbulence detection from QAR data, comparable to established EDR approaches and better than the machine learning algorithms tested. This finding highlights the potential of integrating symbolic classifiers into turbulence monitoring systems to enhance civil aviation safety amid rising environmental and operational hazards.
Abstract: Biometric recognition refers to the identification of individuals through their unique physiological or behavioral characteristics (e.g., fingerprint, face, and iris). Identifying people requires distinguishing characteristics such as fingerprints, which are widely regarded as the most reliable means of identification. Fingerprint recognition has become a standard procedure in forensics, and various techniques are available for this purpose. Most current techniques pay little attention to image enhancement and rely on high-dimensional features to build classification models. Because criminals and hackers routinely alter their fingerprints to create fake ones, we propose an effective method for classifying a fingerprint image as authentic or altered. To improve fingerprint classification accuracy, the proposed method uses the most effective texture features and classifiers: Discriminant Analysis (DCA) and Gaussian Discriminant Analysis (GDA) are employed as classifiers, with Histogram of Oriented Gradients (HOG) and Segmentation-based Fractal Texture Analysis (SFTA) feature vectors as inputs. The classifiers are evaluated on a range of feature sets to identify the most accurate combination. The proposed method is tested on the Sokoto Coventry Fingerprint Dataset (SOCOFing), which contains 6,000 fingerprint images collected from 600 African subjects, ten fingerprints per subject. Obliteration, central rotation, and z-cut alterations, each at three degrees of severity, were applied to produce synthetically altered replicas of the genuine fingerprints. The proposed method achieved a classification accuracy of up to 99%. The experimental results indicate that the proposed method is feasible and effective for fingerprint classification, and that the SFTA-based GDA method outperforms state-of-the-art approaches in both feature dimensionality and classification accuracy.
Abstract: Malicious software is one of the most common threats to the digital world, and it is important to detect and prevent both existing and new malware before it damages information assets. Machine learning approaches are used effectively for this purpose. In this study, we present a model in which supervised and unsupervised learning algorithms are used together: clustering is used to enhance the prediction performance of the supervised classifiers. The aim of the proposed model is to make predictions in the shortest possible time with high accuracy and F1 score. In the first stage of the model, the data are clustered with the k-means algorithm. In the second stage, predictions are made with the classifier combination that performs best for the corresponding cluster. To choose the best classifiers for each cluster, triple combinations of ten machine learning algorithms (kernel support vector machine, k-nearest neighbor, naive Bayes, decision tree, random forest, extreme gradient boosting, categorical boosting, adaptive boosting, extra trees, and gradient boosting) are evaluated. The selected triple classifier combination is arranged in two tiers, and the prediction time of the model is reduced by placing the classifier with the longest prediction time in the second tier. Experiments on the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), the Elastic Malware Benchmark for Empowering Researchers (EMBER) 2018, and the Kaggle malware detection datasets show that clustering before classification improves prediction performance. The model achieves 99.74% accuracy and a 99.77% F1 score on the BODMAS dataset, 99.04% accuracy and a 98.63% F1 score on the Kaggle malware detection dataset, and 96.77% accuracy and a 96.77% F1 score on the EMBER 2018 dataset. In addition, the tiered positioning of classifiers shortened the average prediction time by 76.13% on the BODMAS dataset and by 95.95% on the EMBER 2018 dataset. The proposed method's prediction performance is better than that of other studies in the literature that use the BODMAS and EMBER 2018 datasets.
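The cluster-then-classify idea can be sketched in plain Python under simplifying assumptions: k-means assigns each sample to a cluster, and each cluster is routed to whichever candidate classifier scored best on that cluster's samples. Unlike the study, this sketch picks a single classifier per cluster rather than a triple combination, does not reproduce the two-tier positioning, and scores candidates on the same data it clusters (held-out data would be used in practice). All function names and the stand-in classifiers are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c  # empty cluster stays put
            for c, g in zip(centroids, groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_router(points, labels, classifiers, k):
    """Cluster the data, then keep the most accurate classifier for each cluster."""
    centroids = kmeans(points, k)
    best = {}
    for c in range(k):
        members = [i for i, p in enumerate(points) if nearest(p, centroids) == c]

        def cluster_accuracy(clf):
            if not members:
                return 0.0
            return sum(clf(points[i]) == labels[i] for i in members) / len(members)

        best[c] = max(classifiers, key=cluster_accuracy)
    return centroids, best


def route_predict(x, centroids, best):
    """Send `x` to its nearest cluster and apply that cluster's chosen classifier."""
    return best[nearest(x, centroids)](x)


# Toy data: two well-separated groups; the two "classifiers" are trivial stand-ins.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0), (10.3, 9.9), (9.8, 10.2)]
labels = [0, 0, 0, 1, 1, 1]


def always_benign(x):
    return 0


def always_malware(x):
    return 1


centroids, best = fit_router(points, labels, [always_benign, always_malware], k=2)
print(route_predict((0.2, 0.1), centroids, best), route_predict((9.9, 10.1), centroids, best))
```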
Abstract: The key objective of intrusion detection systems (IDSs) is to protect a particular host or network by analyzing network traffic and predicting whether it is an attack or normal. Many IDSs use machine learning (ML) methods to learn from past attacks (i.e., signature-based detection) and identify new ones. Although these methods are effective, they suffer from large computational costs because they consider all traffic features together. Moreover, as emerging technologies such as the Internet of Things (IoT) and big data advance, network traffic is also increasing rapidly, so the issue of computational cost needs to be addressed properly. In this research, ML methods are therefore combined with a feature selection technique (FST) that reduces the number of features by selecting only the important ones from the NSL-KDD, CICIDS2017, and CIC-DDoS2019 datasets. This makes it possible to build IDSs with lower cost and higher performance, suitable for large-scale networks. The experimental results demonstrate that the proposed model, a decision tree (DT) with recursive feature elimination (RFE), performs better than other classifiers with RFE in terms of accuracy, specificity, precision, sensitivity, F1-score, and G-mean on the investigated datasets.
Abstract: The Internet of Things (IoT) is a growing technology that allows devices to share data with other devices across wireless networks. IoT systems are particularly vulnerable to cyberattacks because of their openness. The proposed work implements a new security framework for detecting the most specific and harmful intrusions in IoT networks. In this framework, a Covariance Linear Learning Embedding Selection (CL2ES) methodology is first used to extract the features most strongly associated with IoT intrusions. Then, a Kernel Distributed Bayes Classifier (KDBC) is created to predict attacks precisely based on the probability distribution value. In addition, a Mongolian Gazellas Optimization (MGO) algorithm is used to optimize the weight values for training the classifier. The effectiveness of the proposed CL2ES-KDBC framework is assessed on several IoT cyberattack datasets, and the results are compared with current classification methods in terms of accuracy (97%), precision (96.5%), and other metrics. A computational analysis of the CL2ES-KDBC system on IoT intrusion datasets provides valuable insight into its performance, efficiency, and suitability for securing IoT networks.
Abstract: The number of blogs and other forms of opinionated online content has increased dramatically in recent years. Many fields, including academia and national security, place an emphasis on automatically detecting the orientation of political articles. Political articles (especially in the Arab world) differ from other articles because of their subjectivity: the author's beliefs and political affiliation may strongly influence a political article. With categories representing the main political ideologies, this problem can be treated as a special case of text categorization (classification). In general, the performance of machine learning models for text classification is sensitive to hyperparameter settings. Furthermore, the feature vector used to represent a document must capture, to some extent, the complex semantics of natural language. To this end, this paper presents an intelligent system for detecting the orientation of political Arabic articles that adapts the categorical boosting (CatBoost) method combined with a multi-level feature concept. Extracting features at multiple levels can enhance the model's ability to discriminate between different classes or patterns, since each level may capture different aspects of the input data and contribute to a more comprehensive representation. CatBoost, a robust and efficient gradient-boosting algorithm, is used to learn and predict the complex relationships between these features and the political orientation labels associated with the articles. The proposed technique is assessed on a dataset of political Arabic texts collected from diverse sources, including postings and articles, with three opinion categories: conservative, reform, and revolutionary. The results show that the CatBoost method with multi-level features outperforms other commonly used machine learning models for text classification, achieving an accuracy of 98.14%.
Abstract: Breast cancer is a deadly disease, and radiologists recommend mammography to detect it at early stages. This paper presents two types of HanmanNets that use the information set concept to derive deep information set features: Type-1 HanmanNets are obtained from ResNet by modifying its kernel functions, and Type-2 HanmanNets are obtained from AlexNet, GoogLeNet, and VGG-16 by changing their feature maps. Both types of HanmanNets exploit the final feature maps of these architectures to generate deep information set features from mammograms, which are then classified using the Hanman Transform Classifier. In this work, the characteristics of the abnormalities present in the mammograms are captured by the above network architectures, which help derive the HanmanNet features based on the information set concept, and their performance is compared via classification accuracy. The highest accuracy of 100% is achieved for multi-class classification on the mini-MIAS database, surpassing the results reported in the literature. The results were validated by expert radiologists to show their clinical relevance.
Abstract: In ultra-high-dimensional data, the response variable is often multi-class. This paper therefore proposes a model-free feature screening method for multi-class responses that uses the Jensen-Shannon divergence to measure the importance of covariates. The method calculates the Jensen-Shannon divergence between the conditional probability distribution of a covariate given each response class and the unconditional probability distribution of the covariate, and then uses the probabilities of the response classes as weights to compute a weighted Jensen-Shannon divergence; a larger weighted Jensen-Shannon divergence indicates a more important covariate. We also investigate an adapted version of the method for the case in which the number of categories differs across covariates, in which the weighted Jensen-Shannon divergence is adjusted by a logarithmic factor of the number of categories. Theoretical analysis and simulation experiments demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, results on simulated and real datasets show that, compared with an existing method, the proposed methods are robust in feature screening and computationally faster.
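The screening statistic can be sketched for categorical (or already discretized) covariates as follows. The score for a covariate X is the sum over classes k of P(Y = k) times JS(P(X | Y = k), P(X)), estimated from empirical frequencies. The exact form of the logarithmic adjustment is not given in the abstract; dividing by log(number of categories of X) is an assumption here, as are all function names and the toy data.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) for aligned discrete distributions (natural log); 0 * log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def weighted_js(x, y, adjust=False):
    """Weighted JS divergence between P(X | Y = k) and P(X), weighted by P(Y = k).

    adjust=True divides by log(number of categories of x); this is an assumed
    form of the logarithmic adjustment, not taken from the paper.
    """
    n = len(x)
    categories = sorted(set(x))
    x_counts = Counter(x)
    p_x = [x_counts[c] / n for c in categories]
    score = 0.0
    for k, n_k in Counter(y).items():
        cond_counts = Counter(xi for xi, yi in zip(x, y) if yi == k)
        p_x_given_k = [cond_counts[c] / n_k for c in categories]
        score += (n_k / n) * js_divergence(p_x_given_k, p_x)
    if adjust and len(categories) > 1:
        score /= math.log(len(categories))
    return score


def screen(columns, y, d, adjust=False):
    """Indices of the d highest-scoring covariates, plus all scores."""
    scores = [weighted_js(col, y, adjust) for col in columns]
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranking[:d], scores


# Toy example with a three-class response
y = [0, 0, 1, 1, 2, 2]
columns = [
    [0, 0, 1, 1, 2, 2],  # tracks the class exactly
    [0, 1, 0, 1, 0, 1],  # same distribution within every class
    [5, 5, 5, 5, 5, 5],  # constant
]
selected, scores = screen(columns, y, d=1)
print(selected, [round(s, 4) for s in scores])
```

An uninformative covariate has identical conditional and marginal distributions, so every JS term, and hence its score, is zero.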
Abstract: To improve the performance of multiple classifier systems, a new feature-decision level fusion method based on knowledge discovery is proposed. In the new method, the base classifiers operate on different feature spaces, and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. Experiments on the classification of hyperspectral remote sensing images show that the proposed method improves classification performance compared with plurality voting (PV).
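The fusion step relies on combining mass functions. A minimal sketch of Dempster's rule of combination follows: products of masses on intersecting focal sets are accumulated, mass falling on the empty set is treated as conflict, and the remainder is renormalized by 1 - conflict. How the paper induces masses from decision tables and the generalized rough set model is not reproduced; the class names, mass values, and the singleton-based `decide` rule are illustrative assumptions.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps frozenset(focal classes) -> mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass that would land on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


def decide(mass):
    """Pick the class whose singleton set carries the most mass (assumes one exists)."""
    singletons = {next(iter(s)): m for s, m in mass.items() if len(s) == 1}
    return max(singletons, key=singletons.get)


# Two hypothetical base classifiers over the frame {A, B, C}
theta = frozenset({"A", "B", "C"})
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.3, theta: 0.1}
m2 = {frozenset({"B"}): 0.5, frozenset({"A"}): 0.3, theta: 0.2}
fused = dempster_combine(m1, m2)
print(decide(fused), round(fused[frozenset({"A"})], 4))
```

Here the two sources conflict with total mass 0.3 (A versus B), so the surviving masses are divided by 0.7.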
Funding: Supported by the National Natural Science Foundation of China (Grant No. 50474035).
Abstract: The turbo air classifier is powder classification equipment widely used in a variety of fields. Its flow field characteristics are an important basis for improving its structural design. The flow field characteristics of the rotor cage in turbo air classifiers were investigated under different operating conditions with a laser Doppler velocimeter (LDV), and a measure for reducing the axial velocity is proposed. The results show that, at the same measurement point, the tangential velocity of the air flow inside the rotor cage differs from the rotary speed of the rotor cage because of the combined influence of the negative pressure at the exit and the rotation of the rotor cage. When the rotor cage rotates at low speed, the tangential velocity of the air flow decreases as the radius decreases; in contrast, at high rotary speed, the tangential velocity increases as the radius decreases. Meanwhile, the vortex inside the rotor cage occurs near the pressure side of the blade when the rotor cage's rotary speed is lower than the tangential velocity of the air flow; conversely, it occurs near the suction side of the blade when the rotary speed is higher than the tangential velocity of the air flow. Inside the rotor cage, the axial velocity cannot be disregarded and is largely determined by the distance between the measurement point and the exit.
Abstract: Predicting stock price movements is a challenging task for academics and practitioners. Forecasting price movements in emerging markets is particularly elusive, because these markets are usually more volatile, often have thin trading volumes, and are more susceptible to manipulation than mature markets. Technical analysis of stocks and commodities has become a science in its own right; many practitioners have applied quantitative methods and techniques to forecast price movements. Lagging and sometimes leading technical indicators provide rich quantitative tools for traders and investors seeking an advantage when making investment or trading decisions. Artificial Neural Networks (ANN) have been widely used to predict stock prices because they can capture the non-linearity that often exists in price movements. Recently, Polynomial Classifiers (PC) have been applied to various recognition and classification applications, with favorable recognition rates and computational complexity compared with ANN. In this paper, we present two models for predicting securities' prices. The first model was developed using back-propagation feed-forward neural networks. The second was developed using polynomial classifiers, the first application of PC to stock price prediction. Both models received identical inputs and were trained and tested on the same data. The study was conducted on the Dubai Financial Market as an emerging market and applied to two of the market's leading stocks. In general, both models achieved very good results in terms of mean absolute percentage error, with average errors of about 1.5% when predicting the next day's price, 2.5% when predicting the second day's price, and 4% when predicting the third day's price.
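Assuming the "mean absolute percentage error" reported above is the standard MAPE, the per-horizon figures can be computed as in the sketch below. The function names, the alignment convention (each horizon's forecasts are lined up with the same actual prices), and the prices are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mape_by_horizon(actual, forecasts):
    """forecasts[h] holds the (h + 1)-day-ahead predictions aligned with `actual`."""
    return {h + 1: mape(actual, preds) for h, preds in enumerate(forecasts)}


# Hypothetical closing prices and forecasts
actual = [10.0, 20.0, 40.0]
one_day_ahead = [10.1, 19.8, 40.4]   # each 1% off
two_days_ahead = [11.0, 22.0, 44.0]  # each 10% off
print({h: round(e, 2) for h, e in mape_by_horizon(actual, [one_day_ahead, two_days_ahead]).items()})
```

Because each error is relative to the actual price, MAPE values are comparable across stocks trading at different price levels.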
Abstract: This study investigated how efficiently L2 Chinese learners acquire Chinese numeral classifiers through an alignment-oriented continuation task. Participants were 96 intermediate learners of L2 Chinese, randomly assigned to two experimental groups and one control group of 32 participants each. The continuation task consisted of a picture-based Chinese text depicting a room with an array of objects, which necessitates the use of classifiers. Both experimental groups were required to first read the text and then write a description of their own rooms in comparison with the one in the text. One group was instructed to use the classifiers from the text as much as possible in their writing, whereas the other was not. Participants in the control group were first shown the picture without the text and then asked to describe their own rooms. The results showed that the continuation task significantly enhanced participants' retention of Chinese numeral classifiers, suggesting that the alignment-based approach is an effective way to learn difficult linguistic categories such as Chinese classifiers.
Funding: Supported by the National Natural Science Foundation of China (61703131, 61703129, 61701148, 61703128).
Abstract: A basic approach to multi-class classification is decomposition, which splits a multi-class classification task into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method that combines K-means and multi-task relationship learning (MTRL). The method first uses the One-vs.-Rest strategy to decompose the multi-class classification task into binary classification tasks. K-means is then used to downsample the dataset of each task, which helps prevent over-fitting while reducing training costs. Finally, the sampled datasets are fed to MTRL, and the binary classifiers are trained jointly. With MTRL, the method exploits the relationships between tasks during training, improving the classification accuracy of each binary classifier. Experimental results on the Iris, Wine, Multiple Features, Wireless Indoor Localization, and Avila datasets demonstrate the effectiveness of the proposed approach.
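The first two steps (One-vs.-Rest decomposition and K-means downsampling) can be sketched as below. The abstract does not say exactly how K-means reduces each task's data; this sketch assumes all positive samples are kept and the larger "rest" set is replaced by the sample nearest to each of k centroids. The joint MTRL training is not reproduced, and all names and data are hypothetical.

```python
import math


def one_vs_rest(X, y):
    """One binary task per class: label 1 for that class, 0 for the rest."""
    return {c: (X, [1 if label == c else 0 for label in y]) for c in sorted(set(y))}


def kmeans_representatives(points, k, iters=20):
    """Indices of the points nearest to each of k Lloyd's k-means centroids."""
    cents = [list(points[0])]
    while len(cents) < k:  # deterministic farthest-point initialization
        cents.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in cents))))
    for _ in range(iters):
        groups = [[] for _ in cents]
        for p in points:
            groups[min(range(len(cents)), key=lambda i: math.dist(p, cents[i]))].append(p)
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else c for c, g in zip(cents, groups)]
    reps = {min(range(len(points)), key=lambda j: math.dist(points[j], c)) for c in cents}
    return sorted(reps)


def downsample_task(X, labels, k):
    """Keep every positive; keep only k k-means representatives of the negatives."""
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    if len(neg) <= k:
        return X, labels
    keep_neg = [neg[j] for j in kmeans_representatives([X[i] for i in neg], k)]
    keep = sorted(pos + keep_neg)
    return [X[i] for i in keep], [labels[i] for i in keep]


# Toy three-class data
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]
tasks = one_vs_rest(X, y)
Xa, la = downsample_task(*tasks["a"], k=2)
print(Xa, la)
```

Downsampling the "rest" side also counters the class imbalance that One-vs.-Rest decomposition creates.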
Funding: Financial support from the National Key Technologies R&D Program of China (2018YFF0216002).
Abstract: The classification performance of model coal mill classifiers with different bottom incoming flow inlets was studied experimentally and numerically. The flow field adjacent to two neighboring impeller blades was measured using particle image velocimetry. The results showed that the flow field adjacent to two neighboring blades with the swirling inlet differed significantly from that with the non-swirling inlet. With the swirling inlet, a vortex was located between the two neighboring blades, whereas with the non-swirling inlet, the vortex was attached to the blade tip. The vorticity of the vortex with the non-swirling inlet was much lower than that with the swirling inlet. When the impeller was stationary (~0 r·min⁻¹), the classifier with the non-swirling inlet had a larger cut size than that with the swirling inlet. As the impeller rotational speed increased, the cut sizes with both non-swirling and swirling inlets decreased, more dramatically with the non-swirling inlet. At high impeller rotational speeds (≥120 r·min⁻¹), the cut sizes of the two classifiers were close to each other. The overall separation efficiency of the classifier with the non-swirling inlet was lower than that with the swirling inlet and increased monotonically with impeller rotational speed. With the swirling inlet, the overall separation efficiency first increased with impeller rotational speed and then decreased above 120 r·min⁻¹, and its variation was more moderate. As the initial particle concentration increased, the cut sizes with both swirling and non-swirling inlets first decreased and then barely changed. At a low initial particle concentration (< 0.04 kg·m⁻³), the classifier with the swirling inlet had a larger cut size than that with the non-swirling inlet.
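Cut size is commonly defined as the particle size d50 at which the grade (partial) separation efficiency equals 50%, that is, particles of that size are equally likely to be rejected or to pass. As a hedged sketch, assuming a measured grade-efficiency curve is available as (size, efficiency) pairs, d50 can be estimated by linear interpolation between the two points that bracket 50%. The function name and the sample curve are illustrative, not measurements from this study.

```python
def cut_size(sizes, efficiencies, target=0.5):
    """Estimate the cut size d50 from a grade-efficiency curve by linear interpolation.

    sizes: particle sizes (e.g. in micrometres).
    efficiencies: fraction (0..1) of particles of each size reporting to the
    coarse (rejected) stream.
    """
    pairs = sorted(zip(sizes, efficiencies))
    for (d0, e0), (d1, e1) in zip(pairs, pairs[1:]):
        if e0 != e1 and (e0 - target) * (e1 - target) <= 0:
            return d0 + (target - e0) * (d1 - d0) / (e1 - e0)
    raise ValueError("the curve never crosses the target efficiency")


# Hypothetical grade-efficiency curve
sizes_um = [10, 20, 30, 40, 50]
grade_eff = [0.10, 0.30, 0.60, 0.85, 0.95]
print(round(cut_size(sizes_um, grade_eff), 2))
```

A smaller d50 means that finer particles are already being rejected, i.e. a finer product.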
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61201310), the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201160), and the China Postdoctoral Science Foundation (Grant No. 20110491067).
Abstract: Based on the framework of support vector machines (SVM) using the one-against-one (OAO) strategy, a new multi-class kernel method based on a directed acyclic graph (DAG) and probabilistic distance is proposed to improve multi-class classification accuracy. The topology of the DAG is constructed by rearranging the sequence of nodes in the graph. Evaluating a DAG is equivalent to running the binary SVMs on a list, so classification performance depends on the sequence of nodes in the graph. The Jeffries-Matusita distance (JMD) is introduced to estimate the separability of each class, and the evaluation list is initialized with all classes arranged in a sequence based on this separability. To verify the effectiveness of the proposed method, numerical analysis is conducted on UCI data and hyperspectral data. Comparative studies with the standard OAO and DAG classification methods are also conducted, and the results show better performance and higher accuracy for the proposed JMD-DAG method.
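The two ingredients can be sketched as follows, under assumptions the abstract does not spell out. Each class is modeled as a Gaussian with diagonal covariance, the JM distance is taken as 2(1 - exp(-B)) with B the Bhattacharyya distance (some texts use the square root of this quantity), and classes are ordered by their average JM distance to the other classes, a hypothetical ordering rule. `dag_predict` follows the standard DAG evaluation on a list: compare the first and last classes, drop the loser, and repeat. The `pairwise` function stands in for the trained one-against-one SVMs.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    b = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = (v1 + v2) / 2.0
        b += (m1 - m2) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return b


def jm_distance(mu1, var1, mu2, var2):
    """Jeffries-Matusita distance in the 2 * (1 - exp(-B)) form, within [0, 2]."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_diag(mu1, var1, mu2, var2)))


def order_by_separability(stats):
    """Hypothetical ordering: largest mean JM distance to the other classes first.

    stats maps label -> (mean vector, variance vector).
    """
    labels = list(stats)

    def mean_jm(a):
        others = [b for b in labels if b != a]
        return sum(jm_distance(*stats[a], *stats[b]) for b in others) / len(others)

    return sorted(labels, key=mean_jm, reverse=True)


def dag_predict(x, order, pairwise):
    """DAG evaluation on a list: test first vs. last class, discard the loser, repeat."""
    candidates = list(order)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if pairwise(first, last, x) == first:
            candidates.pop()      # `last` is ruled out
        else:
            candidates.pop(0)     # `first` is ruled out
    return candidates[0]


# Toy 1-D example with a nearest-mean rule standing in for the pairwise SVMs
stats = {"a": ([0.0], [1.0]), "b": ([0.5], [1.0]), "c": ([10.0], [1.0])}
means = {label: mu[0] for label, (mu, _) in stats.items()}


def pairwise(i, j, x):
    return i if abs(x[0] - means[i]) <= abs(x[0] - means[j]) else j


order = order_by_separability(stats)
print(order, dag_predict([0.1], order, pairwise), dag_predict([9.0], order, pairwise))
```

With K classes, each prediction evaluates only K - 1 pairwise classifiers, although K(K - 1)/2 are trained.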
Funding: This paper was supported by the collaborative research project SEV (Grant No. 01100474) between Beijing University of Posts and Telecommunications and France Telecom R&D Beijing, the National Natural Science Foundation of China (Grant No. 90920001), and the Graduate Innovation Fund of SICE, BUPT, 2011.
Abstract: The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index, and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: the low-level features extracted by computers fail to coincide with the high-level concepts interpreted by humans. In this paper, we present a generic scheme for detecting video semantic concepts based on machine learning with multiple visual features. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method enables the concept detection system to exploit the potential of these features. We then combine the different features and sub-systems through both classifier-level and kernel-level fusion, which contributes to a more robust system. The proposed system is tested on the TRECVID dataset. The resulting Mean Average Precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine provides a generic model and performs well on both object-type and scene-type concepts.
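MAP averages, over concepts, the average precision (AP) of each concept's ranked list of shots. The following minimal sketch of non-interpolated AP assumes binary relevance judgments. By default it normalizes by the number of relevant items retrieved; pass `total_relevant` to normalize by the number of relevant items in the whole collection. TRECVID evaluations have also used sampled estimates such as inferred AP (infAP), which this sketch does not reproduce. The function names and judgments are hypothetical.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Non-interpolated AP of one ranked list (1 = relevant shot, 0 = not)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(ranked_lists):
    """MAP: mean of the per-concept AP values."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Hypothetical relevance judgments for two concepts' top-ranked shots
runs = [[1, 0, 1, 0, 0, 1], [0, 1, 1]]
print(round(mean_average_precision(runs), 4))
```

AP rewards placing relevant shots near the top of the ranking, which is why MAP suits concept detection engines that rank shots by score.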