Funding: This project was supported by the Shanghai Shu Guang Project.
Abstract: Support vector machine (SVM), as a novel approach in pattern recognition, has demonstrated success in face detection and face recognition. In this paper, a face recognition approach based on the SVM classifier with the nearest neighbor classifier (NNC) is proposed. Principal component analysis (PCA) is used to reduce the dimension and extract features. Then a one-against-all strategy is used to train the SVM classifiers. At the testing stage, we propose an al-
Funding: This research work is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R411), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. However, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection. Addressing this gap can provide valuable insights for enhancing cybersecurity strategies. While numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives are: (1) investigating the performance of various classifiers, such as Gaussian Naïve Bayes, K Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; (2) evaluating the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; (3) identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers; and (4) offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence. The study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training utilizes various supervised classifiers, and their performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. The study's outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection. Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware. Additionally, insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses. Overall, this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.
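The four evaluation metrics named in the abstract above can be computed from a list of true and predicted labels. The following is a minimal sketch, assuming a binary task in which label 1 means malware; the function name `classification_metrics` and the toy labels are illustrative and do not come from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    `positive` is the label treated as the positive class (assumed: malware).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical): 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters when benign and malicious samples are imbalanced, since accuracy alone can look high for a classifier that rarely flags malware.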
Abstract: Today, mammography is the best method for early detection of breast cancer. Radiologists failed to detect evident cancerous signs in approximately 20% of false negative mammograms. False negatives have been attributed to the inability of the radiologist to detect the abnormalities for several reasons, such as poor image quality, image noise, or eye fatigue. This paper presents a framework for a computer-aided detection system that integrates Principal Component Analysis (PCA), Fisher Linear Discriminant (FLD), and Nearest Neighbor Classifier (KNN) algorithms for the detection of abnormalities in mammograms. Using normal and abnormal mammograms from the MIAS database, the integrated algorithm achieved 93.06% classification accuracy. We also present an analysis of the integrated algorithm's parameters and suggest selection criteria.
Abstract: Finding nearest neighbors efficiently is crucial to the design of any nearest neighbor classifier. This paper shows how Layered Range Trees (LRT) can be utilized for efficient nearest neighbor classification. The presented algorithm is robust and finds the nearest neighbor in a logarithmic order. The proposed algorithm reports the nearest neighbor in , where k is a very small constant when compared with the dataset size n and d is the number of dimensions. Experimental results demonstrate the efficiency of the proposed algorithm.
Abstract: The interaction between humans and machines has become an issue of concern in recent years. Besides facial expressions or gestures, speech has been shown to be one of the most promising modalities for automatic emotion recognition. Affective computing aims to support HCI (Human-Computer Interaction) at a psychological level, allowing computers to adjust their responses to human needs. Therefore, the recognition of emotion is pivotal in high-level interactions. Each emotion has distinctive properties that allow us to recognize it. Changes in the acoustic signal produced for an identical expression or sentence are essentially a direct result of biophysical changes (for example, the stress-induced narrowing of the larynx) set off by emotions. This connection between acoustic cues and emotions has made Speech Emotion Recognition one of the active topics in affective computing. The main purpose of a Speech Emotion Recognition algorithm is to determine the emotional condition of a speaker from recorded speech signals. This research presents the results of applying k-NN and OVA-SVM to MFCC features, with and without a feature selection approach. First, the MFCC features were extracted from the audio signal to characterize the properties of emotional speech. Second, nine basic statistical measures were calculated from the MFCC features, yielding 117-dimensional features used to train the classifiers for seven emotion classes (Anger, Happiness, Disgust, Fear, Sadness, Boredom, and Neutral). Next, classification was done in four steps. First, all 117 features were classified using both classifiers. Second, the best classifier was identified, and the features were then scaled to [-1, 1] and classified. In the third step, whichever option (with or without feature scaling) performed better in the second step was retained, and classification was done for each of the basic statistical measures separately. Finally, in the fourth step, the combination of statistical measures that gives the best performance was derived using the forward feature selection method. Experiments were carried out using k-NN with different k values and a linear OVA-based SVM classifier with different optimal values. The Berlin emotional speech database for the German language was used to test the proposed methodology, and recognition rates as high as 60% were achieved for the recognition of emotion from the voice signal with the set of statistical measures (median, maximum, mean, inter-quartile range, skewness). OVA-SVM performs better than k-NN, and the use of the feature selection technique yields a high recognition rate.
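Two steps of the pipeline described above lend themselves to a short illustration: scaling features to [-1, 1] and greedy forward feature selection. The sketch below is not the authors' implementation. The scoring function and its gain values are hypothetical stand-ins for a classifier's validation accuracy, and "variance" is an assumed measure name that the abstract does not list. Note also that 117 = 9 × 13, which would be consistent with nine measures over 13 MFCC coefficients, although the abstract does not state the coefficient count.

```python
def scale_to_unit_range(column):
    """Linearly map a list of values to [-1, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def forward_selection(candidates, score):
    """Greedy forward selection.

    Repeatedly adds the candidate that gives the highest score(selected)
    and stops as soon as no addition improves the best score so far.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top_candidate = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break
        selected.append(top_candidate)
        remaining.remove(top_candidate)
        best = top_score
    return selected, best


# Hypothetical per-measure gains, standing in for a real classifier score.
TOY_GAIN = {"mean": 0.30, "median": 0.20, "maximum": 0.10, "variance": -0.05}


def toy_score(subset):
    return sum(TOY_GAIN[name] for name in subset)


scaled = scale_to_unit_range([0, 5, 10])
selected, best = forward_selection(list(TOY_GAIN), toy_score)
```

In a real experiment, `score` would train and evaluate the chosen classifier on the feature groups in `subset`, so each greedy step costs one training run per remaining candidate.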
Funding: Project (No. ABA048) supported by the Natural Science Foundation of Hubei Province, China.
Abstract: The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries over dynamic object and query datasets is valuable for many location-based applications. A practical method is to partition the data space into grid cells, index both the object table and the query table by this grid structure, and solve the problem by periodically joining cells of objects with the queries whose influence regions intersect those cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With the object cache strategy, queries that remain static in the current processing cycle seldom incur I/O cost, so they can be answered quickly. The main I/O cost comes from moving queries; the query cache strategy restricts their search regions by using the current query results held in the main memory buffer. The queries can share not only the accesses to object pages, but also their influence regions. A theoretical analysis of the expected I/O cost is presented; in the experimental results, the I/O cost is about 40% of that of the SEA-CNN method.
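The grid partitioning idea can be illustrated with a small in-memory index. This is a simplified sketch, not the paper's method: it answers a single snapshot K-NN query and omits the periodic join, the caches, and all disk I/O. The class name `GridIndex` and the sample points are hypothetical. The distance to the k-th result plays the role of the radius of the query's influence region.

```python
import heapq
import math
from collections import defaultdict


class GridIndex:
    """Uniform 2-D grid over points; cells are keyed by integer coordinates."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (ix, iy) -> [(object_id, x, y)]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, object_id, x, y):
        self.cells[self._cell(x, y)].append((object_id, x, y))

    def knn(self, qx, qy, k):
        """Return up to k (distance, object_id) pairs, nearest first."""
        if not self.cells:
            return []
        cx, cy = self._cell(qx, qy)
        # Largest ring that still contains an occupied cell.
        max_ring = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in self.cells)
        found = []
        for ring in range(max_ring + 1):
            # Visit only the cells on the border of the current square ring.
            for ix in range(cx - ring, cx + ring + 1):
                for iy in range(cy - ring, cy + ring + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != ring:
                        continue
                    for object_id, x, y in self.cells.get((ix, iy), ()):
                        found.append((math.hypot(x - qx, y - qy), object_id))
            # Every object in an unvisited ring is at least
            # ring * cell_size away, so the search can stop early.
            if len(found) >= k:
                kth_distance = heapq.nsmallest(k, found)[-1][0]
                if kth_distance <= ring * self.cell_size:
                    break
        return heapq.nsmallest(k, found)


index = GridIndex(cell_size=10.0)
for object_id, x, y in [("a", 1, 1), ("b", 12, 1), ("c", 35, 35), ("d", 3, 4)]:
    index.insert(object_id, x, y)
nearest = index.knn(0.0, 0.0, k=2)
nearest_ids = [object_id for _, object_id in nearest]
```

Only the cells that intersect the influence region need to be read, which is why restricting that region for moving queries reduces the number of cell accesses.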
Funding: This work is supported by the KIAS (Research Number: CG076601) and in part by the Sejong University Faculty Research Fund.
Abstract: Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak subcontinent. People from the subcontinent may speak the same Urdu language, but they may use different scripts for writing. Communication in Roman characters, used to write Urdu on social media, is now considered the most typical standard of communication on the Indian subcontinent, which makes it a rich source of information. English text classification is a solved problem, but there have been only a few efforts to examine the rich information supply of Roman Urdu in the past. This is due to the numerous complexities involved in processing Roman Urdu data, which include the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and social media websites such as Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier, which will help to classify news into relevant categories on which further analysis and modeling can be done. The classifier assigns news to five categories (health, business, technology, sports, international). First, we develop the news dataset using scraping tools; then, after preprocessing, we compare the results of different machine learning algorithms: Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). After this, we use a phonetic algorithm to control lexical variation and test news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data and by classifying the news. After applying the above-mentioned machine learning algorithms, the results show that the Multinomial Naïve Bayes classifier gives the best accuracy of 90.17%, which is attributed to the noise from lexical variation.
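The abstract above does not name the phonetic algorithm it uses. The sketch below shows only the general idea of mapping spelling variants to a shared key, using a deliberately simple rule that is an assumption for illustration: lowercase the word, collapse repeated letters, and drop vowels after the first letter. The function names and the example spellings are hypothetical.

```python
import re
from collections import defaultdict


def phonetic_key(word):
    """Hypothetical normalisation key for Roman Urdu spelling variants.

    Lowercases, keeps only a-z, collapses runs of a repeated letter,
    then drops vowels (a, e, i, o, u, y) everywhere except the first letter.
    """
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return ""
    collapsed = re.sub(r"(.)\1+", r"\1", letters)
    head, tail = collapsed[0], collapsed[1:]
    return head + re.sub(r"[aeiouy]", "", tail)


def group_variants(words):
    """Group words that share the same phonetic key."""
    groups = defaultdict(list)
    for word in words:
        groups[phonetic_key(word)].append(word)
    return dict(groups)


# Illustrative spellings of three words, written several ways.
headline_words = ["khabar", "khabr", "khabbar", "sehat", "sihat", "karobar"]
groups = group_variants(headline_words)
```

Replacing each token with its key before vectorisation reduces the vocabulary the classifier sees. A rule this coarse can also merge unrelated words, so it would need checking against real data.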
Funding: Supported by the National Natural Science Foundation under Grant No. 61175055 and No. 61105059; research funds of the Sichuan Key Laboratory of Intelligent Network Information Processing under Grant No. SGXZD1002-10; and the Sichuan Key Technology Research and Development Program under Grant No. 2012GZ0019 and No. 2011FZ0051.
Abstract: In a semi-supervised classification system, the system needs to be continually updated as the number of training samples increases. As the sample set grows, the number of unreliable samples also increases. In this paper, we use fuzzy c-means (FCM) clustering to remove useless samples, and extract the intersection between the original training set and the clusters obtained by FCM clustering. The intersection between each class and its cluster gives the reliable samples we are looking for. The experimental results demonstrate the clear superiority of the proposed algorithm.
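The class and cluster intersection step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: the FCM step itself is not implemented, the cluster assignments are taken as already hardened (for example by choosing each sample's highest membership), and the rule that pairs each class with the cluster it overlaps most is an assumption. The function name `reliable_samples` is hypothetical.

```python
from collections import Counter, defaultdict


def reliable_samples(labels, clusters):
    """Indices of samples that lie in the intersection of a class and
    the cluster that class overlaps most.

    labels[i]   : class label of training sample i
    clusters[i] : hard cluster index of sample i (assumed to come from FCM)
    """
    overlap = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        overlap[label][cluster] += 1
    # Pair each class with its most frequent cluster.
    dominant = {label: counts.most_common(1)[0][0]
                for label, counts in overlap.items()}
    return [i for i, (label, cluster) in enumerate(zip(labels, clusters))
            if dominant[label] == cluster]


# Toy data: sample 2 (class A) and sample 5 (class B) fall in the
# "wrong" cluster and are treated as unreliable.
labels = ["A", "A", "A", "B", "B", "B"]
clusters = [0, 0, 1, 1, 1, 0]
kept = reliable_samples(labels, clusters)
```

Samples outside the intersection are the ones whose label and cluster structure disagree, which is the sense in which the abstract calls them unreliable.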
Funding: This paper received financial support towards the cost of its publication from the Deanship of Research and Graduate Studies at Applied Science University, Amman, Jordan.
Abstract: In this paper, a new shape classification system based on the singular value decomposition (SVD) transform and a nearest neighbour classifier is proposed. The gray scale image of the shape object is converted into a black and white image. The squared Euclidean distance transform is applied to the binary image to extract the boundary image of the shape. SVD transform features are extracted from the boundary of the object shapes. The proposed classification system based on SVD transform feature extraction is compared with a classifier based on moment invariants, both using the nearest neighbour classifier. The experimental results show the advantage of the proposed classification system.
Abstract: Marginal Fisher analysis (MFA) is a representative margin-based learning algorithm for face recognition. A major problem in MFA is how to select appropriate parameters, k1 and k2, to construct the respective intrinsic and penalty graphs. In this paper, we propose a novel method called nearest-neighbor (NN) classifier motivated marginal discriminant projections (NN-MDP). Motivated by the NN classifier, NN-MDP seeks a few projection vectors to prevent data samples from being wrongly categorized. Like MFA, NN-MDP can characterize the compactness and separability of samples simultaneously. Moreover, in contrast to MFA, NN-MDP can actively construct the intrinsic graph and penalty graph without unknown parameters. Experimental results on the ORL, Yale, and FERET face databases show that NN-MDP not only avoids the intractability and high expense of neighborhood parameter selection, but is also more applicable to face recognition with the NN classifier than other methods.
Abstract: The confidence value plays a vital role in deciding the rejection threshold and in integrating multiple classifiers. The nearest neighbor (NN) classifier is the most traditional and most common nonparametric statistical pattern classifier. However, so far there has been no explicit theoretical analysis of the connection between the nearest distance and the confidence value. An analytical insight into different approximations is presented, and one formula is shown to be optimal in the sense of mathematical expectation. Practice in handwritten numeral recognition strongly supports the conclusion.