Spam is a universal problem with which everyone is familiar. A number of approaches are used for spam filtering. The most common filtering technique is content-based filtering, which uses the actual text of a message to determine whether it is spam or not. Message content is highly dynamic, and it is very challenging to represent all of its information in a mathematical classification model. For instance, in content-based spam filtering, the characteristics the filter uses to identify spam messages change constantly over time. The Naïve Bayes method represents the changing nature of messages using probability theory, while the support vector machine (SVM) represents it using different features. These two classification methods are effective in different domains, but they have not yet been applied to Nepali SMS or text classification, so it is worth investigating how both perform on the problem of Nepali text classification. In this paper, Naïve Bayes and SVM-based classification techniques are implemented to classify Nepali SMS messages as spam or non-spam. An empirical analysis of various text cases was carried out to evaluate the accuracy of the classification methods used in this study. The accuracy was found to be 87.15% for SVM and 92.74% for Naïve Bayes.
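To make the probabilistic side of this comparison concrete, here is a minimal sketch of a multinomial Naïve Bayes text classifier. It assumes whitespace tokenization and add-one (Laplace) smoothing; the abstract does not state which Naïve Bayes variant, tokenizer, or feature set the authors used, and the class name `MultinomialNB` and the training messages below are hypothetical toy examples, not the Nepali SMS data.

```python
import math
from collections import Counter


class MultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 gives add-one smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)  # number of training messages per class
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            # Assumption: tokens are whitespace-separated (also works for Devanagari text)
            self.word_counts[label].update(doc.split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc):
        """Unnormalized log P(class) + sum of log P(word | class) for each class."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * v
            for w in doc.split():
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training set (not from the paper)
train_docs = [
    "win free prize now",
    "free recharge offer claim now",
    "meeting at office tomorrow",
    "call me when you reach home",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("claim your free prize"))  # spam
print(clf.predict("call me at home"))  # ham
```

Working in log space avoids numerical underflow when many word probabilities are multiplied. An SVM counterpart would instead learn a separating hyperplane over a vector representation of each message (for example, term-frequency features); that variant is not sketched here.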
SMS spam poses a significant challenge to maintaining user privacy and security. Recently, spammers have employed fraudulent writing styles to bypass spam detection systems. To address the challenge of sophisticated SMS spam, this paper introduces a novel two-level detection system that uses deep learning techniques for effective spam identification. The system comprises five steps, beginning with the preprocessing of SMS data. RoBERTa word embeddings are then applied to convert text into a numerical format for deep learning analysis. Feature extraction is performed using a Convolutional Neural Network (CNN) for word-level analysis and a Bidirectional Long Short-Term Memory (BiLSTM) network for sentence-level analysis. This two-level feature extraction captures both individual words and sentence structure. The novel part of the proposed approach is the Hierarchical Attention Network (HAN), which fuses and selects features at the two levels through an attention mechanism. The HAN operates on both words and sentences, focusing on the aspects of a message that are most pertinent to spam detection. This network is effective at capturing meaningful features, taking into account both word-level and sentence-level semantics. In the classification step, the model classifies messages as spam or ham. This hybrid deep learning method improves the feature representation and enhances the model's spam detection capabilities. By significantly reducing the incidence of SMS spam, our model contributes to a safer mobile communication environment, protecting users against potential phishing attacks and scams and aiding compliance with privacy and security regulations. The model's performance was evaluated using the SMS Spam Collection Dataset from the UCI Machine Learning Repository. Cross-validation is employed to account for the dataset's imbalanced nature, ensuring a reliable evaluation. The proposed model achieved an accuracy of 99.48%, underscoring its effectiveness in identifying SMS spam.
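The attention step in this abstract can be illustrated with the attention-pooling form commonly used in hierarchical attention networks: each hidden vector h_t is projected as u_t = tanh(W h_t + b), scored against a context vector u_ctx, normalized with a softmax into weights alpha_t, and pooled into s = sum_t alpha_t h_t. This is a minimal sketch under that assumption only. The abstract does not give the paper's equations or explain how the CNN and BiLSTM features are fused, and the function names, dimensions, and hand-picked parameters below are hypothetical (in a trained model, W, b, and u_ctx would be learned).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def attention_pool(H, W, b, u_ctx):
    """Hypothetical HAN-style attention pooling over hidden states H.

    u_t = tanh(W h_t + b); score_t = u_t . u_ctx; alpha = softmax(scores);
    returns the pooled vector s = sum_t alpha_t * h_t and the weights alpha.
    """
    scores = []
    for h in H:
        u = [math.tanh(z + b_i) for z, b_i in zip(matvec(W, h), b)]
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, u_ctx)))
    alpha = softmax(scores)
    dim = len(H[0])
    s = [sum(alpha[t] * H[t][d] for t in range(len(H))) for d in range(dim)]
    return s, alpha


# Hypothetical 2-dimensional hidden states for a 3-token message
H = [[0.1, 0.5], [2.0, -1.0], [-1.0, 0.3]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, chosen by hand for illustration
b = [0.0, 0.0]
u_ctx = [1.0, 0.0]  # context vector that favors the first hidden dimension

s, alpha = attention_pool(H, W, b, u_ctx)
print(alpha)  # the second state receives the largest weight
print(s)
```

In a hierarchical setup, the same pooling would be applied once over word vectors to form sentence vectors and again over sentence vectors to form a message vector, which is then passed to the spam/ham classifier.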