SMS spam poses a significant challenge to maintaining user privacy and security. Recently, spammers have employed fraudulent writing styles to bypass spam detection systems. To address the challenge of sophisticated SMS spam, this paper introduces a novel two-level detection system that uses deep learning techniques for effective spam identification. The system comprises five steps, beginning with the preprocessing of SMS data. RoBERTa word embedding is then applied to convert text into a numerical format for deep learning analysis. Feature extraction is performed using a Convolutional Neural Network (CNN) for word-level analysis and a Bidirectional Long Short-Term Memory (BiLSTM) network for sentence-level analysis. This two-level feature extraction captures both individual words and sentence structure. The novel part of the proposed approach is the Hierarchical Attention Network (HAN), which fuses and selects features at both levels through an attention mechanism. By attending to words and sentences, the HAN focuses on the most pertinent parts of a message for spam detection and captures meaningful features that reflect both word-level and sentence-level semantics. In the classification step, the model classifies each message as spam or ham. This hybrid deep learning method improves the feature representation and thereby enhances the model's spam detection capability. By significantly reducing the incidence of SMS spam, the model contributes to a safer mobile communication environment, protecting users against potential phishing attacks and scams and aiding compliance with privacy and security regulations. The model's performance was evaluated on the SMS Spam Collection Dataset from the UCI Machine Learning Repository, using cross-validation to account for the dataset's imbalanced nature and ensure a reliable evaluation. The proposed model achieved an accuracy of 99.48%, underscoring its effectiveness in identifying SMS spam.
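The two-level attention step can be illustrated with a minimal, framework-free sketch. It assumes the word-level feature vectors (produced by the CNN in the described system) are already available, simplifies the usual learned projection so that each attention score is tanh(h)·u, and uses hypothetical fixed context vectors `word_ctx` and `sent_ctx` that a trained model would learn instead. It shows standard attention pooling applied twice, first words into sentence vectors and then sentences into one message vector; it is not the authors' implementation, and the BiLSTM between the two levels is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Score each vector against a context vector, softmax the scores, and
    return the attention-weighted sum together with the weights.

    Assumption: the learned projection tanh(W h + b) is simplified to
    tanh(h), i.e. W is the identity and b is zero.
    """
    scores = [sum(math.tanh(h_k) * c_k for h_k, c_k in zip(h, context)) for h in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def hierarchical_pool(message: List[List[Vector]], word_ctx: Vector, sent_ctx: Vector) -> Vector:
    """Two-level pooling: word vectors -> one vector per sentence -> one message vector.

    `word_ctx` and `sent_ctx` are hypothetical fixed context vectors; a trained
    model would learn them. The BiLSTM that would sit between the two levels
    is omitted in this sketch.
    """
    sentence_vecs = [attention_pool(words, word_ctx)[0] for words in message]
    message_vec, _ = attention_pool(sentence_vecs, sent_ctx)
    return message_vec


if __name__ == "__main__":
    # Hypothetical 2-D word features for a message with two sentences.
    msg = [
        [[1.0, 0.0], [0.0, 1.0]],  # sentence 1: two words
        [[0.5, 0.5]],              # sentence 2: one word
    ]
    print(hierarchical_pool(msg, word_ctx=[1.0, 0.0], sent_ctx=[0.0, 1.0]))
```

In a full model, the resulting message vector would feed a classification layer (for example, a sigmoid output) that labels the message as spam or ham.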
Purpose: This paper aims to analyze the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection. Design/methodology/approach: Based on different spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection) and product-centric (spam-targeted product detection). To negate any classifier bias, we employ four classifiers, which gives a more reliable and unbiased reflection of the results. In addition, we propose a new set of features and compare it against features from well-known related works. Experiments on two real-world datasets show the effectiveness of the different features in opinion spam detection. Findings: Our findings indicate that behavioral features are both more efficient and more effective than textual features for detecting opinion spam in all three settings. In addition, models trained on hybrid features produce results much closer to those of models trained on behavioral features than to those of models trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work improve on the existing features used in related works. Furthermore, a computation-time analysis of the feature extraction phase shows that behavioral features are more cost-efficient than textual ones. Research limitations: The analyses in this paper are limited to two well-known datasets from Yelp.com, namely Yelp Zip and Yelp NYC. Practical implications: The results can be used to improve opinion spam detection; researchers may focus on improving and developing feature engineering and selection techniques that draw more on metadata. Originality/value: To the best of our knowledge, this study is the first to consider three perspectives (review-, reviewer- and product-centric) and four classifiers when analyzing the effectiveness of two major types of features for opinion spam detection. The study also introduces several novel features that help improve the performance of opinion spam detection methods.
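To make "behavioral (metadata-based) features" concrete, the sketch below computes three reviewer-centric features that are common in the opinion-spam literature: maximum reviews posted on a single day, average rating deviation from the product mean, and the share of extreme (1- or 5-star) ratings. These specific features, the hypothetical `Review` record, and the 1 to 5 star scale are illustrative assumptions; the abstract does not list the features the paper actually uses. None of them reads the review text, which is one plausible reason such features are cheaper to extract than textual ones.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class Review:
    # Hypothetical record layout; real datasets carry more metadata.
    reviewer: str
    product: str
    rating: int  # star rating, assumed to be on a 1-5 scale
    day: date


def reviewer_behavior_features(reviews: List[Review]) -> Dict[str, Dict[str, float]]:
    """Compute illustrative behavioral features per reviewer (reviewer-centric setting)."""
    # Mean rating of each product, needed for the rating-deviation feature.
    ratings_by_product = defaultdict(list)
    for r in reviews:
        ratings_by_product[r.product].append(r.rating)
    product_mean = {p: mean(rs) for p, rs in ratings_by_product.items()}

    # Group reviews by author.
    by_reviewer = defaultdict(list)
    for r in reviews:
        by_reviewer[r.reviewer].append(r)

    features = {}
    for reviewer, rs in by_reviewer.items():
        per_day = Counter(r.day for r in rs)
        features[reviewer] = {
            # Burstiness: the most reviews this reviewer posted on one day.
            "max_reviews_per_day": float(max(per_day.values())),
            # Mean absolute gap to the product's mean rating, divided by 4
            # so that it lies in [0, 1] on a 1-5 scale.
            "avg_rating_deviation": mean(
                abs(r.rating - product_mean[r.product]) / 4 for r in rs
            ),
            # Fraction of reviews with a 1-star or 5-star rating.
            "extreme_rating_ratio": sum(r.rating in (1, 5) for r in rs) / len(rs),
        }
    return features
```

Feature vectors like these could then be fed to any standard classifier, which fits the paper's practice of comparing several classifiers on the same features.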
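Web spamming refers to the deliberate act of misleading search engines so that some pages receive a higher ranking than they deserve. In recent years, the sharp increase in web spam has degraded the quality of search engine results to some extent. This article first discusses the basic concepts and impact of spam, and then analyzes in detail the various current spamming techniques, which fall into three types: term spamming, link spamming, and hiding techniques. We believe the analysis presented here will be very useful for developing appropriate countermeasures.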
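Link spamming targets link-based ranking. Assuming a PageRank-style ranking (the abstract does not name a specific algorithm), the toy sketch below computes PageRank by power iteration and shows how a small link farm, ten hypothetical pages `f0` to `f9` that exist only to link to a target page `t`, raises the target's score. The graph, the damping factor of 0.85, and the iteration count are illustrative assumptions, not part of the article's analysis.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Plain PageRank by power iteration over an adjacency list."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page receives the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical honest graph; the target page "t" has no inbound links.
    honest = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "t": ["a"]}
    # Same graph plus a link farm whose only purpose is to point at "t".
    farm = dict(honest)
    farm.update({f"f{i}": ["t"] for i in range(10)})
    # t's score rises from about 0.0375 to about 0.102.
    print(round(pagerank(honest)["t"], 4), round(pagerank(farm)["t"], 4))
```

The farm pages themselves receive only the baseline random-jump share, but funnelling all of it into one page is enough to inflate that page's ranking, which is the effect link spamming exploits.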