Funding: This work is supported by the National Key Research and Development Program of China (2018YFB1800202, 2016YFB1000302, SQ2019ZD090149, 2018YFB0204301).
Abstract: With the continuous development of e-commerce, consumers show increasing interest in posting comments on their consumption experience and the quality of commodities. Meanwhile, people rely on other consumers’ comments when making purchasing decisions much more than ever before. The reliability of commodity comments therefore has a significant impact on protecting consumers’ equity and building a fair online trading environment. However, some unscrupulous online sellers write fake favorable reviews for themselves and malicious comments about their competitors to maximize their profits. These improper practices have severely damaged the entire online shopping industry. To detect and prevent such deceptive comments effectively, we construct a Multi-Filters Convolutional Neural Network (MFCNN) model for opinion spam detection. MFCNN is designed with a fixed-length sequence input and an improved activation function to avoid the vanishing gradient problem in opinion spam detection. Moreover, convolution filters with different widths are used in MFCNN to represent sentences and documents. Our experimental results show that MFCNN outperforms current state-of-the-art methods on standard spam detection benchmarks.
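The idea of convolution filters with several widths over a fixed-length input can be illustrated with a minimal pure-Python sketch. It is not the MFCNN implementation: the filter widths (2, 3, 4), the embedding size, the random weights, and the use of leaky ReLU as a stand-in for the unspecified "improved activation function" are all assumptions made for illustration.

```python
import random

def leaky_relu(x, slope=0.01):
    # Stand-in activation (assumption): the abstract does not name its improved function.
    return x if x > 0 else slope * x

def pad_or_truncate(seq, length, dim):
    # Force a fixed-length input by truncating or zero-padding the embedded tokens.
    seq = seq[:length]
    return seq + [[0.0] * dim for _ in range(length - len(seq))]

def conv_max_pool(seq, kernel, bias=0.0):
    # kernel has shape (width, dim); slide it over the sequence, apply the
    # activation to each window response, and keep the maximum (max-over-time pooling).
    width = len(kernel)
    responses = []
    for start in range(len(seq) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        responses.append(leaky_relu(total))
    return max(responses)

def multi_filter_features(seq, kernels_by_width):
    # Concatenate the pooled response of every filter across all widths.
    return [conv_max_pool(seq, kernel)
            for width in sorted(kernels_by_width)
            for kernel in kernels_by_width[width]]

random.seed(0)
DIM, LENGTH = 4, 8  # hypothetical embedding size and fixed sequence length
tokens = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
seq = pad_or_truncate(tokens, LENGTH, DIM)
kernels = {
    width: [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)]
            for _ in range(2)]  # two random filters per width
    for width in (2, 3, 4)
}
features = multi_filter_features(seq, kernels)
print(len(features))  # 3 widths x 2 filters = 6 pooled features
```

In a trained model the kernels would be learned and the concatenated features passed to a classifier; here they are random so that the sketch stays self-contained.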
Funding: Supported by the National Science Foundation of China (Nos. 61170145, 61373081), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20113704110001), the Technology and Development Project of Shandong (No. 2013GGX10125), and the Taishan Scholar Project of Shandong, China.
Abstract: In the global information era, people acquire more and more information from the Internet, but the quality of search results is strongly degraded by the presence of web spam. Web spam is one of the serious problems for search engines, and many methods have been proposed for spam detection. We exploit the content features of non-spam pages in contrast to those of spam pages. The content features of non-spam pages always exhibit many statistical regularities, whereas those of spam pages exhibit very few, because spam pages are generated randomly in order to increase their page rank. In this paper, we summarize the distributions of these regularities for the content features of non-spam pages, and propose probability formulae for the entropy and for independent n-grams, respectively. Furthermore, we put forward formulae for the correlation of multiple features. Among them, the notable content features may be used as auxiliary information for spam detection.
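Two content features of the kind mentioned above can be computed with a short sketch. The abstract does not give its formulae, so the definitions here are assumptions: entropy is the Shannon entropy of the page's word distribution, and the n-gram feature is read as the fraction of n-grams on a page that are distinct.

```python
import math
from collections import Counter

def word_entropy(tokens):
    # Shannon entropy (in bits) of the empirical word distribution.
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def distinct_ngram_ratio(tokens, n):
    # Fraction of the page's n-grams that are distinct (assumed reading of
    # "independent n-grams"); 0.0 when the page is shorter than n tokens.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

page = "cheap pills cheap pills buy cheap pills now".split()
print(word_entropy(page), distinct_ngram_ratio(page, 2))
```

Both values would be computed per page and compared against the distributions observed on known non-spam pages.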
Funding: Supported by the Guangzhou Government Project (Grant No. 62216235) and the National Natural Science Foundation of China (Grant Nos. 61573328, 622260-1).
Abstract: Spammer detection aims to identify and block users who perform malicious activities. Such users should be identified and removed from social media to keep social media activity organic and to maintain the integrity of online social spaces. Previous research aimed to find spammers using hybrid approaches based on graph mining, posted content, and metadata, with small, manually labeled datasets. However, such hybrid approaches are not scalable, not robust, and dependent on particular datasets, and they require numerous parameters, complex graphs, and natural language processing (NLP) resources to make decisions, which makes them impractical for real-time detection. For example, graph mining requires neighbors’ information, and posted-content-based approaches require multiple tweets from user profiles and then NLP resources to make decisions, which is not feasible in a real-time environment. To fill this gap, we first propose a REal-time Metadata based Spammer detection (REMS) model that uses only metadata features to identify spammers, takes the least number of parameters, and provides adequate results. REMS is a scalable and robust model that uses only 19 metadata features of Twitter users to achieve a 73.81% F1-score using a balanced training dataset (50% spam and 50% genuine users). The 19 features comprise 8 original features of Twitter users and 11 features derived from them, identified through extensive experiments and analysis. Second, we present the largest and most diverse dataset in published research, comprising 211 K spam users and 1 million genuine users. The diversity of the dataset is reflected in the fact that it comprises users who posted 2.1 million tweets on seven topics (100 hashtags) from 6 different geographical locations. REMS’s superior classification performance with multiple machine learning and deep learning methods indicates that metadata features alone have the potential to identify spammers, without relying on volatile posted content and complex graph structures. The dataset and REMS code are available on GitHub (www.github.com/mhadnanali/REMS).
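A rough sketch of deriving features from raw account metadata and building a balanced 50/50 training set follows. The abstract does not list the 19 REMS features, so the field names (`followers_count`, `friends_count`, `statuses_count`, `account_age_days`) and the three derived ratios are hypothetical examples rather than the published feature set.

```python
import random

def derive_features(user):
    # Derived features computed from original metadata fields (all hypothetical).
    followers = user["followers_count"]
    friends = user["friends_count"]
    age_days = max(user["account_age_days"], 1)  # avoid division by zero
    return {
        "followers_per_friend": followers / (friends + 1),
        "statuses_per_day": user["statuses_count"] / age_days,
        "friends_per_day": friends / age_days,
    }

def balance(spam, genuine, seed=0):
    # Down-sample both classes to the size of the smaller one (50% / 50%).
    rng = random.Random(seed)
    n = min(len(spam), len(genuine))
    return rng.sample(spam, n), rng.sample(genuine, n)

example = {"followers_count": 99, "friends_count": 9,
           "statuses_count": 200, "account_age_days": 100}
print(derive_features(example))
```

Because every value comes from a single profile record, features of this kind can be computed without fetching tweets or neighbor graphs, which is the property the abstract emphasizes for real-time use.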
Abstract: Spam has become a major problem these days because of the growing number of spam emails, with recipients regularly receiving piles of them. Spam not only wastes users’ time and bandwidth, it also consumes the storage space of the mailbox as well as disk space. Thus, spam detection is a challenge for individuals and organizations alike. To advance spam email detection, this work proposes a new spam detection approach that uses the grasshopper optimization algorithm (GOA) to train a multilayer perceptron (MLP) classifier for categorizing emails as ham or spam. Together, MLP and GOA produce an artificial neural network (ANN) model, referred to as GOAMLP. Two corpora, Spam Base and UK-2011 Web spam, are used for this approach. Finally, the findings provide evidence that the proposed approach achieves better spam detection performance than the state of the art.
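When a metaheuristic trains an MLP, each candidate solution is usually a flat vector holding all network weights, scored by a fitness function. The sketch below shows only that encoding and scoring step; it does not implement GOA itself, and the weight layout, the tanh hidden activation, and the use of error rate as fitness are assumptions.

```python
import math

def n_weights(n_in, n_hidden):
    # Hidden weights + hidden biases + output weights + output bias.
    return n_in * n_hidden + n_hidden + n_hidden + 1

def mlp_predict(weights, x, n_hidden):
    # Decode the flat vector (assumed layout, see n_weights) and run one forward pass.
    n_in = len(x)
    idx = 0
    pre_activations = []
    for _ in range(n_hidden):
        pre_activations.append(sum(weights[idx + i] * x[i] for i in range(n_in)))
        idx += n_in
    hidden = [math.tanh(z + weights[idx + h]) for h, z in enumerate(pre_activations)]
    idx += n_hidden
    out = sum(weights[idx + h] * hidden[h] for h in range(n_hidden)) + weights[idx + n_hidden]
    return 1 if out > 0 else 0  # 1 = spam, 0 = ham

def fitness(weights, samples, labels, n_hidden):
    # Misclassification rate; an optimizer such as GOA would try to minimize this.
    errors = sum(mlp_predict(weights, x, n_hidden) != y for x, y in zip(samples, labels))
    return errors / len(samples)

candidate = [1.0, 0.0, 0.0, 1.0, 0.0]  # one candidate for a 2-input, 1-hidden-unit MLP
print(fitness(candidate, [[1.0, 0.0], [-1.0, 0.0]], [1, 0], n_hidden=1))
```

A population-based optimizer would hold many such vectors of length `n_weights(n_in, n_hidden)` and move them through the search space according to its own update rule.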
Abstract: Phishing websites present a severe cybersecurity risk since they can lead to financial losses, data breaches, and user privacy violations. This study uses machine learning approaches to address the problem of phishing website detection. Using artificial intelligence, the project aims to provide efficient techniques for locating and thwarting these dangerous websites. The study goals were attained by performing a thorough literature analysis to investigate several models and methods often used in phishing website identification. Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, Support Vector Classifiers, Linear Support Vector Classifiers, and Naive Bayes were all used in the inquiry. This research covers the benefits and drawbacks of several machine learning approaches, illuminating how well suited each is to overcoming the difficulties of detecting and countering phishing websites. The insights gained from this literature review guide the selection and implementation of appropriate models and methods in future research and real-world applications related to phishing detection. The study evaluates and compares the accuracy, precision, and recall of several machine learning models in detecting phishing website URLs.
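The three metrics used for the comparison can be computed from true and predicted labels as in the sketch below. The label convention (1 = phishing, 0 = legitimate) and the sample labels are illustrative assumptions, not data from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Count true/false positives and negatives for a binary task.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # flagged URLs that are phishing
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # phishing URLs that were flagged
    }

scores = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
```

Running `evaluate` once per classifier on the same held-out labels gives the side-by-side comparison the abstract describes.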
Abstract: SMS spam poses a significant challenge to maintaining user privacy and security. Recently, spammers have employed fraudulent writing styles to bypass spam detection systems. To address the challenge of sophisticated SMS spam, this paper introduces a novel two-level detection system that uses deep learning techniques for effective spam identification. The system comprises five steps, beginning with the preprocessing of SMS data. RoBERTa word embedding is then applied to convert text into a numerical format for deep learning analysis. Feature extraction is performed using a Convolutional Neural Network (CNN) for word-level analysis and a Bidirectional Long Short-Term Memory (BiLSTM) network for sentence-level analysis. The two-level feature extraction captures both individual words and sentence structure. The novel part of the proposed approach is the Hierarchical Attention Network (HAN), which fuses and selects features at the two levels through an attention mechanism. The HAN processes words and sentences to focus on the aspects of a message most pertinent to spam detection. This network is effective at capturing meaningful features, considering both word-level and sentence-level semantics. In the classification step, the model classifies messages as spam or ham. This hybrid deep learning method improves the feature representation and enhances the model’s spam detection capabilities. By significantly reducing the incidence of SMS spam, our model contributes to a safer mobile communication environment, protecting users against potential phishing attacks and scams and aiding compliance with privacy and security regulations. The model’s performance was evaluated using the SMS Spam Collection Dataset from the UCI Machine Learning Repository. Cross-validation is employed to account for the dataset’s imbalanced nature, ensuring a reliable evaluation. The proposed model achieved an accuracy of 99.48%, underscoring its effectiveness in identifying SMS spam.
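The attention step that selects and fuses features can be sketched as a softmax-weighted sum of feature vectors. This is a simplified stand-in, not the paper's HAN: scoring each vector by a plain dot product with a context vector is an assumption (attention layers commonly apply a learned projection first), and all numbers are made up.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    # Score each feature vector against the context vector, normalize the
    # scores into weights, and return the weighted sum plus the weights.
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

word_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # e.g. per-word feature vectors
pooled, weights = attention_pool(word_features, context=[2.0, 0.0])
print(pooled, weights)
```

Applying the same pooling first over words and then over sentence-level vectors gives the two-level structure; in a trained network the context vector is learned together with the rest of the model.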
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60873158, the National Basic Research 973 Program of China under Grant No. 2010CB327902, the Fundamental Research Funds for the Central Universities of China, and the Opening Funding of the State Key Laboratory of Virtual Reality Technology and Systems of China.
Abstract: Short message service (SMS) is becoming an indispensable means of social communication, and the problem of mobile spam is growing increasingly serious. We propose a novel approach for spam message detection. Instead of conventional methods that focus on keyword or flow-rate filtering, our system is based on mining a more robust structure: the social network constructed from SMS traffic. Several features, including static features, dynamic features, and graph features, are proposed to describe the activities of nodes in the network in various ways. Experimental results on a real dataset demonstrate the validity of our approach.
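A minimal sketch of building a network from SMS records and computing per-node features follows. The abstract does not define its static, dynamic, or graph features, so the record format (sender, receiver pairs) and the four features below are hypothetical examples.

```python
from collections import defaultdict

def node_features(records):
    # records: iterable of (sender, receiver) pairs, one per message.
    sent = defaultdict(int)
    recipients = defaultdict(set)  # sender -> distinct receivers
    senders = defaultdict(set)     # receiver -> distinct senders
    for sender, receiver in records:
        sent[sender] += 1
        recipients[sender].add(receiver)
        senders[receiver].add(sender)

    features = {}
    for node in set(recipients) | set(senders):
        out_nbrs = recipients.get(node, set())
        in_nbrs = senders.get(node, set())
        features[node] = {
            "messages_sent": sent.get(node, 0),
            "out_degree": len(out_nbrs),
            "in_degree": len(in_nbrs),
            # Share of contacts that also message this node back.
            "reciprocity": len(out_nbrs & in_nbrs) / len(out_nbrs) if out_nbrs else 0.0,
        }
    return features

log = [("a", "b"), ("a", "c"), ("a", "b"), ("b", "a")]
feats = node_features(log)
print(feats["a"])
```

Feature dictionaries of this kind would then feed a classifier that separates normal senders from spammers.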
Abstract: Spam is undoubtedly a serious problem, and the number of spam emails is increasing rapidly. This massive volume of spam prompts the need for spam detection techniques. Several methods and algorithms have been introduced for spam detection and filtering, and some emerging techniques use machine learning methods and feature extraction. This research proposes two models for spam detection and feature selection. The first model, which is based on reducing the number of keywords to a minimum, is evaluated with the email spam classification dataset. The results of this model are promising and highly acceptable. The second proposed model creates features for spam detection in a first stage and then reduces the number of features using three well-known metaheuristic algorithms in a second stage. The algorithms used in the second model are Artificial Bee Colony (ABC), Ant Colony Optimization (ACO), and Particle Swarm Optimization (PSO), each adapted to fit the proposed model; the authors name the adapted versions AABC, AACO, and APSO, respectively. The dataset used to evaluate this model is Enron. Finally, well-known criteria are used to evaluate this model, such as true positives, false positives, false negatives, precision, recall, and F-measure. The outcomes of the second proposed model are highly significant compared to the first one.