Spam emails pose a threat to individuals. The daily proliferation of spam emails has rendered traditional machine learning and deep learning methods for screening them ineffective and inefficient. In our research, we employ deep neural networks such as RNN, LSTM, and GRU, incorporating attention mechanisms such as Bahdanau, scaled dot product (SDP), and Luong scaled dot product self-attention for spam email filtering. We evaluate our approach on various datasets, including TREC spam, Enron spam emails, SMS spam collections, and the Ling spam dataset, which constitutes a substantial custom dataset. All these datasets are publicly available. For the Enron dataset, we attain an accuracy of 99.97% using LSTM with SDP self-attention. Our custom dataset exhibits the highest accuracy of 99.01% when employing GRU with SDP self-attention. The SMS spam collection dataset yields a peak accuracy of 99.61% with LSTM and SDP attention. Using the GRU (Gated Recurrent Unit) alongside Luong and SDP attention mechanisms, we reach a peak accuracy of 99.89% on the Ling spam dataset. For the TREC spam dataset, the most accurate results are achieved using LSTM with Luong attention, with an accuracy of 99.01%. Our performance analyses consistently indicate that employing the scaled dot product attention mechanism in conjunction with gated recurrent units (GRU) delivers the most effective results. In summary, our research underscores the efficacy of employing advanced deep learning techniques and attention mechanisms for spam email filtering, with remarkable accuracy across multiple datasets. This approach presents a promising solution to the ever-growing problem of spam emails.
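For readers unfamiliar with the scaled dot product (SDP) self-attention named in the abstract above, the following is a minimal pure-Python sketch of the standard formula softmax(QK^T / sqrt(d)) V. It is not the authors' implementation: it assumes queries, keys, and values are the recurrent hidden states themselves (no learned projections), and the function names and the example `hidden_states` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sdp_self_attention(states):
    """Scaled dot-product self-attention over a sequence of hidden states.

    Assumption: queries, keys, and values are all the states themselves,
    which keeps the sketch minimal (real models usually learn projections).
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in states:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        attn = softmax(scores)
        # Context vector: attention-weighted sum of the value vectors.
        context = [sum(w * value[i] for w, value in zip(attn, states))
                   for i in range(dim)]
        weights.append(attn)
        outputs.append(context)
    return outputs, weights


# Hypothetical hidden states for a three-token message (e.g. from an LSTM or GRU).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, attention = sdp_self_attention(hidden_states)
```

In a spam classifier of the kind described, the context vectors would typically be pooled and passed to a final classification layer; that step is omitted here.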
Volunteered geographic information (VGI) can be considered a subset of crowdsourced data (CSD) and its popularity has recently increased in a number of application areas. Disaster management is one of its key application areas in which the benefits of VGI and CSD are potentially very high. However, quality issues such as credibility, reliability and relevance are limiting many of the advantages of utilising CSD. Credibility issues arise as CSD come from a variety of heterogeneous sources including both professionals and untrained citizens. VGI and CSD are also highly unstructured and the quality and metadata are often undocumented. In the 2011 Australian floods, the general public and disaster management administrators used the Ushahidi Crowd-mapping platform to extensively communicate flood-related information including hazards, evacuations, emergency services, road closures and property damage. This study assessed the credibility of the Australian Broadcasting Corporation’s Ushahidi CrowdMap dataset using a Naïve Bayesian network approach based on models commonly used in spam email detection systems. The results of the study reveal that the spam email detection approach is potentially useful for CSD credibility detection with an accuracy of over 90% using a forced classification methodology.
Funding: The authors wish to acknowledge the Australian Government for supporting this research through the Research Training Program (RTP), and Monique Potts, ABC–Australia, for providing the Ushahidi Crowdmap data from the 2011 Australian floods.
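To illustrate the kind of spam-filter-style Naïve Bayes model the VGI study refers to, here is a minimal word-count classifier with Laplace smoothing. It is a sketch under stated assumptions, not the study's method: the labels `credible` and `not_credible`, the function names, and the training reports are all made up for illustration and are not drawn from the Ushahidi CrowdMap dataset.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label; examples are (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the product.
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training reports; hypothetical labels chosen for this sketch only.
reports = [
    ("road closed due to flood water near bridge", "credible"),
    ("evacuation centre open at the showgrounds", "credible"),
    ("win a free phone click here now", "not_credible"),
    ("click here for free flood insurance money", "not_credible"),
]
model = train(reports)
label = classify("bridge road closed by flood water", model)
```

With these toy reports, the flood-related query shares most of its words with the `credible` examples, so that label scores higher. A real system would need a far larger labelled corpus and a held-out evaluation set.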