Pseudo NLP Joint Spam Classification Technique for Big Data Cluster (Cited by: 2)


Abstract: Spam mail classification is considered a complex and error-prone task in distributed computing environments. Various spam mail classification approaches are available, such as the naive Bayesian classifier, logistic regression, support vector machine, decision tree, recursive neural network, and long short-term memory algorithms. However, they do not consider the document when analyzing spam mail content. These approaches use the bag-of-words method, which analyzes a large amount of text data and classifies features with the help of term frequency-inverse document frequency. Because a document contains many words, these approaches consume a massive amount of resources and become infeasible when classifying multiple associated mail documents together. As a result, spam mail is not fully classified, and loopholes remain in these approaches. We therefore propose a term frequency topic inverse document frequency (TFT-IDF) model that considers the meaning of text data in a larger semantic unit by applying weights based on the document's topic. Moreover, the proposed approach reduces the scarcity problem through a frequency topic-inverse document frequency in singular value decomposition model. The proposed approach also reduces dimensionality, which ultimately increases the strength of document classification. Experimental evaluations show that the proposed approach classifies spam mail documents with higher accuracy using individual document-independent processing computation. Comparative evaluations show that the proposed approach performs better than the logistic regression model in the distributed computing environment, with higher document word frequencies of 97.05%, 99.17%, and 96.59%.
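To make the weighting idea in the abstract concrete, the following is a minimal sketch of plain bag-of-words TF-IDF followed by a topic-based rescaling. It is not the authors' TFT-IDF formula, which the abstract does not give: the `topic_weighted` function, its `topic_terms` set, and the `boost` factor are all hypothetical and serve only to show where a topic weight could enter the computation. The singular value decomposition step is omitted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def topic_weighted(weights, topic_terms, boost=2.0):
    """Scale the weights of terms that belong to a given topic.

    Both `topic_terms` and `boost` are hypothetical: the abstract does not
    say how topic weights are derived or applied.
    """
    return [
        {t: w * (boost if t in topic_terms else 1.0) for t, w in doc.items()}
        for doc in weights
    ]


docs = [
    "win free prize now",
    "free prize claim now",
    "meeting agenda for monday",
]
base = tf_idf(docs)
boosted = topic_weighted(base, topic_terms={"prize", "free"})
```

In this toy corpus, terms that appear in only one message (such as "win") receive the highest plain TF-IDF weight, while the assumed topic boost raises shared spam-related terms such as "free" and "prize" relative to the rest.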

Authors: WooHyun Park, Nawab Muhammad Faseeh Qureshi, Dong Ryeol Shin

Affiliations: Department of Electrical and Computer Engineering; Department of Computer Education

Source: Computers, Materials & Continua (SCIE, EI), 2022, Issue 4, pp. 517-535 (19 pages)

Keywords: NLP; big data; machine learning; TFT-IDF; spam mail

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]


