Malware is an ever-present and dynamic threat to networks and computer systems in cybersecurity,and because of its complexity and evasiveness,it is challenging to identify using traditional signature-based detection a...Malware is an ever-present and dynamic threat to networks and computer systems in cybersecurity,and because of its complexity and evasiveness,it is challenging to identify using traditional signature-based detection approaches.The study article discusses the growing danger to cybersecurity that malware hidden in PDF files poses,highlighting the shortcomings of conventional detection techniques and the difficulties presented by adversarial methodologies.The article presents a new method that improves PDF virus detection by using document analysis and a Logistic Model Tree.Using a dataset from the Canadian Institute for Cybersecurity,a comparative analysis is carried out with well-known machine learning models,such as Credal Decision Tree,Naïve Bayes,Average One Dependency Estimator,Locally Weighted Learning,and Stochastic Gradient Descent.Beyond traditional structural and JavaScript-centric PDF analysis,the research makes a substantial contribution to the area by boosting precision and resilience in malware detection.The use of Logistic Model Tree,a thorough feature selection approach,and increased focus on PDF file attributes all contribute to the efficiency of PDF virus detection.The paper emphasizes Logistic Model Tree’s critical role in tackling increasing cybersecurity threats and proposes a viable answer to practical issues in the sector.The results reveal that the Logistic Model Tree is superior,with improved accuracy of 97.46%when compared to benchmark models,demonstrating its usefulness in addressing the ever-changing threat landscape.展开更多
The proliferation of maliciously coded documents as file transfers increase has led to a rise in sophisticated attacks.Portable Document Format(PDF)files have emerged as a major attack vector for malware due to their ...The proliferation of maliciously coded documents as file transfers increase has led to a rise in sophisticated attacks.Portable Document Format(PDF)files have emerged as a major attack vector for malware due to their adaptability and wide usage.Detecting malware in PDF files is challenging due to its ability to include various harmful elements such as embedded scripts,exploits,and malicious URLs.This paper presents a comparative analysis of machine learning(ML)techniques,including Naive Bayes(NB),K-Nearest Neighbor(KNN),Average One Dependency Estimator(A1DE),RandomForest(RF),and SupportVectorMachine(SVM)forPDFmalware detection.The study utilizes a dataset obtained from the Canadian Institute for Cyber-security and employs different testing criteria,namely percentage splitting and 10-fold cross-validation.The performance of the techniques is evaluated using F1-score,precision,recall,and accuracy measures.The results indicate that KNNoutperforms other models,achieving an accuracy of 99.8599%using 10-fold cross-validation.The findings highlight the effectiveness of ML models in accurately detecting PDF malware and provide insights for developing robust systems to protect against malicious activities.展开更多
基金This research work was funded by Institutional Fund Projects under Grant No.(IFPIP:211-611-1443).
文摘Malware is an ever-present and dynamic threat to networks and computer systems in cybersecurity,and because of its complexity and evasiveness,it is challenging to identify using traditional signature-based detection approaches.The study article discusses the growing danger to cybersecurity that malware hidden in PDF files poses,highlighting the shortcomings of conventional detection techniques and the difficulties presented by adversarial methodologies.The article presents a new method that improves PDF virus detection by using document analysis and a Logistic Model Tree.Using a dataset from the Canadian Institute for Cybersecurity,a comparative analysis is carried out with well-known machine learning models,such as Credal Decision Tree,Naïve Bayes,Average One Dependency Estimator,Locally Weighted Learning,and Stochastic Gradient Descent.Beyond traditional structural and JavaScript-centric PDF analysis,the research makes a substantial contribution to the area by boosting precision and resilience in malware detection.The use of Logistic Model Tree,a thorough feature selection approach,and increased focus on PDF file attributes all contribute to the efficiency of PDF virus detection.The paper emphasizes Logistic Model Tree’s critical role in tackling increasing cybersecurity threats and proposes a viable answer to practical issues in the sector.The results reveal that the Logistic Model Tree is superior,with improved accuracy of 97.46%when compared to benchmark models,demonstrating its usefulness in addressing the ever-changing threat landscape.
文摘The proliferation of maliciously coded documents as file transfers increase has led to a rise in sophisticated attacks.Portable Document Format(PDF)files have emerged as a major attack vector for malware due to their adaptability and wide usage.Detecting malware in PDF files is challenging due to its ability to include various harmful elements such as embedded scripts,exploits,and malicious URLs.This paper presents a comparative analysis of machine learning(ML)techniques,including Naive Bayes(NB),K-Nearest Neighbor(KNN),Average One Dependency Estimator(A1DE),RandomForest(RF),and SupportVectorMachine(SVM)forPDFmalware detection.The study utilizes a dataset obtained from the Canadian Institute for Cyber-security and employs different testing criteria,namely percentage splitting and 10-fold cross-validation.The performance of the techniques is evaluated using F1-score,precision,recall,and accuracy measures.The results indicate that KNNoutperforms other models,achieving an accuracy of 99.8599%using 10-fold cross-validation.The findings highlight the effectiveness of ML models in accurately detecting PDF malware and provide insights for developing robust systems to protect against malicious activities.