Funding: This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R513), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Cross-Site Scripting (XSS) remains a significant threat to web application security, exploiting vulnerabilities to hijack user sessions and steal sensitive data. Traditional detection methods often fail to keep pace with the evolving sophistication of cyber threats. This paper introduces a novel hybrid ensemble learning framework that combines several machine learning algorithms: Logistic Regression (LR), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Deep Neural Networks (DNN). Using the XSS-Attacks-2021 dataset, which comprises 460 instances drawn from various real-world traffic-related scenarios, the framework substantially improves XSS attack detection. Our approach, which includes feature engineering and model tuning, optimizes accuracy while keeping false positives (FP) at 0.13% and false negatives (FN) at 0.19%. In validation, it achieves an accuracy of 99.87%. The system is designed to scale with a growing number of web applications and user demands without a decline in performance, and it detects XSS attacks in real time while maintaining high accuracy and low latency under significant load. Although the hybrid ensemble adds computational complexity, parallel processing and algorithm tuning keep the system scalable and robust in real-time applications. Designed for integration with existing web security systems, the framework offers adaptable Application Programming Interfaces (APIs) and a modular design, so it can augment current defenses. It offers a scalable and effective solution for securing modern web applications against evolving threats.
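The abstract names the five base learners but does not say how their outputs are combined, nor whether the FP and FN percentages are rates relative to benign and malicious samples or fractions of all samples. As an illustration only, the sketch below assumes soft voting (an optionally weighted average of each model's predicted XSS probability, thresholded at 0.5) and computes accuracy, false-positive rate FP / (FP + TN) and false-negative rate FN / (FN + TP) from the resulting confusion matrix. The function names and the probability values are hypothetical toy numbers, not results from the paper.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical per-sample probabilities that a request is an XSS attack.
# Keys name the five base learners listed in the abstract; values are toy data.
MODEL_PROBS: Dict[str, List[float]] = {
    "lr":       [0.90, 0.20, 0.40, 0.60],
    "svm":      [0.80, 0.10, 0.45, 0.55],
    "xgboost":  [0.95, 0.30, 0.60, 0.40],
    "catboost": [0.85, 0.20, 0.35, 0.70],
    "dnn":      [0.90, 0.10, 0.90, 0.65],
}
Y_TRUE = [1, 0, 1, 0]  # 1 = XSS attack, 0 = benign


def soft_vote(model_probs: Dict[str, Sequence[float]],
              weights: Optional[Dict[str, float]] = None,
              threshold: float = 0.5) -> List[int]:
    """Assumed combination rule: weighted mean of model probabilities, then threshold."""
    names = list(model_probs)
    w = weights if weights is not None else {name: 1.0 for name in names}
    total_w = sum(w[name] for name in names)
    n_samples = len(model_probs[names[0]])
    preds = []
    for i in range(n_samples):
        avg = sum(w[name] * model_probs[name][i] for name in names) / total_w
        preds.append(1 if avg >= threshold else 0)
    return preds


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus false-positive and false-negative rates from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # benign requests flagged as XSS
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # XSS attacks that were missed
    }


if __name__ == "__main__":
    preds = soft_vote(MODEL_PROBS)
    # prints [1, 0, 1, 1] {'accuracy': 0.75, 'fpr': 0.5, 'fnr': 0.0}
    print(preds, detection_metrics(Y_TRUE, preds))
```

With equal weights the rule reduces to a plain mean; per-model weights (for example, tuned on a validation set) are a further assumption, not something the abstract specifies.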
Funding: This work was funded by the Australian Research Data Commons (ARDC), project code RG192500, which will be used to pay the article processing charge (APC) of this manuscript.
Abstract: With the worldwide spread of internet-connected devices and rapid advances in Internet of Things (IoT) systems, much research has applied machine learning methods to recognizing IoT sensor data. This is particularly the case for optical character recognition of handwritten scripts. Recognizing text in images has several useful applications, including content-based image retrieval, searching, and document archiving. Arabic is one of the most widely used languages in the world. However, Arabic text recognition in imagery, especially of handwritten text, is still at an early stage. This is mainly due to the complexity of the language, its different writing styles, variations in character shapes, diacritics, and the connected nature of Arabic text. In this paper, two deep learning models are proposed. The first is based on sequence-to-sequence recognition, while the second is based on a fully convolutional network. To measure the performance of these models, a new dataset called QTID (Quran Text Image Dataset) was devised. It is the first Arabic dataset to include Arabic diacritics. It consists of 309,720 annotated 192×64 Arabic word images taken from the Holy Quran, comprising 2,494,428 characters in total. The annotated images were randomly divided into 90%, 5%, and 5% sets for training, validation, and testing, respectively. Both models were set up to recognize the Arabic Othmani font in QTID. Experimental results show that the proposed methods achieve state-of-the-art results. The proposed models also perform strongly in terms of character recognition rate, F1-score, average precision, and recall, and they outperform leading Arabic text recognition engines such as Tesseract and ABBYY FineReader.
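The abstract reports a random 90%/5%/5% split of the 309,720 QTID word images. A minimal sketch of such a split is shown below, assuming a seeded shuffle followed by integer slicing, with the test set absorbing any rounding remainder. The function name and seed are hypothetical, and the paper does not state how rounding was handled. Under these assumptions the split yields 278,748 training, 15,486 validation, and 15,486 test images.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val_test(items: Sequence[T], seed: int = 42,
                         train_pct: int = 90, val_pct: int = 5
                         ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into train/validation/test slices.
    The test set receives whatever remains after integer rounding."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n = len(shuffled)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Integer IDs stand in for the 309,720 QTID word images.
    train, val, test = split_train_val_test(range(309_720))
    print(len(train), len(val), len(test))  # prints 278748 15486 15486
```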
Abstract: Analysis and recognition of ancient scripts is a challenging task because these scripts are inscribed on pillars, stones, or leaves. Optical recognition systems can help preserve and share ancient scripts and accelerate their study, but the lack of a standard dataset for such scripts is a major constraint. Although many scholars and researchers have captured inscription images and uploaded them to various websites, manually searching for, downloading, and extracting these images is tedious and error-prone. Web search queries return a vast number of irrelevant results, and manually extracting images for a specific script does not scale. This paper proposes a novel multistage system that identifies the images of a specific script within a large set of images downloaded from web sources. The system combines two important pattern-matching techniques, Scale Invariant Feature Transform (SIFT) and template matching, in a sequential pipeline. By exploiting the key strengths of each technique, it discards irrelevant images while retaining images of the target script.
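The abstract describes a sequential pipeline in which SIFT matching and template matching each discard irrelevant images. The plain-Python sketch below illustrates such a cascade under several assumptions. SIFT descriptors are taken as already computed; in practice a library such as OpenCV (`cv2.SIFT_create`) would extract them. The SIFT stage is assumed to run first and to count query descriptors that pass Lowe's ratio test. The template-matching stage keeps an image only if its best zero-mean normalized cross-correlation with a reference glyph patch exceeds a threshold. All function names, thresholds, and the toy 2-D descriptors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]
Gray = Sequence[Sequence[float]]  # grayscale image as rows of pixel intensities


def good_match_count(query: Sequence[Descriptor], candidate: Sequence[Descriptor],
                     ratio: float = 0.75) -> int:
    """Count query descriptors whose nearest candidate descriptor passes Lowe's
    ratio test (nearest distance < ratio * second-nearest distance)."""
    if len(candidate) < 2:
        return 0
    count = 0
    for q in query:
        d1, d2 = sorted(math.dist(q, c) for c in candidate)[:2]
        if d1 < ratio * d2:
            count += 1
    return count


def best_ncc(image: Gray, template: Gray) -> float:
    """Slide the template over the image and return the highest zero-mean
    normalized cross-correlation score (1.0 = match up to brightness/contrast)."""
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    best = -1.0
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            w_vals = [image[y + r][x + c] for r in range(th) for c in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            if t_norm == 0 or w_norm == 0:
                continue  # flat patch: correlation is undefined, so skip it
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm)
            best = max(best, score)
    return best


def filter_script_images(candidates: Sequence[Tuple[str, Sequence[Descriptor], Gray]],
                         query_desc: Sequence[Descriptor], template: Gray,
                         min_matches: int = 10, min_ncc: float = 0.7) -> List[str]:
    """Stage 1 (descriptor matching) discards images first; only survivors reach
    stage 2 (template matching). Returns the names of retained images."""
    stage1 = [c for c in candidates if good_match_count(query_desc, c[1]) >= min_matches]
    return [name for name, _, img in stage1 if best_ncc(img, template) >= min_ncc]


if __name__ == "__main__":
    query = [[0.0, 0.0], [10.0, 10.0]]
    near = [[0.0, 0.1], [10.0, 10.1], [50.0, 50.0]]   # descriptors close to the query
    far = [[5.0, 5.0], [5.0, 5.1]]                   # ambiguous descriptors
    template = [[0, 10], [10, 0]]
    sharp = [[0, 10, 5], [10, 0, 5], [5, 5, 5]]      # contains the template pattern
    blocky = [[0, 0, 10], [0, 0, 10], [10, 10, 10]]  # does not
    images = [("a", near, sharp), ("b", far, sharp), ("c", near, blocky)]
    print(filter_script_images(images, query, template, min_matches=2))  # prints ['a']
```

Ordering the stages so that most irrelevant images are rejected early means the costlier sliding-window comparison runs on fewer images; which stage the authors actually place first is not stated in the abstract.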