Funding: The National Basic Research Program of China (973 Program) (No. 2010CB328104, 2009CB320501); the National Natural Science Foundation of China (No. 61272531, 61070158, 61003257, 61060161, 61003311, 41201486); the National Key Technology R&D Program during the 11th Five-Year Plan Period (No. 2010BAI88B03); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20110092130002); the National Science and Technology Major Project (No. 2009ZX03004-004-04); the Foundation of the Key Laboratory of Network and Information Security of Jiangsu Province (No. BM2003201); the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9).
Abstract: To detect phishing effectively, a detection method based on uniform resource locator (URL) features is proposed. First, the method compares phishing URLs with legitimate ones to extract the features of phishing URLs. A machine learning algorithm is then trained on the sample data set to obtain a URL classification model. Because phishing URLs change over time, the classification model must be updated continually as new samples arrive, so an incremental learning algorithm based on feedback from the original sample data set is designed. Experiments verify that combining the URL features extracted in this paper with the support vector machine (SVM) classification algorithm achieves high phishing detection accuracy, and that the incremental learning algorithm is also effective.
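The pipeline described in this abstract (URL features, an SVM-style classifier, and updates from new samples) can be illustrated with a minimal sketch. The abstract does not list the paper's features or specify its incremental algorithm, so everything below is an assumption: the five lexical features in `url_features` are hypothetical choices, and `IncrementalLinearSVM` is a generic linear SVM updated one sample at a time by hinge-loss subgradient steps, a stand-in rather than the paper's feedback-based method.

```python
import re
from urllib.parse import urlparse

IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def url_features(url):
    """Map a URL to a small lexical feature vector (hypothetical feature set)."""
    host = urlparse(url).hostname or ""
    return [
        float(len(url)),                        # total URL length
        float(host.count(".")),                 # dots in the host name
        1.0 if IPV4.fullmatch(host) else 0.0,   # host is a raw IPv4 address
        1.0 if "@" in url else 0.0,             # '@' appears anywhere in the URL
        float(host.count("-")),                 # hyphens in the host name
    ]


class IncrementalLinearSVM:
    """Linear SVM trained online; labels are +1 (phishing) or -1 (legitimate)."""

    def __init__(self, n_features, lr=0.1, reg=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr      # learning rate (assumed value)
        self.reg = reg    # L2 regularization strength (assumed value)

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, x, label):
        """Update the model with one new labeled sample."""
        margin = label * self.score(x)
        shrink = 1.0 - self.lr * self.reg
        self.w = [wi * shrink for wi in self.w]     # L2 weight decay
        if margin < 1.0:                            # sample violates the margin
            self.w = [wi + self.lr * label * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * label

    def predict(self, x):
        return 1 if self.score(x) >= 0.0 else -1
```

In use, each newly labeled URL would be passed through `url_features` and then to `partial_fit`, so the model adapts without retraining from scratch. In practice the features would likely need scaling, since raw URL length dominates the other values.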
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups Project under grant number (158/43); Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R135), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4340237DSR22.
Abstract: Phishing is one of the simplest forms of cybercrime for obtaining users' sensitive data, such as passwords, account identifiers, and bank details. In general, these attacks reach users through phone calls, emails, or instant messages. The anti-phishing techniques currently in use are mainly based on source code features, which require scraping the webpage content, or rely on third-party services to check the classification of phishing Uniform Resource Locators (URLs). Although Machine Learning (ML) techniques have recently been used to identify phishing, they still require feature engineering, since the techniques are not well suited to identifying phishing offenses on their own. The rapid growth and evolution of Deep Learning (DL) techniques has paved the way for more accurate classification. Against this background, the current research article presents a Hunger Search Optimization with Hybrid Deep Learning enabled Phishing Detection and Classification (HSOHDL-PDC) model. The presented HSOHDL-PDC model focuses on effective recognition and classification of phishing based on website URLs. In addition, the HSOHDL-PDC model uses character-level embedding instead of word-level embedding, since URLs generally contain words that carry no meaning. Moreover, a hybrid Convolutional Neural Network-Long Short Term Memory (HCNN-LSTM) technique is applied to identify and classify phishing. The hyperparameters of the HCNN-LSTM model are optimized with the HSO algorithm, which in turn produced improved outcomes. The performance of the proposed HSOHDL-PDC model was validated on different datasets, and the outcomes confirmed that the proposed model outperforms other recent approaches.
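To make the character-level input concrete, the sketch below shows one common way to turn a URL into a fixed-length sequence of integer IDs, the form an embedding layer would consume. It covers only this preprocessing step, not the HCNN-LSTM network or the HSO tuning. The vocabulary, the padding and unknown IDs, and the `max_len` default are all assumptions, since the abstract does not specify them.

```python
import string

# Assumed vocabulary: lowercase letters, digits, and common URL punctuation.
VOCAB = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
PAD, UNK = 0, 1  # reserved IDs for padding and out-of-vocabulary characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}


def encode_url(url, max_len=200):
    """Encode a URL as exactly max_len character IDs (truncate, then pad)."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in url.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))
```

Because every character maps to an ID, a misspelled or random-looking host name still yields a usable input, whereas a word-level vocabulary would collapse it to a single unknown token.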
Funding: Supported by the Research Fund for International Young Scientists of the National Natural Science Foundation of China under Grant No. 61550110248 and Tibet Autonomous Region Key Scientific Research Projects under Grant No. Z2014A18G2-13.
Abstract: Focused crawlers (also known as subject-oriented crawlers), the core component of a vertical search engine, collect as many topic-specific web pages as they can to form a subject-oriented corpus for later data analysis or user querying. This paper shows that the popular algorithms used in focused web crawling are essentially webpage analysis algorithms and crawling strategies (which prioritize the uniform resource locators (URLs) in the queue). The first experiment shows the advantages and disadvantages of three crawling strategies and indicates that best-first search with an appropriate heuristic is a smart choice for topic-oriented crawling, while depth-first search is of little use in focused crawling. A second experiment compares improved versions of these strategies (with a webpage analysis algorithm added) and verifies that crawling strategies alone are not very efficient for focused crawling and that in most cases the two are used in combination. In light of the experimental results and recent research, some points on research trends in focused crawler algorithms are suggested.
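The best-first strategy mentioned in this abstract can be sketched as a priority queue over the URL frontier. This is a generic illustration, not the paper's implementation: `get_links` and `score` are hypothetical callables supplied by the caller, standing in for page fetching and for whatever relevance heuristic a real crawler would use.

```python
import heapq


def best_first_crawl(seeds, get_links, score, max_pages):
    """Visit up to max_pages URLs, always expanding the highest-scoring one.

    get_links(url) returns the outgoing links of a page and score(url)
    returns its estimated topical relevance; both are assumed helpers.
    """
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    frontier = [(-score(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited
```

Replacing the heap with a stack or a plain queue would give depth-first or breadth-first crawling, so the strategies compared in the abstract differ mainly in how the frontier is ordered.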