Fraudulent website images are a vital information carrier for telecom fraud, and recognizing them efficiently and precisely is critical to combating fraudulent websites. Current research on image recognition of fraudulent websites is mainly carried out at the level of image feature extraction and similarity analysis, which suffers from difficulty in obtaining image data, insufficient image analysis, and a single identification type. This study develops a model based on the entropy method for image leader decision and Inception-v3 transfer learning to address these disadvantages. The data processing part of the model uses a breadth-first crawler to capture the image data. The information in the images is then evaluated with the entropy method, image weights are assigned, and the image leader is selected. For model training and prediction, transfer learning with the Inception-v3 model is introduced into image recognition of fraudulent websites. Training the model on the selected image leaders allows multiple types of fraudulent websites to be identified with high accuracy. The experiments show that this model recognizes images on fraudulent websites more accurately than other current models.
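The entropy-based weighting and leader selection described above can be illustrated with a minimal sketch of the textbook entropy weight method. The abstract does not say which image indicators are measured, so the three indicator columns, the sample values, and the function names below are hypothetical; this is not the authors' implementation.

```python
import math

def _normalise_columns(rows):
    """Min-max scale each indicator column to [0, 1]; constant columns become zeros."""
    scaled = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]

def entropy_weights(rows):
    """Return one weight per indicator: low-entropy (more varied) columns weigh more."""
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    norm = _normalise_columns(rows)
    n = len(norm)
    diversities = []
    for col in zip(*norm):
        total = sum(col)
        if total == 0:
            diversities.append(0.0)  # constant column carries no information
            continue
        props = [v / total for v in col]
        # Shannon entropy scaled to [0, 1]; 0 * log(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in props if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    s = sum(diversities)
    if s == 0:
        return [1.0 / len(diversities)] * len(diversities)
    return [d / s for d in diversities]

def select_leader(rows):
    """Score each image as a weighted sum of its scaled indicators; the top one leads."""
    norm = _normalise_columns(rows)
    weights = entropy_weights(rows)
    scores = [sum(w * v for w, v in zip(weights, row)) for row in norm]
    leader = max(range(len(scores)), key=scores.__getitem__)
    return leader, scores

# Hypothetical indicators per crawled image:
# (text_area_ratio, edge_density, colour_count)
images = [
    (0.10, 0.30, 12),
    (0.55, 0.80, 40),
    (0.20, 0.35, 15),
]
leader, scores = select_leader(images)
print("image leader index:", leader)
```

In this toy data the second image is highest on every indicator, so it is chosen as the leader regardless of how the weights fall.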
Phishing websites present a severe cybersecurity risk since they can lead to financial losses, data breaches, and user privacy violations. This study uses machine learning approaches to address the problem of phishing website detection. Using artificial intelligence, the project aims to provide efficient techniques for locating and thwarting these dangerous websites. The study goals were attained by performing a thorough literature analysis to investigate several models and methods often used in phishing website identification. Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, Support Vector Classifiers, Linear Support Vector Classifiers, and Naive Bayes were all used in the inquiry. This research covers the benefits and drawbacks of several machine learning approaches, illuminating how well suited each is to overcoming the difficulties of locating and countering phishing websites. The insights gained from this literature review guide the selection and implementation of appropriate models and methods in future research and real-world applications related to phishing detection. The study evaluates and compares the accuracy, precision, and recall of several machine learning models in detecting phishing website URLs.
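For reference, the three metrics the study compares can be computed from a confusion count as in the sketch below. The label vectors are made-up illustration data, not results from the study, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary phishing/legitimate task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/labelled positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels: 1 = phishing URL, 0 = legitimate URL.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes legitimate URLs flagged as phishing, while recall penalizes phishing URLs that slip through, which is why the two are usually reported together with accuracy.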
In the contemporary world, copyrighted digital content faces significant challenges from copyright infringement. Billions of dollars are lost annually because of this illegal act. The most effective current approach to tackling this problem is believed to be blocking the infringing websites, particularly through affiliated government bodies. To do so, an effective detection mechanism is a necessary first step. Some researchers have used various approaches to analyze the possible common features of suspected piracy websites. For instance, most of these websites serve online advertisements, which are considered their main source of revenue. In addition, these advertisements have some common attributes that distinguish them from advertisements posted on normal or legitimate websites. They usually contain keywords such as click-words (words that redirect to install malicious software) and words frequently used in illegal gambling, illegal sexual acts, and so on. This makes them well suited as one of the key features for detecting websites involved in copyright infringement. Research has been conducted to identify advertisements served on suspected piracy websites. However, these studies use a static approach that relies mainly on manual scanning for the aforementioned keywords. This brings some limitations, particularly in coping with the dynamic and ever-changing behavior of advertisements posted on these websites. Therefore, we propose a technique that can continuously fine-tune itself and is intelligent enough to effectively identify advertisement (Ad) banners extracted from suspected piracy websites. We have done this by leveraging machine learning algorithms, particularly the support vector machine with the word2vec word-embedding model. After applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites, we were able to identify Ad banners extracted from suspected piracy websites with an accuracy of 97%. We present this technique in the hope that it will be a useful tool for various piracy website detection approaches. To our knowledge, this is the first approach that uses machine learning to identify Ad banners served on suspected piracy websites.
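The abstract does not state how word2vec vectors are combined into a feature for the support vector machine. One common choice, shown here purely as an assumption, is to average the vectors of the words found in a banner. The two-dimensional `toy_embeddings` table and the `banner_vector` helper are hypothetical stand-ins for a trained word2vec model.

```python
def banner_vector(text, embeddings, dim):
    """Average the embeddings of known words to get one fixed-length banner feature."""
    tokens = text.lower().split()
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical 2-D embeddings; a real word2vec model would supply these.
toy_embeddings = {
    "click": [1.0, 0.0],
    "casino": [0.8, 0.2],
    "sale": [0.0, 1.0],
}
vec = banner_vector("Click here casino bonus", toy_embeddings, dim=2)
print(vec)
```

Vectors built this way have the same length for every banner, which is what a classifier such as an SVM needs as input.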
Feature analysis of fraudulent websites is of great significance to combating, preventing, and controlling telecom fraud crimes. To address the shortcomings of existing analytical approaches, namely their single dimension and vulnerability to anti-reconnaissance, this paper adopts Stacking, an ensemble learning algorithm, combines multiple modalities such as text, image, and URL, and proposes a multimodal fraudulent website identification method that ensembles heterogeneous models. Cross-validation is first used to train multiple strong, largely different base classifiers, such as a BERT model, a residual neural network (ResNet), and a logistic regression model, which classify the text, image, and URL features respectively. The results of the base classifiers are taken as the input of the meta-classifier, whose output is used as the final identification. The study indicates that the fusion method identifies fraudulent websites more effectively than single-modal methods, increasing recall by at least 1%. In addition, deploying the algorithm in a real Internet environment shows that identification accuracy improves by at least 1.9% compared with other fusion methods.
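The stacking flow described above (cross-validated base predictions fed to a meta-classifier) can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the base learners are toy one-dimensional threshold classifiers standing in for BERT, ResNet, and logistic regression, the meta-classifier is a small logistic regression (the abstract does not name one), and the per-modality scores are invented.

```python
import math

def fit_threshold(xs, ys):
    """Toy base learner: cut halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x > cut else 0.0

def out_of_fold(xs, ys, k):
    """Predict each sample with a model trained only on the other k-1 folds."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = model(xs[i])
    return preds

def fit_logistic(rows, ys, epochs=500, lr=0.5):
    """Meta-classifier: logistic regression trained by batch gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for row, y in zip(rows, ys):
            z = sum(wi * xi for wi, xi in zip(w, row)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(row):
                gw[j] += (p - y) * xj
            gb += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, gw)]
        b -= lr * gb / len(rows)
    return lambda row: 1 if sum(wi * xi for wi, xi in zip(w, row)) + b > 0 else 0

# Invented per-site scores for each modality; 1 = fraudulent, 0 = legitimate.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
modalities = {
    "text": [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.9, 0.2],
    "image": [0.8, 0.3, 0.4, 0.2, 0.9, 0.6, 0.7, 0.1],
    "url": [0.7, 0.1, 0.9, 0.3, 0.8, 0.2, 0.75, 0.2],
}

# One out-of-fold prediction column per modality becomes the meta-feature matrix.
columns = [out_of_fold(scores, labels, k=4) for scores in modalities.values()]
meta_rows = [list(row) for row in zip(*columns)]
meta_model = fit_logistic(meta_rows, labels)
print([meta_model(row) for row in meta_rows])
```

Using out-of-fold predictions keeps the meta-classifier from seeing base outputs produced on data the base model was trained on. In this toy data the image learner misjudges two sites, and the meta-classifier learns to lean on the text and URL columns instead.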
Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information, a problem that persists despite user awareness. This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning (ML) models, namely Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), using authentic datasets sourced from the Kaggle and Mendeley repositories. Extensive experimentation and analysis reveal that the CNN model achieves the highest accuracy, at 98%, while LSTM shows the lowest, at 96%. These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics, offering a promising avenue for safeguarding sensitive information and online security.
This paper analyzes users’ trust decision patterns for detecting phishing sites. Our previous work proposed HumanBoost [1], which improves the accuracy of detecting phishing sites by using users’ Past Trust Decisions (PTDs). Web users are generally required to make trust decisions whenever their personal information is requested by a website. HumanBoost assumed that a database of a Web user’s PTDs would be transformed into a binary vector, representing phishing or not-phishing, and that the binary vector could be used for detecting phishing sites, similar to existing heuristics. Here, this paper explores the types of users whose PTDs are useful by running a subject experiment in which 309 participants browsed 40 websites, judged whether each site appeared to be a phishing site, and described the criteria they used while assessing the credibility of the site. Based on the results of the experiment, this paper classifies the participants into eight groups by a clustering approach and evaluates the detection accuracy for each group. It then clarifies the types of users who can make suitable trust decisions for HumanBoost.
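As a small illustration of the binary-vector representation mentioned above, the sketch below encodes one participant's judgments and scores them against ground truth. The decision strings, the five-site sample, and both helper names are hypothetical; the abstract does not describe HumanBoost's actual data format.

```python
def to_binary_vector(decisions):
    """Encode each past trust decision as 1 (judged phishing) or 0 (judged legitimate)."""
    return [1 if d == "phishing" else 0 for d in decisions]

def detection_accuracy(judged, actual):
    """Fraction of sites where the participant's judgment matches the true label."""
    hits = sum(1 for j, a in zip(judged, actual) if j == a)
    return hits / len(actual)

# Hypothetical ground truth for five sites and one participant's decisions.
actual = [1, 0, 1, 1, 0]
participant_ptd = ["phishing", "legitimate", "legitimate", "phishing", "legitimate"]
vector = to_binary_vector(participant_ptd)
accuracy = detection_accuracy(vector, actual)
print(vector, accuracy)
```

Averaging this accuracy over the members of each cluster would give a per-group figure of the kind the paper reports.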
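With advances in technology and growing attention to education, the smart campus has become a hot topic in the education field. Against this background, this paper studies a university smart-campus website for learning exchange and trading built on Vue and Spring Boot, in which front-end and back-end data interaction is implemented using the IDEA and Visual Studio Code development tools. The website is divided into several sections, including a study area, a flea market area, and a campus life discussion area. Users can publish posts on the website to communicate and trade. The website provides university students with a convenient, fast, secure, and trustworthy platform for campus interaction.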
As increasing numbers of Chinese language learners choose to learn English online, there is a need to investigate popular websites and their language learning designs. This paper reports on the first stage of a study that analyzed the pedagogical, linguistic, and content features of 25 Chinese English Language Learning (ELL) websites ranked according to their value and importance to users. The website ranking was undertaken using a system known as PageRank. The aim of the study was to identify the features characterizing popular sites as opposed to those of less popular sites for the purpose of producing a framework for ELL website design in the Chinese context. The study found that a pedagogical focus with developmental instructional materials accommodating diverse proficiency levels was a major contributor to website popularity. Chinese language use for translations and teaching directives and intermediate-level English for learning materials were also significant features. Content topics included Anglophone/Western and non-Anglophone/Eastern contexts. Overall, popular websites were distinguished by their mediation of access to and scaffolded support for ELL.
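For readers unfamiliar with PageRank, the sketch below runs the textbook power iteration on a tiny link graph. The abstract does not say how the study obtained its PageRank values, so this only illustrates the general algorithm. The four-site graph, the damping factor of 0.85, and the iteration count are assumptions, and every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to its outgoing links."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph between four learning sites.
toy_links = {
    "site_a": ["site_b", "site_c"],
    "site_b": ["site_c"],
    "site_c": ["site_a"],
    "site_d": ["site_c"],
}
ranks = pagerank(toy_links)
for site in sorted(ranks, key=ranks.get, reverse=True):
    print(site, round(ranks[site], 3))
```

A site ranks highly when many pages, or a few highly ranked pages, link to it, which is the sense in which the ranking reflects importance to users.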
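Website fingerprinting identifies the websites a user visits by analyzing traffic features, and can effectively monitor user behavior on the Tor anonymity network. Existing identification methods usually require large-scale data samples to achieve high accuracy and commonly suffer from concept drift. To address these problems, this paper proposes a website fingerprinting model based on a residual network and a collaborative and adversarial network (Res-CAN). The model uses a residual network as the feature extractor to reduce the difficulty of optimizing the network. At the same time, the collaborative and adversarial network (CAN) is applied to website fingerprinting so that the feature extractor learns both domain-specific and domain-invariant features, aligning the feature spaces of the source and target domains. Experimental results show that the proposed method reaches a website fingerprinting accuracy of 91.2% in a few-sample setting, outperforming the existing transfer learning method based on Domain-Adversarial Neural Networks (DANN), and that it is relatively robust to concept drift.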
Funding (fraudulent website image recognition study): supported by the National Social Science Fund of China (23BGL272).
Funding (piracy website Ad banner study): this research project was supported by the Ministry of Culture, Sports, and Tourism (MCST) and the Korea Copyright Commission in 2021 (2019-PF-9500).
Funding (multimodal fraudulent website identification study): supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. LGF20G030001), the Ministry of Public Security Science and Technology Plan Project (2022LL16), and the Key Scientific Research Projects of Agricultural and Social Development in Hangzhou in 2020 (202004A06).