Abstract
In recent years, a large number of websites containing illegal or objectionable content have appeared on the Internet, which makes filtering such sites especially important. The usual approach classifies a website according to the information in its web pages. This paper instead proposes a Naive Bayes classifier based on N-grams that classifies websites by their domain names. The authors use the method to automatically identify websites containing objectionable or illegal information, and the experimental results show that it achieves high accuracy. The method has been applied in the network firewall product of a software company.
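The abstract does not give the paper's feature extraction, smoothing, or training data, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes character trigrams over the full domain string, add-one (Laplace) smoothing, and a tiny made-up training set. The names `char_ngrams`, `NgramNaiveBayes`, and the labels `blocked` and `allowed` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(domain, n=3):
    """Return the overlapping character n-grams of a lowercased domain name."""
    text = domain.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.class_docs = Counter()               # training domains per class
        self.gram_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, domains, labels):
        for domain, label in zip(domains, labels):
            self.class_docs[label] += 1
            for gram in char_ngrams(domain, self.n):
                self.gram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def log_scores(self, domain):
        """Return log P(class) + sum of log P(gram | class) for each class."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_docs.items():
            total_grams = sum(self.gram_counts[label].values())
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(domain, self.n):
                count = self.gram_counts[label][gram]
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((count + 1) / (total_grams + vocab_size))
            scores[label] = score
        return scores

    def predict(self, domain):
        scores = self.log_scores(domain)
        return max(scores, key=scores.get)


# Made-up training data, for illustration only.
train_domains = [
    "casino-bet-win.example",
    "freebetcasino.example",
    "poker-casino.example",
    "citylibrary.example",
    "universitynews.example",
    "weatherreport.example",
]
train_labels = ["blocked"] * 3 + ["allowed"] * 3

model = NgramNaiveBayes(n=3).fit(train_domains, train_labels)
print(model.predict("bigcasinobets.example"))
print(model.predict("librarynews.example"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. A practical filter would need a far larger labeled corpus and a decision threshold tuned against false positives.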
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University Core Journals)
2003, Issue 14, pp. 170-172 (3 pages)
Computer Engineering and Applications
Keywords
Text Classification, Information Filtering