A Domain-Name-Based Technique for Filtering Illegal Websites (cited by: 1)

Filtering Illegal Website on Domain Name
Abstract: In recent years, a large number of websites on the Internet have carried illegal or unhealthy information, so filtering such websites is particularly important. The usual approach is to classify a website according to the information in its web pages. This paper proposes a Naive Bayes classifier based on N-grams that classifies websites according to their domain names. The authors use the method to automatically identify websites containing unhealthy or illegal information, and the experimental results show that it achieves considerable accuracy. The method has been deployed in the network firewall product of a software company.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2003, No. 14, pp. 170-172 (3 pages)
Keywords: Text Classification, Information Filtering
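
The abstract describes classifying websites from their domain names with an N-gram Naive Bayes classifier but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes character trigrams, a multinomial model with Laplace smoothing, and a tiny made-up training set; the names `char_ngrams` and `NgramNaiveBayes`, the labels, and all domain names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(domain, n=3):
    """Return the character n-grams of a domain name, padded with '^' and '$'."""
    text = "^" + domain.lower() + "$"
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (assumed design)."""

    def __init__(self, n=3):
        self.n = n
        self.class_counts = Counter()            # training domains per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, domains, labels):
        for domain, label in zip(domains, labels):
            grams = char_ngrams(domain, self.n)
            self.class_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, domain):
        """Return log P(class) + sum of log P(gram | class) for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.gram_counts[label]
            total = sum(counts.values())
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(domain, self.n):
                if gram not in self.vocab:
                    continue  # n-grams never seen in training carry no evidence
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            scores[label] = score
        return scores

    def predict(self, domain):
        scores = self.log_scores(domain)
        return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
train_domains = [
    "freecasinobets.com", "casinoslots.net", "betcasino.org",
    "universitylibrary.edu", "citylibrary.org", "newsdaily.com",
]
train_labels = ["blocked", "blocked", "blocked",
                "allowed", "allowed", "allowed"]

clf = NgramNaiveBayes(n=3).fit(train_domains, train_labels)
print(clf.predict("onlinecasino.com"))
print(clf.predict("publiclibrary.org"))
```

Working on characters rather than words suits domain names, which rarely contain delimiters between words; a real deployment would need a far larger labeled set and a tuned decision threshold.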