Abstract
This paper analyzes the problem of mining Web pages for security risks in a distributed real-time network behavior monitoring system. An automatic classifier based on Web mining was designed and implemented, and an experimental environment was constructed to test its performance. The classifier uses a feature extraction algorithm to extract a feature vector from each sample and from each document to be classified, and then classifies Web pages with the k-nearest-neighbor (KNN) classification algorithm. The experimental results show that the classifier can pick out Web pages carrying unsafe information and that its classification performance is good.
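The pipeline the abstract describes (feature vectors for samples and for the document to be classified, followed by a KNN vote) can be illustrated with a minimal sketch. The abstract does not specify the feature extraction algorithm, the similarity measure, or the value of k, so everything here is an assumption for illustration: whitespace tokenization with term-frequency vectors, cosine similarity, k = 3, and a hypothetical toy training set with the labels "safe" and "unsafe". The function names `extract_features`, `cosine_similarity`, and `knn_classify` are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def extract_features(text):
    """Build a term-frequency vector (assumed stand-in for the paper's
    unspecified feature extraction algorithm)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, samples, k=3):
    """Label `text` by majority vote among its k most similar samples.

    `samples` is a list of (text, label) pairs; k=3 is an assumed default.
    """
    query = extract_features(text)
    scored = [
        (cosine_similarity(query, extract_features(sample_text)), label)
        for sample_text, label in samples
    ]
    # Sort by similarity only, most similar first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, not data from the paper.
SAMPLES = [
    ("free casino gambling win money now", "unsafe"),
    ("online gambling casino bonus money", "unsafe"),
    ("win money fast casino jackpot", "unsafe"),
    ("university research paper on network monitoring", "safe"),
    ("weather forecast and local news today", "safe"),
    ("network research news from the university", "safe"),
]

if __name__ == "__main__":
    print(knn_classify("casino jackpot win bonus", SAMPLES))
    print(knn_classify("university network research", SAMPLES))
```

A real system would probably need proper word segmentation for Chinese pages, term weighting such as TF-IDF, and feature selection to keep the vectors small; the sketch omits all of these.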
Source
Journal of Guangxi Academy of Sciences (《广西科学院学报》)
2008, No. 4, pp. 310-312, 316 (4 pages)
Funding
Supported by the Guangxi Science and Technology Key Project (桂科攻关033008-9)
Keywords
network behavior monitoring, Web page mining, classifier, KNN classification algorithm, feature extraction