Abstract
To improve the accuracy of webpage text classification, and to overcome the susceptibility of traditional text classification algorithms to the false and erroneous information found in webpages, this paper proposes a webpage classification algorithm based on link information. The algorithm improves on the K-nearest neighbor (KNN) method and classifies a webpage using the link information between the current page and its parent pages. The parent-link information of the page to be classified is represented as a vector in a vector space; the K pages in the training set whose link-information vectors are most similar to it are found, and the page's category is computed from them. Experiments comparing the method with traditional text classification algorithms verify its effectiveness.
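The classification step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary parent-link vectors, cosine similarity, similarity-weighted voting, the helper names (`link_vector`, `cosine`, `knn_classify`) and the toy URLs are all hypothetical choices, since the abstract does not specify the vector weighting or the voting rule.

```python
import math
from collections import defaultdict


def link_vector(parent_links):
    """Hypothetical representation: a binary vector keyed by parent-page URL."""
    return {url: 1.0 for url in parent_links}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_links, training, k=3):
    """Assign a category from the k most similar training pages.

    training is a list of (parent_links, category) pairs. Each of the
    k nearest neighbours votes for its category with a weight equal to
    its similarity (an assumed voting rule, not taken from the paper).
    """
    query = link_vector(query_links)
    scored = [(cosine(query, link_vector(links)), cat) for links, cat in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, cat in scored[:k]:
        votes[cat] += sim
    if not votes:
        return None  # empty training set
    return max(votes, key=votes.get)


# Toy training set: parent-page URLs of labelled pages (hypothetical data).
training = [
    (["a.com/sports", "b.com/news"], "sports"),
    (["a.com/sports", "c.com/ball"], "sports"),
    (["d.com/finance", "b.com/news"], "finance"),
    (["d.com/finance", "e.com/stock"], "finance"),
]

print(knn_classify(["a.com/sports", "c.com/ball", "b.com/news"], training, k=3))  # → sports
```

Weighting each neighbor's vote by its similarity is a common KNN variant in text classification; an unweighted majority vote over the K neighbors would be an equally plausible reading of the abstract.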
Source
Microelectronics & Computer
CSCD
Peking University Core Journal
2012, Issue 6, pp. 108-112 (5 pages)
Funding
National Natural Science Foundation of China (60373003)
Henan University of Technology University Foundation Project (2006BSO009)
Keywords
webpage classification
category
K-nearest neighbor
link information classification