Abstract
Link-based Web page classification algorithms suffer from noisy neighboring pages that distort the classification result. To address this problem, the paper proposes an optimization that uses the similarity between pages. Neighboring pages whose similarity meets the threshold are assigned weights that differ by link relationship, and these weighted neighbors are combined with the support vector machine's classification of the page to compute its category. Experiments show that the proposed algorithm improves precision, recall, and F1.
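The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea: filter neighbors by a similarity threshold, weight the survivors by relationship type, and blend their votes with the SVM output. Cosine similarity, the relation names ("parent", "sibling"), the mixing factor `alpha`, and the linear combination rule are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Neighbor:
    relation: str  # hypothetical relation name, e.g. "parent" or "sibling"
    terms: dict    # term -> weight vector of the neighboring page
    label: str     # category already assigned to the neighbor


def classify_page(page_terms, svm_scores, neighbors,
                  relation_weights, thresholds, alpha=0.5):
    """Blend SVM class scores with similarity-filtered neighbor votes.

    alpha and the linear blend are assumptions, not taken from the paper.
    """
    votes = defaultdict(float)
    for n in neighbors:
        sim = cosine_similarity(page_terms, n.terms)
        # Neighbors below the threshold for their relation count as noise.
        if sim >= thresholds[n.relation]:
            votes[n.label] += relation_weights[n.relation] * sim

    total = sum(votes.values())
    combined = {}
    for category, svm_score in svm_scores.items():
        neighbor_part = votes[category] / total if total else 0.0
        combined[category] = alpha * svm_score + (1 - alpha) * neighbor_part
    return max(combined, key=combined.get)


page = {"python": 1.0, "code": 1.0}
svm_scores = {"tech": 0.45, "sports": 0.55}  # made-up SVM class scores
neighbors = [
    Neighbor("parent", {"python": 1.0, "code": 1.0}, "tech"),
    Neighbor("sibling", {"football": 1.0}, "sports"),  # dissimilar, filtered out
]
weights = {"parent": 1.0, "sibling": 0.5}
thresholds = {"parent": 0.3, "sibling": 0.3}
result = classify_page(page, svm_scores, neighbors, weights, thresholds)
```

In this toy input the SVM alone slightly favors "sports", but the only neighbor that passes the threshold is labeled "tech", so the blended decision becomes "tech". With no qualifying neighbors, the function falls back to the SVM's ranking.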
Source
Computer and Modernization (《计算机与现代化》)
2014, No. 5, pp. 14-17, 23 (5 pages in total)
Funding
National-Level Teaching Team Construction Project (00700054J1901)
Keywords
Web page classification
neighboring page
similarity
support vector machine