
A Fast Algorithm for Large-Scale Web Page Classification
Abstract: Web page classification involves a very large number of categories and few training samples, so conventional classifiers perform poorly in practice. To address this problem, a centroid-based statistical learning method is proposed. With a training set containing relatively few manually annotated web pages, the method achieves good classification performance while greatly reducing training time, and its prediction (inference) speed can be further improved by incorporating the hierarchical category (directory) information of web pages. Experiments on the dataset of the first LSHTC (Large Scale Hierarchical Text Classification) evaluation, in comparison with the other participating methods, show that the centroid-based method trains and predicts quickly while remaining highly competitive in accuracy.
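The abstract gives no formulas, so the following Python is only a minimal sketch of a generic nearest-centroid (Rocchio-style) classifier, not the authors' method. It assumes raw term-count vectors, cosine similarity, and class centroids computed as the normalized sum of normalized training vectors; the hierarchical variant assumes a greedy top-down descent through the category tree. All function names and the toy data are hypothetical. Training is a single pass over the data, which illustrates why centroid methods train quickly, and the top-down search shows one way hierarchy information can reduce the number of centroid comparisons at prediction time.

```python
import math
from collections import defaultdict

# Documents are sparse term-weight vectors: dict[str, float] (assumption).

def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def train_centroids(docs, labels):
    """Single pass over the training set: add up each class's normalized
    document vectors, then normalize each sum to get the class centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, w in l2_normalize(doc).items():
            sums[label][t] += w
    return {label: l2_normalize(vec) for label, vec in sums.items()}

def predict_flat(centroids, doc):
    """Score the document against every class centroid and return the best class."""
    d = l2_normalize(doc)
    return max(centroids, key=lambda c: cosine(centroids[c], d))

def train_hierarchical(docs, labels, parent):
    """Hypothetical variant: each document also counts toward every ancestor
    of its label, so internal categories get centroids as well."""
    exp_docs, exp_labels = [], []
    for doc, label in zip(docs, labels):
        node = label
        while node is not None:
            exp_docs.append(doc)
            exp_labels.append(node)
            node = parent.get(node)  # the root has no parent entry
    return train_centroids(exp_docs, exp_labels)

def predict_hierarchical(centroids, children, root, doc):
    """Greedy top-down descent: at each node only that node's children are scored,
    which in a deep, wide tree means far fewer comparisons than predict_flat."""
    d = l2_normalize(doc)
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: cosine(centroids[c], d))
    return node

# Toy data (made up for illustration): raw term counts and a two-level category tree.
parent = {"Sports": "Top", "Politics": "Top", "Football": "Sports", "Tennis": "Sports"}
children = defaultdict(list)
for child, par in parent.items():
    children[par].append(child)

docs = [
    {"goal": 2, "match": 1}, {"goal": 1, "striker": 1},
    {"serve": 2, "match": 1}, {"racket": 1, "serve": 1},
    {"vote": 2, "election": 1}, {"party": 1, "vote": 1},
]
labels = ["Football", "Football", "Tennis", "Tennis", "Politics", "Politics"]

flat = train_centroids(docs, labels)
tree = train_hierarchical(docs, labels, parent)
query = {"goal": 1, "match": 1}
print(predict_flat(flat, query))                           # Football
print(predict_hierarchical(tree, children, "Top", query))  # Football
```

The greedy descent trades some robustness for speed: a wrong choice near the root cannot be corrected further down, so whether and how the paper mitigates this is not stated in the abstract.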
Source: Computer Applications and Software (《计算机应用与软件》), 2012, No. 7, pp. 260-263, 281 (5 pages). Indexed in CSCD; Peking University Core Journals list.
Keywords: centroid-based; text classification; statistical learning
