Classification Strategy for Focus Crawling Based on Multi-classifier Combination and Ranking Approach
Abstract: In focused crawling, a single classification algorithm generalizes poorly when the crawler must fetch and classify Web pages for many different topics. To address this limitation, the paper designs a strategy built on a combination of several strong classification algorithms. For the current topic task, the focused crawler evaluates and ranks the classifiers online, then selects the top-ranked classifier to classify pages. Classification experiments were run on several topic-crawling tasks. They compared the accuracy of each individual algorithm with the average classification accuracy of the combination, and jointly analyzed classification accuracy and classification efficiency. The results show that the strategy overcomes, to a certain extent, the domain locality of a single classifier and is broadly applicable.
Author: Qiao Jianzhong (乔建忠)
Source: Library and Information Service (《图书情报工作》; indexed in CSSCI and the Peking University core journal list), 2013, No. 14, pp. 114-120 (7 pages)
Keywords: focused crawling; focused crawler; Web page classification; classification algorithm; multiple classifiers combination; classification accuracy; classification efficiency
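The evaluate-rank-select strategy summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the algorithms in the combination. Here `any_keyword` and `two_keywords` are hypothetical keyword rules standing in for real trained classifiers, and the topics, keywords, and sample pages are invented for illustration.

The sketch works as follows:
- For each topic, every algorithm is scored on a small labeled sample.
- The pool is ranked by accuracy, and the top-ranked algorithm is chosen for that topic.
- Each algorithm's average accuracy across topics is then compared with the average accuracy of the per-topic picks.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Classifier = Callable[[str], bool]                # page text -> on-topic?
Factory = Callable[[FrozenSet[str]], Classifier]  # builds a classifier for one topic
Sample = Tuple[str, bool]                         # (page text, true label)


# Hypothetical toy classifiers standing in for the paper's (unnamed) algorithms.
def any_keyword(keywords: FrozenSet[str]) -> Classifier:
    """On-topic if at least one topic keyword occurs in the page."""
    return lambda text: len(keywords & set(text.lower().split())) >= 1


def two_keywords(keywords: FrozenSet[str]) -> Classifier:
    """On-topic only if at least two distinct topic keywords occur."""
    return lambda text: len(keywords & set(text.lower().split())) >= 2


def accuracy(clf: Classifier, samples: Sequence[Sample]) -> float:
    """Fraction of labeled pages the classifier labels correctly."""
    return sum(clf(text) == label for text, label in samples) / len(samples)


def rank_classifiers(pool: Dict[str, Factory], keywords: FrozenSet[str],
                     samples: Sequence[Sample]) -> List[Tuple[str, float]]:
    """Score every algorithm on the current topic's sample, best first."""
    scores = [(name, accuracy(make(keywords), samples)) for name, make in pool.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def evaluate(pool: Dict[str, Factory],
             topics: Dict[str, Tuple[FrozenSet[str], List[Sample]]]):
    """Pick the top-ranked algorithm per topic; compare average accuracies."""
    per_algo: Dict[str, List[float]] = {name: [] for name in pool}
    chosen: Dict[str, str] = {}
    picked: List[float] = []
    for topic, (keywords, samples) in topics.items():
        ranking = rank_classifiers(pool, keywords, samples)
        for name, acc in ranking:
            per_algo[name].append(acc)
        chosen[topic] = ranking[0][0]
        picked.append(ranking[0][1])
    averages = {name: sum(accs) / len(accs) for name, accs in per_algo.items()}
    return chosen, averages, sum(picked) / len(picked)


# Hypothetical topics, keywords and labeled pages, invented for illustration.
POOL: Dict[str, Factory] = {"any_keyword": any_keyword, "two_keywords": two_keywords}
TOPICS = {
    "sports": (frozenset({"match", "goal", "team"}), [
        ("the team scored a late goal", True),
        ("goal setting for your career", False),
        ("match report team news", True),
        ("stock market news today", False),
    ]),
    "finance": (frozenset({"stock", "market", "bank"}), [
        ("stock prices rose today", True),
        ("the bank raised rates", True),
        ("farmers market opens sunday", False),
        ("the team scored a late goal", False),
    ]),
}

chosen, averages, combined = evaluate(POOL, TOPICS)
print(chosen)    # per-topic pick
print(averages)  # each algorithm's average accuracy across topics
print(combined)  # average accuracy of the per-topic picks
```

On this toy data, `two_keywords` wins the sports topic and `any_keyword` wins the finance topic. Each rule alone averages 0.75, while the per-topic picks average 0.875.

This illustrates the intuition behind the combination: averaging the per-topic best can never fall below any single algorithm's average. Because each pick is scored on the same sample used to rank it, the combined figure here is optimistic. A real crawler would rank on one labeled sample and measure accuracy on held-out pages.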
