
NQPC: A Novel Query Log-Based Web-Page Classification Method
Abstract Web-page classification sorts massive collections of web pages into categories and has many applications. Many automatic web-page classification methods exist; among them, the commonly used content-based methods leave considerable room for performance improvement because page content is impure. This paper proposes NQPC (Novel Query log-based web-Page Classification), a new web-page classification method based on query logs. The method makes three contributions: (1) a low-dimensional feature vector extraction method that avoids the "curse of dimensionality"; (2) classification based on high-quality query logs, whose content is purer than that of the web pages themselves; and (3) a filtering method that improves classification accuracy. Experimental results show that the proposed method performs very well, which gives it good application prospects.
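The abstract names three ingredients (query-log input, a low-dimensional feature vector, and a filtering step) without giving their definitions. The sketch below only illustrates how such a pipeline could fit together; it is not the NQPC algorithm. Every name in it (`queries_by_url`, `category_term_weights`, `feature_vector`, `classify`, `min_margin`), the one-dimension-per-category feature design, the margin-based filter, and the toy log are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def queries_by_url(log):
    """Group query terms by the URL that was clicked for them."""
    terms = defaultdict(Counter)
    for query, url in log:
        terms[url].update(query.lower().split())
    return terms

def category_term_weights(url_terms, labels):
    """Per category, relative frequency of each query term over labeled pages."""
    counts = defaultdict(Counter)
    for url, category in labels.items():
        counts[category].update(url_terms.get(url, Counter()))
    weights = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        weights[category] = {t: n / total for t, n in counter.items()} if total else {}
    return weights

def feature_vector(term_counts, weights, categories):
    """Assumed low-dimensional features: one score per category, not per term."""
    total = sum(term_counts.values())
    if total == 0:
        return [0.0] * len(categories)
    return [
        sum(n * weights[c].get(t, 0.0) for t, n in term_counts.items()) / total
        for c in categories
    ]

def classify(term_counts, weights, categories, min_margin=0.05):
    """Assumed filter: return None when no category wins by a clear margin."""
    scores = feature_vector(term_counts, weights, categories)
    ranked = sorted(zip(scores, categories), reverse=True)
    best_score, best_category = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_score == 0.0 or best_score - runner_up < min_margin:
        return None
    return best_category

# Toy query log of (query, clicked URL) pairs; all values are made up.
log = [
    ("cheap flights paris", "travel.example/a"),
    ("paris hotel deals", "travel.example/a"),
    ("python list sort", "dev.example/b"),
    ("python dict tutorial", "dev.example/b"),
    ("flights hotel booking", "unknown.example/x"),
    ("weather today", "unknown.example/y"),
]
labels = {"travel.example/a": "travel", "dev.example/b": "programming"}

url_terms = queries_by_url(log)
weights = category_term_weights(url_terms, labels)
categories = sorted(weights)

print(classify(url_terms["unknown.example/x"], weights, categories))
print(classify(url_terms["unknown.example/y"], weights, categories))
```

In this toy setup the feature vector has as many dimensions as there are categories rather than one per vocabulary term, which is one simple way to keep the dimensionality low. A page whose queries share no terms with any labeled category is filtered out instead of being forced into a class.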
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2012, No. 11, pp. 82-87, 128 (7 pages)
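Funding National Natural Science Foundation of China (No. 60803085, No. 60873245); Guangdong Province and Chinese Academy of Sciences Comprehensive Strategic Cooperation Project (No. 2009A0091100002, No. 2010A090100004); Dongguan Major Science and Technology Project (No. 2009215102001)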
Keywords query log; web-page classification; machine learning; text classification; feature extraction

