Web Query Classification Based on "VASE" Characterizing Words (cited by: 3)
Abstract: Web query classification is important for improving the search quality of search engines. By analyzing and manually labeling real user query logs, we found that four kinds of feature words, which we call "VASE" characterizing words, play a decisive role in determining a query's category. We extracted these words and built an inverted index over them for topic classification of queries. On this basis, we propose methods based on web extension and weighted characterizing words to improve the classification results. Experimental results show that precision and recall reach 78.2% and 77.3% respectively, meeting practical requirements.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal (北大核心), 2009, Issue 3, pp. 39-44 (6 pages)
Funding: National Natural Science Foundation of China (60773027, 60736044); National 863 Program key projects (2006AA010108, 2008AA01Z145)
Keywords: computer application; Chinese information processing; Web query classification; "VASE" characterizing words; Web extension; weighted characterizing words
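
The abstract describes classifying queries by looking up characterizing words in an inverted index and weighting them. The following is a minimal Python sketch of that general idea only, not the authors' implementation. The words, categories, and weights in `TOY_WORDS` are hypothetical, whitespace tokenization stands in for the Chinese word segmentation a real system would need, and the web extension step is not modeled.

```python
from collections import defaultdict

# Hypothetical (word, category, weight) triples; not taken from the paper.
TOY_WORDS = [
    ("download", "software", 1.0),
    ("driver", "software", 2.0),
    ("mp3", "music", 2.0),
    ("lyrics", "music", 1.5),
]


def build_index(labeled_words):
    """Build an inverted index: characterizing word -> {category: weight}."""
    index = defaultdict(dict)
    for word, category, weight in labeled_words:
        index[word][category] = index[word].get(category, 0.0) + weight
    return index


def classify(query, index):
    """Return the category with the highest summed weight, or None.

    Assumption: tokens are separated by whitespace.
    """
    scores = defaultdict(float)
    for token in query.lower().split():
        for category, weight in index.get(token, {}).items():
            scores[category] += weight
    if not scores:
        return None  # no characterizing word matched the query
    return max(scores, key=scores.get)
```

For example, with this toy index the query `"free mp3 download"` scores 2.0 for `music` and 1.0 for `software`, so `classify` returns `music`. A query with no indexed word returns `None`, which is where a step such as the web extension mentioned in the abstract could supply additional terms.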


