
Exploring features for automatic identification of news queries through query logs

Abstract

Purpose: Existing research on predicting queries with news intent has extracted classification features from external knowledge bases. This paper shows how features extracted from query logs alone can be used to identify news queries automatically, without any external resources.

Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we identified three features of news queries, relating to query content, time of query occurrence, and user click behavior. We then used 12 effective features proposed in the literature as a baseline and conducted experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of each feature used in this paper on the identification of news queries.

Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 once the three newly identified features were added. Of these, the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click-behavior features, news URL was the most effective.

Research limitations: Analyses based only on features extracted from query logs may produce limited results. The segmentation tool used in this study has been applied more widely to long texts than to short queries.

Practical implications: The research will help general-purpose search engines address search intents for news events.

Originality/value: Our approach offers a different perspective on recognizing queries with news intent, one that does not require large news corpora such as blogs or Twitter.
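The abstract names the burst point (bst) as the most effective feature but does not define it. As a rough illustration only, the sketch below shows one way a burst signal could be derived from a query log: it measures how far a query's busiest day stands above its average daily frequency. The function names `burst_score` and `has_burst_point`, the z-score formulation, and the threshold of 2.0 are all assumptions made for this example, not the authors' method.

```python
from statistics import mean, pstdev


def burst_score(daily_counts):
    """Hypothetical burst score (an assumption, not the paper's definition).

    Returns the number of population standard deviations by which the
    busiest day exceeds the mean daily count of the query.
    """
    if len(daily_counts) < 2:
        return 0.0
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        # Perfectly flat frequency: no day stands out.
        return 0.0
    return (max(daily_counts) - mu) / sigma


def has_burst_point(daily_counts, threshold=2.0):
    """Binary feature: True when the peak day is at least `threshold`
    standard deviations above the mean. The threshold is illustrative."""
    return burst_score(daily_counts) >= threshold


# Made-up daily frequencies for two queries over one week.
event_query = [2, 3, 2, 250, 4, 3, 2]       # sudden spike on day 4
evergreen_query = [10, 10, 10, 10, 10, 10, 10]  # steady demand

event_feature = has_burst_point(event_query)
evergreen_feature = has_burst_point(evergreen_query)
```

A binary value like this could be passed to a classifier alongside content and click-behavior features. In practice the window length and threshold would need to be tuned on labeled data.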
Source: Chinese Journal of Library and Information Science (中国文献情报(英文版)), 2014, Issue 4, pp. 31-45 (15 pages)
Funding: Supported by the Social Science Planning Foundation of Chongqing (Grant No. 2011QNCB28)
Keywords: Query intent; News query; News intent; Query classification; Automatic identification
