Learning Query Ambiguity Models by Using Search Logs (cited by: 1)


Abstract: Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information about how many intents users have when issuing a query. However, previous work showed that user clicks alone are misleading when judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model from search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that using click-based features alone or session-based features alone performs worse than previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% accuracy. It significantly improves on the click-based method by 5.6% and the session-based method by 4.6%.
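
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual features, so everything here is an assumption made for illustration: the function names `category_features` and `query_features`, the three summary statistics (number of distinct categories, entropy, share of the top category), and the toy category labels. The sketch assumes a text classifier has already assigned one category label to each clicked document and each follow-up query.

```python
import math
from collections import Counter


def category_features(labels):
    """Summarize category labels as [distinct count, entropy in bits, top share].

    These three statistics are hypothetical stand-ins, not the paper's features.
    """
    if not labels:
        return [0.0, 0.0, 0.0]
    counts = Counter(labels)
    total = len(labels)
    probs = [c / total for c in counts.values()]
    # Shannon entropy: higher values mean labels are spread over more categories.
    entropy = sum(p * math.log2(1.0 / p) for p in probs)
    return [float(len(counts)), entropy, max(probs)]


def query_features(click_labels, session_labels):
    """Concatenate click-based and session-based features for one query."""
    return category_features(click_labels) + category_features(session_labels)


# Toy, made-up labels for two queries (not data from the paper).
ambiguous = query_features(
    ["Computers", "Animals", "Computers", "Animals"],  # clicked documents
    ["Animals", "Computers"],                          # follow-up queries
)
specific = query_features(
    ["Computers", "Computers", "Computers"],
    ["Computers"],
)

print(ambiguous)
print(specific)
```

Vectors like these, paired with human ambiguity labels, could then be passed to any SVM implementation (for example `sklearn.svm.SVC`) to train a binary ambiguous/unambiguous classifier. Concatenating the two feature groups mirrors the abstract's finding that combining click-based and session-based features works better than either group alone.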

Authors: Ruihua Song (宋睿华), Zhicheng Dou (窦志成), Hsiao-Wuen Hon (洪小文), Yong Yu (俞勇)

Affiliations: Department of Computer Science; Microsoft Research Asia

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2010, Issue 4, pp. 728-738 (11 pages)

Keywords: ambiguous query, log mining, query classification

Classification number: TP391.3 [Automation and Computer Technology: Computer Application Technology]



