Abstract
【Purpose/significance】This paper addresses the automatic identification of query intent ambiguity. It examines which classification features are effective and compares how accurately different classification algorithms recognize three types of query intent ambiguity, with the aim of informing follow-up research. 【Method/process】The paper first proposes a taxonomy of query expressions oriented to query intent ambiguity. It then constructs six categories of features, covering query-expression features and related-document features. Finally, it applies a decision tree, a neural network, and the k-nearest neighbor (k-NN) algorithm to evaluate both the effectiveness of different feature combinations and the classification accuracy of each algorithm. 【Result/conclusion】(1) Classification accuracy improves on the baseline by 49.5%; (2) classifying with query-expression features achieves better accuracy than classifying with related-document features; (3) the decision tree achieves slightly higher accuracy than the other two algorithms. 【Innovation/limitation】The paper constructs a query taxonomy for query intent ambiguity and completes a classification task covering three types of query intent ambiguity. However, because access to data was limited, the experiments were validated on only 200 samples.
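The evaluation design summarized above (train a classifier on one feature group at a time, score its accuracy, and report the gain over a baseline) can be sketched in a few lines. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation. It implements only the k-nearest-neighbor classifier among the three tested. The split of feature columns into query-expression and related-document groups (`QUERY_COLS`, `DOC_COLS`) is hypothetical. The majority-class baseline is also an assumption, because the abstract does not say what the baseline was.

```python
from collections import Counter
from math import dist

# Hypothetical feature layout (not the paper's six feature types):
# columns 0-1 stand in for query-expression features,
# columns 2-3 stand in for related-document features.
QUERY_COLS = [0, 1]
DOC_COLS = [2, 3]


def project(vector, cols):
    """Keep only the selected feature columns of one sample."""
    return [vector[i] for i in cols]


def knn_predict(train, query, k=3):
    """Plain k-nearest-neighbor majority vote using Euclidean distance."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, cols, k=3):
    """Accuracy of k-NN when both train and test use only `cols`."""
    projected_train = [(project(x, cols), y) for x, y in train]
    hits = sum(
        knn_predict(projected_train, project(x, cols), k) == y for x, y in test
    )
    return hits / len(test)


def majority_baseline(train, test):
    """Assumed baseline: always predict the most frequent training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return sum(y == majority for _, y in test) / len(test)


def relative_improvement(score, baseline):
    """Relative gain over the baseline; 0.495 corresponds to +49.5%."""
    return (score - baseline) / baseline
```

Comparing `accuracy(train, test, QUERY_COLS)` with `accuracy(train, test, DOC_COLS)` mirrors the query-feature versus document-feature comparison. If the reported 49.5% is read as a relative gain, invented accuracies of 0.598 against a baseline of 0.4 would give `relative_improvement(0.598, 0.4)` of about 0.495. These two numbers only illustrate the arithmetic and are not figures from the paper.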
Authors
GUI Si-si (桂思思); XU Jian (徐健)
Nanjing Agricultural University, Nanjing 210095, China
Source
Information Science (《情报科学》)
CSSCI
Peking University Core Journal
2021, No. 11, pp. 90-95 (6 pages)
Funding
National Social Science Fund of China, Youth Project: "Research on Query Intent for Academic Search" (19CTQ023).
Keywords
query intent
ambiguity intent
query classification
feature engineering
evaluation