Abstract
[Purpose/significance] To automatically identify academic query intent and thereby improve the efficiency of academic search engines. [Method/process] Drawing on existing query intent features and the characteristics of academic search, we construct features for query expressions at four levels: basic information, specific keywords, entities, and occurrence frequency. In a preliminary experiment, four classification algorithms (Naive Bayes, Logistic regression, SVM, and Random Forest) are applied to automatic query intent identification, and the precision, recall, and F-measure of each are calculated. We then propose a method that extends the identification results predicted by the Logistic regression algorithm to a large-scale data set and extracts "keyword type" features, and use it to build a deep-learning-based two-layer classifier for academic query intent identification. [Result/conclusion] The macro-average F1 value of the two-layer classifier is 0.651, higher than that of the other algorithms, and the method effectively balances per-category precision and recall across the different academic query intents. The two-layer classifier performs best on the academic exploration category, with an F1 value of 0.783.
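The macro-average F1 reported in the abstract is the unweighted mean of the per-category F1 scores, so a rare intent category counts as much as a frequent one. The following is a minimal sketch of that calculation in plain Python, not the authors' evaluation code; the function names `per_class_prf` and `macro_f1` and the intent labels in the usage lines are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (0.0 for empty input)."""
    scores = per_class_prf(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical intent labels, used only to exercise the functions.
    gold = ["exploration", "exploration", "lookup", "lookup"]
    predicted = ["exploration", "lookup", "lookup", "lookup"]
    for label, (p, r, f) in per_class_prf(gold, predicted).items():
        print(f"{label}: P={p:.3f} R={r:.3f} F1={f:.3f}")
    print(f"macro-F1={macro_f1(gold, predicted):.3f}")
```

Because every category contributes equally, a classifier can only score well on this metric by keeping precision and recall reasonable in each category, which is the kind of balance the abstract describes.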
Authors
Wang Ruixue (王瑞雪)
Fang Jing (方婧)
Gui Sisi (桂思思)
Lu Wei (陆伟)
Zhang Xian (张显)
Wang Ruixue; Fang Jing; Gui Sisi; Lu Wei; Zhang Xian (School of Information Management, Wuhan University, Wuhan 430072; College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095; Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072; Baidu Times Network Technology (Beijing) Co., Ltd., Beijing 100085)
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal (北大核心)
2021, No. 3, pp. 93-99 (7 pages)
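Funding
One of the research outcomes of the National Social Science Fund of China Youth Project "Research on Query Intent for Academic Search" (Grant No. 19CTQ023).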
Keywords
academic query intent
automatic identification
two-layer classifier