Abstract
The efficiency of information retrieval depends largely on the content of the search engine results page (SERP) that users see, especially the highlighted text. At present, commercial search engines mainly present SERP text by highlighting the query terms in red. However, query terms can be ambiguous or even contain noise, so they are often not fully consistent with the user's search intent. To reflect that intent more fully and to highlight the terms that matter most for satisfying it, this paper proposes a new key term highlighting strategy based on the results of manual annotation. Highlighted terms are generated with four basic sequence labeling models: structured support vector machine, hidden Markov model, max-margin Markov networks, and conditional random field. The paper further proposes a joint sequence labeling (JSL) model that combines these four models, and conducts user search experiments with it. The experimental results show that the JSL model outperforms the four base models and achieves an accuracy of 93.30% against the manual annotations, and that the key term highlighting strategy clearly outperforms the traditional query term highlighting strategy, improving user satisfaction and search effectiveness.
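The abstract does not say how the JSL model combines the four base taggers, so the following is only an illustrative sketch under a stated assumption: a simple per-token majority vote stands in for the joint model, and accuracy is measured token by token against a manual annotation. The function names `majority_vote` and `token_accuracy`, the binary labels (1 = highlight, 0 = do not highlight), and the toy label sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-token labels from several taggers by majority vote.

    ASSUMPTION: majority voting is a stand-in; the paper's JSL model
    may combine the base models in a different way.
    Ties go to the label of the earliest-listed tagger among the tied labels.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all label sequences must have the same length")
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label (in tagger order) that reaches the top count.
        combined.append(next(l for l in labels if counts[l] == best))
    return combined


def token_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of tokens whose predicted label matches the manual label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical outputs of the four base taggers for a four-token snippet.
svm = [1, 0, 1, 0]
hmm = [1, 0, 0, 0]
m3n = [0, 0, 1, 1]
crf = [1, 1, 1, 0]
gold = [1, 0, 1, 1]  # hypothetical manual annotation

combined = majority_vote([svm, hmm, m3n, crf])
print(combined, token_accuracy(combined, gold))
```

In this toy case the vote recovers three of the four manually annotated labels. A learned combination could weight the taggers unequally, which a plain vote cannot do.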
Authors
张辉
马少平
ZHANG Hui; MA Shaoping (Department of Computer Science and Technology / State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China)
Source
《上海交通大学学报》
EI
CAS
CSCD
Peking University Core Journals
2020, No. 2, pp. 117-125 (9 pages)
Journal of Shanghai Jiaotong University
Funding
Supported by the National Natural Science Foundation of China (61622208, 61732008, 61532011)
Keywords
search engine results page (SERP)
user intent
query term highlighting
sequence labeling algorithm
joint sequence