Abstract
Ranking document retrieval results and classifying text are core techniques for vertical search, personalized information retrieval, information filtering and related problems. To improve retrieval performance, this paper proposes an improvement to Lucene's default ranking algorithm that combines location-related (query-term position) weighting with probabilistic ranking. Both the positions at which query terms occur in a document and the probability that the document is relevant affect its relevance. The scoring formula of Lucene's default ranking algorithm is therefore modified in two ways: it uses location-related query-term weights, and it uses a document-relevance probability computed with a naive Bayes classifier. Experimental results show that the improved method effectively increases the accuracy of vertical search and gives users a better vertical search experience.
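The abstract does not give the modified scoring formula. The following is therefore only a minimal Python sketch of how such a combination could look, under stated assumptions.

It uses a simplified form of Lucene's classic TF-IDF practical scoring function: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and length norm = 1 / sqrt(numTerms), multiplied by the coordination factor. queryNorm and boosts are omitted. Relevance probability comes from a two-class multinomial naive Bayes classifier with Laplace smoothing.

The following are hypothetical illustrations, not the authors' formula:
- the position weight in `position_weight`, a bonus that decays with the term's first occurrence;
- the multiplicative combination in `improved_score`;
- the constants `BETA` and `GAMMA`.

```python
import math
from collections import Counter

# Hypothetical tuning constants (assumptions, not values from the paper).
BETA = 1.0   # strength of the early-position bonus
GAMMA = 1.0  # strength of the naive Bayes relevance factor


def position_weight(term, doc):
    """Hypothetical location-related weight: earlier first occurrence -> larger weight."""
    return 1.0 + BETA / (1 + doc.index(term))


def lucene_classic_score(query, doc, docs, use_position=False):
    """Simplified Lucene classic TF-IDF: coord * sum(tf * idf^2 * lengthNorm).

    queryNorm and boosts are omitted; queryNorm is constant per query,
    so dropping it does not change the ranking.
    """
    freqs = Counter(doc)
    length_norm = 1.0 / math.sqrt(len(doc))
    total, matched = 0.0, 0
    for term in query:
        freq = freqs[term]
        if freq == 0:
            continue
        matched += 1
        doc_freq = sum(1 for d in docs if term in d)
        idf = 1.0 + math.log(len(docs) / (doc_freq + 1))
        weight = position_weight(term, doc) if use_position else 1.0
        total += math.sqrt(freq) * idf ** 2 * length_norm * weight
    return (matched / len(query)) * total


def train_naive_bayes(labeled_docs):
    """Multinomial naive Bayes; labels are True (relevant) / False (not relevant).

    Assumes both classes occur at least once in the training data.
    """
    vocab = {t for doc, _ in labeled_docs for t in doc}
    counts = {True: Counter(), False: Counter()}
    n_class = Counter(label for _, label in labeled_docs)
    for doc, label in labeled_docs:
        counts[label].update(doc)
    model = {
        label: (math.log(n_class[label] / len(labeled_docs)),
                counts[label], sum(counts[label].values()))
        for label in (True, False)
    }
    return model, vocab


def relevance_probability(doc, model, vocab):
    """P(relevant | doc) with Laplace smoothing, normalized in log space."""
    log_post = {}
    for label, (log_prior, cnt, total) in model.items():
        lp = log_prior
        for t in doc:
            if t in vocab:  # unseen terms are ignored
                lp += math.log((cnt[t] + 1) / (total + len(vocab)))
        log_post[label] = lp
    m = max(log_post.values())
    rel, irr = math.exp(log_post[True] - m), math.exp(log_post[False] - m)
    return rel / (rel + irr)


def improved_score(query, doc, docs, model, vocab):
    """Hypothetical combination: position-weighted Lucene score scaled by relevance."""
    base = lucene_classic_score(query, doc, docs, use_position=True)
    return base * (1.0 + GAMMA * relevance_probability(doc, model, vocab))


docs = [
    ["lucene", "ranking", "search"],
    ["search", "engine", "lucene"],
    ["cooking", "recipes", "pasta"],
]
model, vocab = train_naive_bayes([(docs[0], True), (docs[1], True), (docs[2], False)])
query = ["lucene", "ranking"]
ranked = sorted(docs, key=lambda d: improved_score(query, d, docs, model, vocab), reverse=True)
print(ranked[0])  # ['lucene', 'ranking', 'search']
```

A multiplicative factor of the form (1 + GAMMA * p) keeps the classifier from overriding term matching. A document with no matching query terms still scores 0, while matched documents judged relevant are pushed upward.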
Source
Computer Science
CSCD
Peking University Core Journal
2016, No. 9, pp. 247-249, 273 (4 pages)
Funding
Supported by the Beijing Key Discipline Fund for Computer Software and Theory (007000541215042)
Keywords
Location-related
Probabilistic sorting
Lucene
Sorting algorithm
Vertical search