Abstract
As computer and network technologies mature, enterprises accumulate large volumes of electronic information. To meet enterprises' need to retrieve the required information efficiently and accurately, innovation in search engine technology has been put on the agenda, and the ranking algorithm used in text retrieval is a factor in search engine quality that cannot be ignored. The original Lucene search engine uses a ranking algorithm based on the vector model, but this algorithm has significant drawbacks in natural-language semantic understanding. Building on an analysis of Lucene's architecture and document ranking algorithm, and on a comparison with the classic DirectHit and PageRank ranking algorithms, this paper proposes a new Vector-PageRank ranking algorithm that addresses the shortcomings of the baseline algorithm, and designs and implements an enterprise-oriented search engine system based on it. Experimental results show that the optimized Lucene ranking algorithm is more precise and better matches what users care about.
Authors
沙阳阳
吴陈
SHA Yangyang; WU Chen (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212000)
Source
《计算机与数字工程》
2019, No. 5, pp. 1208-1211, 1239 (5 pages in total)
Computer & Digital Engineering