Abstract
Result deduplication and ranking are two key issues in improving the result quality of a meta-search engine. This article analyzes three deduplication algorithms, including cosine similarity and TF-IDF-based text similarity, and removes duplicates using three signals: the URL, the title, and the computed similarity of the summaries. It also studies the effect of Borda ranking, star ranking, the round-robin method, position-based ranking, and concept feasibility on retrieval results, and proposes a comprehensive ranking algorithm. Experimental results show that the comprehensive ranking algorithm outperforms the other algorithms in precision, recall, and other measures.
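As an illustration of the three deduplication signals named in the abstract (URL, title, and summary similarity), the following is a minimal sketch, not the article's actual implementation. The record fields `url`, `title`, and `snippet`, the smoothed IDF formula, the URL and title normalization, and the 0.9 threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed smoothed IDF, so terms found in every document keep a nonzero weight.
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def deduplicate(results, threshold=0.9):
    """Keep the first of any results sharing a URL, a title, or a near-identical snippet."""
    vectors = tfidf_vectors([r["snippet"] for r in results])
    kept = []  # indices of results retained so far
    seen_urls, seen_titles = set(), set()
    for i, r in enumerate(results):
        url = r["url"].rstrip("/")          # assumed normalization: ignore trailing slash
        title = r["title"].strip().lower()  # assumed normalization: case-insensitive title
        if url in seen_urls or title in seen_titles:
            continue
        if any(cosine(vectors[i], vectors[j]) >= threshold for j in kept):
            continue
        seen_urls.add(url)
        seen_titles.add(title)
        kept.append(i)
    return [results[i] for i in kept]
```

The cheap exact checks on URL and title run first, so the pairwise cosine comparison is only paid for results that survive them. The threshold would need tuning on real search results.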
Source
《软件》
2012, No. 6, pp. 51-53 (3 pages)
Software
Keywords
Meta-search
Similarity
Deduplication
Ranking