Abstract
Traditional centralized retrieval suffers from single points of failure, poor performance, and limited scalability when handling massive heterogeneous scientific and technological information resources. To address these problems, a distributed high-performance retrieval system (DHRS) for heterogeneous scientific and technological resources is proposed, and its core techniques are studied and analyzed. To reduce the high cost of accessing the resources behind retrieval results, an evaluation algorithm based on access cost is presented. The algorithm is then optimized for practical application scenarios: after optimization, the number of requests was reduced by 80%, and performance in the experimental environment improved by 68% on average. Finally, tests on real data sets verify the feasibility of using DHRS to retrieve massive scientific and technological resources, and the system is suitable for scenarios that place high demands on retrieval performance and scalability.
Source
《计算机应用与软件》
2017, No. 10, pp. 78-84, 156 (8 pages in total)
Computer Applications and Software
Funding
National Natural Science Foundation of China (61462053)
China Postdoctoral Science Foundation (2016M602730)
Keywords
Scientific and technological resources
Distributed retrieval
Massive data
ElasticSearch
Heterogeneous resources