Abstract
Current search engine systems face massive data volumes, highly concurrent user access, and search latency; on intelligent terminals, storage capacity is also limited. To address these problems, this paper proposes a cloud-storage-based model for the classified storage of document indexes. In implementing the index classification and storage algorithm, it adopts a secondary index-term weight calculation based on the Map/Reduce programming model in order to reduce the fuzzy granularity of the classification. Experimental results show that an algorithm built on this storage model not only improves data-processing efficiency for massive index databases but also, to some extent, reduces query latency in the retrieval system and improves search efficiency.
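The abstract does not state the weight formula, so the following is only a minimal sketch of how an index-term weight could be computed in a Map/Reduce style. It assumes a TF-IDF weighting as a stand-in for the paper's secondary weight, and it simulates the map, shuffle, and reduce steps in a single process. All function names here are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit (term, (doc_id, term_frequency)) for each distinct term."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    total = sum(counts.values())
    return [(term, (doc_id, n / total)) for term, n in counts.items()]


def shuffle(pairs):
    """Shuffle: group the mapped values by term, as a framework would."""
    grouped = defaultdict(list)
    for term, value in pairs:
        grouped[term].append(value)
    return grouped


def reduce_phase(term, postings, num_docs):
    """Reduce: rescale each term frequency by an IDF factor.

    The IDF factor is an assumption made for illustration; the paper's
    own secondary weight formula is not given in the abstract.
    """
    idf = math.log(num_docs / len(postings))
    return term, {doc_id: tf * idf for doc_id, tf in postings}


def index_weights(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(t, ps, len(docs)) for t, ps in grouped.items())
```

Calling `index_weights({"d1": "cloud index cloud", "d2": "index storage"})` gives a per-document weight for each term. Under this assumed weighting, a term that occurs in every document receives weight zero, so it would not help to separate index entries into storage classes.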
Source
《宁波大学学报(理工版)》
CAS
2011, No. 3, pp. 29-33 (5 pages)
Journal of Ningbo University:Natural Science and Engineering Edition
Funding
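National Science and Technology Major Project (2011ZX0302-004-02)
National Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2009ZX01039-001-002-004)
Ministry of Science and Technology Public Service Platform Fund (9C26243314159)
Science and Technology Department of Zhejiang Province Project (2009C31107)
Ningbo University Research Fund (B00241104900)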