Abstract
With the rapid growth of e-commerce, the number of products keeps increasing. When a product-filtering application stores and processes massive product data in a distributed fashion, existing models require every node in the distributed system to work in parallel and then merge the per-node results into the final result, which generates many invalid queries. This paper proposes a distributed data storage and query technique based on key column pre-processing. The technique statistically analyzes the historical queries on a table, selects high-frequency or core fields as key columns, and stores data according to the key columns and the node structure of the distributed system. When a query request arrives, a query that contains a key column generates valid query tasks for only a subset of the nodes. The results show that, without additional storage overhead, the technique reduces the total number of tasks across the nodes of the distributed system and effectively improves system throughput.
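The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how rows are mapped to nodes, so hash partitioning on a single key column is assumed here, and the names pick_key_column, node_for, and Cluster are hypothetical. The sketch picks the most frequently queried column from a query history, stores each row exactly once on the node chosen by its key-column value, and generates a task for only one node when an equality query includes the key column.

```python
from collections import Counter
import zlib


def pick_key_column(query_history):
    """Return the column that appears most often in historical queries.

    query_history is a list of column-name lists, one list per past query.
    """
    counts = Counter(col for query in query_history for col in query)
    return counts.most_common(1)[0][0]


def node_for(value, num_nodes):
    """Map a key-column value to a node index with a stable hash.

    Hash partitioning is an assumption of this sketch, not the paper's scheme.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


class Cluster:
    """Toy in-memory stand-in for a set of storage nodes."""

    def __init__(self, num_nodes, key_column):
        self.key_column = key_column
        self.nodes = [[] for _ in range(num_nodes)]

    def insert(self, row):
        # Each row is stored once, on the node selected by its key-column value.
        target = node_for(row[self.key_column], len(self.nodes))
        self.nodes[target].append(row)

    def target_nodes(self, predicate):
        """Nodes that need a task for an equality predicate {column: value}."""
        if self.key_column in predicate:
            return [node_for(predicate[self.key_column], len(self.nodes))]
        # Without the key column, every node has to be queried.
        return list(range(len(self.nodes)))

    def query(self, predicate):
        """Return (matching rows, number of node tasks generated)."""
        targets = self.target_nodes(predicate)
        rows = [
            row
            for n in targets
            for row in self.nodes[n]
            if all(row.get(col) == val for col, val in predicate.items())
        ]
        return rows, len(targets)
```

For example, with Cluster(4, "category"), the query {"category": "phone"} generates one node task, while {"price": 10} still generates four. Only equality predicates are handled; range queries or several key columns would need a different placement rule.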
Authors
ZHANG Wei (张伟)
MA Limin (马利民)
ZHI Hao (智昊)
Computer School, Beijing Information Science & Technology University, Beijing 100101, China
Source
Journal of Beijing Information Science and Technology University (《北京信息科技大学学报(自然科学版)》)
2018, No. 4, pp. 1-9 (9 pages)
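Funding
Research Fund of the Beijing Innovation Center for Future Chips (北京市未来芯片技术高精尖创新中心) (KYJJ2016005)
Beijing Young Top-Notch Talent Cultivation Program (北京市青年拔尖人才培育项目) (CIT&TCD201504057)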
Keywords
key column
pre-processing
massive structured data
distributed data storage