Abstract
In the era of cloud computing, distributed data storage and querying have become key technologies for big data management. Distributed data management requires partitioning and allocating data at the storage level, and allocating and integrating the partial query results from each node at the query level. Mature and effective techniques already exist for structured data such as relational data. For big data that is mainly semi-structured, unstructured, or a mixture of several data models, however, distributed data partitioning and allocation still requires in-depth study. For time-labeled unstructured data, which is widely used in applications, this paper proposes a distributed storage and processing scheme that partitions and allocates data according to their time labels. Simulation experiments indicate that the approach is feasible and effective.
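The abstract does not spell out the partitioning rule, so the following is only a minimal sketch of one way time-label partitioning and allocation could look, not the authors' scheme. It assumes integer time labels, fixed-width time intervals as partitions, and round-robin allocation of partitions to nodes; the names `shard_id`, `allocate`, `store`, and `query` are hypothetical.

```python
from collections import defaultdict


def shard_id(timestamp, width):
    """Map an integer time label to the index of its fixed-width interval."""
    return timestamp // width


def allocate(shard, num_nodes):
    """Assign a shard to a node (round-robin; an assumed policy)."""
    return shard % num_nodes


def store(records, width, num_nodes):
    """Place (timestamp, payload) records on nodes according to their shard."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for ts, payload in records:
        s = shard_id(ts, width)
        nodes[allocate(s, num_nodes)][s].append((ts, payload))
    return nodes


def query(nodes, start, end, width):
    """Return records with start <= timestamp < end, ordered by time.

    Only the nodes holding overlapping shards are consulted, and their
    partial results are merged into a single answer.
    """
    if end <= start:
        return []
    partial = []
    first, last = shard_id(start, width), shard_id(end - 1, width)
    for s in range(first, last + 1):
        node = nodes[allocate(s, len(nodes))]
        partial.extend(r for r in node.get(s, []) if start <= r[0] < end)
    return sorted(partial, key=lambda r: r[0])


records = [(5, "a"), (12, "b"), (27, "c"), (31, "d"), (18, "e")]
nodes = store(records, width=10, num_nodes=2)
print(query(nodes, 10, 30, width=10))
```

In this sketch a time-range query touches only the shards that overlap the range, which is the usual motivation for partitioning on a time label; a real system would also need to handle skewed time distributions and replication, which are outside this illustration.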
Authors
罗先录
叶小平
王千秋
李强
LUO Xian-lu; YE Xiao-ping; WANG Qian-qiu; LI Qiang (Department of Computer Science and Technology, Neusoft Institute of Guangdong, Foshan 528225, China; School of Computer, South China Normal University, Guangzhou 510631, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2018, No. 11, pp. 2497-2502 (6 pages)
Journal of Chinese Computer Systems
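Funding
Supported by the Guangdong Province Collaborative Innovation and Platform Environment Construction Project (2017A040406001)
Supported by the Foshan Science and Technology Innovation Project, Guangdong Province (2016AG100792)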