Abstract
Big data technologies built around Hadoop projects such as HDFS and Hive are an industry standard for building enterprise data centers. Owing to technical limitations, however, they cannot efficiently support data-management operations such as updates and deletions on large data sets. With the emergence of Apache Hudi, the original architecture can update large volumes of data on Hadoop-compatible storage systems and provide different views for different read and write scenarios. Taking the construction and application of a big data analysis platform for a cigarette research and development system as an example, this paper proposes a cloud-integrated solution for an enterprise big data platform based on Apache Hudi that supports both batch and stream processing. The solution enables parallel stream and batch processing and data ingestion in big data applications, while leveraging cloud advantages to improve the reliability and scalability of the overall service.
Authors
HE Li-ping, LIU Qi-yan, ZHANG Hai-tao, HU Xu-bing, CHEN Xiang, ZENG Zhong-da
Affiliations: China Tobacco Yunnan Industrial Co., Ltd., Kunming 650231, China; Shangyuan Smart Technology Co., Ltd., Floor 15, IT International Main Building of Angli Software Park, Hunnan District, Shenyang 110167, China; Dalian Chem Data Solution Information Technology Co., Ltd., Dalian 116023, China
Source
Hubei Agricultural Sciences (《湖北农业科学》), 2022, Supplement S01, pp. 356-359 (4 pages)
Funding
Science and Technology Project of China Tobacco Yunnan Industrial Co., Ltd. (2019XX03)