Abstract
With the development of big data technology, using stream-batch integration to fuse multi-source heterogeneous data has gradually become an inevitable trend. When handling unstructured data, data warehouses suffer from latency and other problems in real-time performance. Data lakes, by contrast, can store multi-source heterogeneous data, process large-scale datasets and real-time data, and offer advantages such as low storage cost and data flexibility. This article fuses multi-source heterogeneous data through stream-batch integration and builds a lakehouse (integrated data lake and data warehouse) architecture around data collection, storage, and computation. The architecture provides strong data analysis and mining capabilities and can extract valuable information from massive datasets to support decision-making and innovation.
Author
刘胜
LIU Sheng (Guangxi Institute of Artificial Intelligence and Big Data Application Co., Ltd., Nanning 530000, China)
Source
《计算机应用文摘》
2024, No. 2, pp. 42-44 (3 pages)
Chinese Journal of Computer Application
Keywords
stream-batch integration (流批一体)
real-time performance (实时性)
data analysis (数据分析)