Abstract
Data integration is an important step in big data and data warehouse work, and it directly determines whether data can be used smoothly, conveniently, and accurately. In an enterprise-level big data platform or data warehouse, data integration is continuous work: new data sources are added constantly and must be integrated into the existing warehouse tables. This step is very time-consuming and labor-intensive, and an inappropriate integration scheme leads to even more costly follow-up work, such as reprocessing existing data in bulk and modifying large amounts of existing calling logic, that is, data cutover and application cutover. This paper studies a relatively systematic and general pattern that avoids these situations as far as possible, so that data integration becomes simple, reusable, and sustainable, and data usage gains stability and accuracy.
Authors
景超
田凌涛
Jing Chao; Tian Lingtao (ENC Digital Technology Co., Ltd., Nanjing 210005, China; Nanjing Jing Wei Patent & Trade Mark Agency Co., Ltd., Nanjing 210005, China)
Source
《江苏科技信息》
2019, No. 27, pp. 56-58 (3 pages)
Jiangsu Science and Technology Information
Keywords
data integration
big data
data warehouse
data processing
data fusion