Abstract
With the rapid development of information technology, the volume of data held by governments at all levels and by large enterprises is growing exponentially. However, diverse data sources lead to inconsistent formats, uneven data quality undermines application results, decentralized data management weakens association and aggregation, and heterogeneous data modalities create semantic gaps. Against this background, multi-source heterogeneous data fusion integrates multi-modal data from different sources, achieving data complementarity and association and thereby information enhancement. At present, most studies focus on big data governance processes and multi-modal deep learning, and few discuss a complete technical framework for multi-source heterogeneous data fusion. Therefore, on the basis of a review of the key technologies, this paper proposes a key technology framework for multi-source heterogeneous data fusion that covers the whole process of “data collection-data cleaning-data integration-data fusion”, and introduces the problems to be solved and the main tasks at each stage. Then, through a government affairs application scenario, a governance system for government big data is designed to address the wide range of sources, uneven quality, decentralized management, and heterogeneous forms of government data, which further illustrates the value of multi-source heterogeneous data fusion. Finally, the paper is summarized and future work is discussed.
Authors
闫佳和
李红辉
马英
刘真
张大林
江周娴
段宇航
YAN Jiahe; LI Honghui; MA Ying; LIU Zhen; ZHANG Dalin; JIANG Zhouxian; DUAN Yuhang (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; National Information Center, Beijing 100045, China; School of Software, Beijing Jiaotong University, Beijing 100044, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2024, Issue 2, pp. 1-14 (14 pages)
Computer Science
Funding
National Key Research and Development Program of China (2019YFB2102500).
Keywords
Multi-source heterogeneous data
Multi-modal data fusion
Data governance technology
Big data of government affairs
Big data governance process