Abstract
In big data processing, large volumes of structured and semi-structured data are migrated, transformed, and loaded in various forms. Throughout this process, neither the data nor its metadata can be effectively controlled or centrally managed, and there is no way to trace them back to their origins. This severely limits the adaptive push capability and scalability of the entire big data platform. This paper proposes a method for tracing the provenance of big data metadata using the Neo4j graph database, enabling global control of metadata, process monitoring, and process backtracking across the whole big data processing pipeline.
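The abstract does not specify how lineage is modeled in Neo4j. As an illustration only, the following Python sketch shows the general idea in memory, under assumptions that are not from the paper: each dataset's metadata is a node, each migrate/transform/load step is a directed "derived from" edge, and tracing provenance is a breadth-first walk upstream from a given dataset. The class `ProvenanceGraph`, its methods, and the dataset names are hypothetical. In Neo4j itself, the same backtrace would typically be a variable-length Cypher pattern such as `MATCH (d:Dataset {name: $name})-[:DERIVED_FROM*]->(src) RETURN DISTINCT src.name`, where the `Dataset` label and `DERIVED_FROM` relationship type are likewise hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical in-memory stand-in for a metadata lineage graph.

    Nodes are dataset metadata records; each edge points from a derived
    dataset back to its source and is labeled with the ETL step
    (migrate / transform / load) that produced it.
    """

    # dataset name -> metadata attributes (kind, location, ...)
    nodes: dict = field(default_factory=dict)
    # derived dataset name -> list of (source dataset name, step label)
    derived_from: dict = field(default_factory=dict)

    def add_dataset(self, name, **metadata):
        self.nodes[name] = metadata
        self.derived_from.setdefault(name, [])

    def record_step(self, source, target, step):
        """Record that `target` was produced from `source` by `step`."""
        for n in (source, target):
            if n not in self.nodes:
                raise KeyError(f"unknown dataset: {n}")
        self.derived_from[target].append((source, step))

    def trace_upstream(self, name):
        """Breadth-first backtrace: return every (source, step, derived)
        edge that contributed to `name`, nearest edges first."""
        lineage, seen = [], {name}
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for source, step in self.derived_from.get(current, []):
                lineage.append((source, step, current))
                if source not in seen:  # avoid revisiting shared sources
                    seen.add(source)
                    queue.append(source)
        return lineage


# Usage: a small hypothetical pipeline mixing structured and
# semi-structured sources.
g = ProvenanceGraph()
g.add_dataset("orders_raw", kind="structured", location="mysql")
g.add_dataset("clicks_raw", kind="semi-structured", location="json logs")
g.add_dataset("orders_hdfs", kind="structured", location="hdfs")
g.add_dataset("sales_report", kind="structured", location="hive")
g.record_step("orders_raw", "orders_hdfs", "migrate")
g.record_step("orders_hdfs", "sales_report", "transform")
g.record_step("clicks_raw", "sales_report", "load")

for edge in g.trace_upstream("sales_report"):
    print(edge)
```

Keeping lineage as explicit edges, rather than as attributes on each record, is what makes backtracking a graph traversal; this is the property a graph database such as Neo4j is designed to query efficiently.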