
Development and application of big data processing framework SDC
Abstract: SDC (StreamSets Data Collector) is a drag-and-drop big data ETL tool that can process large volumes of data without writing code. However, complex features such as scheduled task management and multiple data sources require the vendor's closed-source products. This paper uses SDC's internal interfaces to design and develop a timing (scheduling) component, and combines it with SDC's built-in components to implement scheduled execution of pipelines and multi-data-source applications. Experimental results show that the extended framework components support pseudo-real-time and complex scheduled tasks and, together with the built-in components, integrate local and remote data sources, addressing users' needs in specific scheduling scenarios.
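The abstract does not include the timing component's code. As a rough illustration only, here is a fixed-interval trigger that starts an SDC pipeline from outside SDC. This is not the authors' method. The paper builds its component on SDC internal interfaces, whereas this sketch assumes SDC's REST endpoint `POST /rest/v1/pipeline/{pipelineId}/start` and the `X-Requested-By` header. Both should be checked against the REST API documentation for your SDC version. The helper names `next_run_times`, `build_start_request` and `run_schedule` are hypothetical.

```python
import datetime as dt
import time
import urllib.request
from typing import Callable, List


def next_run_times(start: dt.datetime, interval: dt.timedelta, count: int) -> List[dt.datetime]:
    """Return the next `count` trigger times spaced `interval` apart after `start`."""
    if interval <= dt.timedelta(0):
        raise ValueError("interval must be positive")
    return [start + interval * (i + 1) for i in range(count)]


def build_start_request(base_url: str, pipeline_id: str) -> urllib.request.Request:
    """Build (but do not send) a request asking SDC to start one pipeline.

    ASSUMPTION: the endpoint path and header follow the SDC REST API; verify
    them for your SDC version. Authentication is deliberately left out.
    """
    url = f"{base_url.rstrip('/')}/rest/v1/pipeline/{pipeline_id}/start"
    req = urllib.request.Request(url, data=b"", method="POST")
    # Assumed: SDC expects this header on state-changing REST calls.
    req.add_header("X-Requested-By", "sdc")
    return req


def run_schedule(
    times: List[dt.datetime],
    trigger: Callable[[], None],
    now: Callable[[], dt.datetime] = dt.datetime.now,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep until each trigger time, then call `trigger`; return how many fired."""
    fired = 0
    for t in times:
        delay = (t - now()).total_seconds()
        if delay > 0:  # a time already in the past fires immediately
            sleep(delay)
        trigger()
        fired += 1
    return fired
```

A short interval (for example, a few seconds) approximates the "pseudo-real-time" mode that the abstract mentions. In a real deployment, `trigger` might call `urllib.request.urlopen(build_start_request(...))`, but it would also need authentication and handling for a pipeline that is still running from the previous trigger. Because `now` and `sleep` are injectable, the schedule can be tested without waiting.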
Authors: Wu Guangjian (吴广建), Yu Mengjie (于梦洁) (Hangzhou Normal College, Alibaba Business University, Hangzhou, Zhejiang 311100, China; College of Education, Shanghai Normal University)
Source: Computer Era (《计算机时代》), 2019, No. 7, pp. 19-21 (3 pages)
Keywords: big data ETL tool; pipeline; timing component; multi-data source