
A parallel method for ETL process based on agent and activity priority
Abstract: ETL (extract, transform, load) is the key step through which a data warehouse obtains high-quality data, and it plays an important role in building and deploying data warehouses. To address the shortcomings of traditional serial ETL execution, we propose a parallel ETL execution method that combines agents with activity priority. The method computes the priority of each activity in the ETL process, then uses agent theory and multithreaded parallel computing to execute concurrently those ETL activities that have the same priority and no dependencies on one another. Experimental results show that the method achieves good speedup when the data volume is large and improves the execution efficiency of the ETL process.
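The abstract does not spell out how activity priority is computed, so the sketch below assumes one plausible rule: an activity's priority is its level in the ETL dependency graph (1 for activities with no predecessors, otherwise one more than the highest level among its predecessors). Under this rule, two activities with the same priority can never depend on each other, directly or transitively, which is the condition the method requires for running them in parallel. A Python `ThreadPoolExecutor` stands in for the paper's agents; agent behavior itself is not modeled. The function names `compute_priorities`, `group_by_priority`, `run_parallel` and the `EXAMPLE_DEPS` workflow are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical example workflow: each activity maps to the activities it depends on.
EXAMPLE_DEPS: Dict[str, List[str]] = {
    "extract_orders": [],
    "extract_customers": [],
    "clean_orders": ["extract_orders"],
    "clean_customers": ["extract_customers"],
    "join": ["clean_orders", "clean_customers"],
    "load_dw": ["join"],
}


def compute_priorities(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Assumed priority rule (not taken from the paper): activities with no
    predecessors get level 1; others get 1 + the highest predecessor level."""
    acts = set(deps) | {p for ps in deps.values() for p in ps}
    preds = {a: set(deps.get(a, ())) for a in acts}
    succs: Dict[str, List[str]] = {a: [] for a in acts}
    for a, ps in preds.items():
        for p in ps:
            succs[p].append(a)
    remaining = {a: len(ps) for a, ps in preds.items()}  # unfinished predecessors
    priority: Dict[str, int] = {}
    frontier = sorted(a for a in acts if remaining[a] == 0)
    level = 1
    while frontier:  # layered topological sort (Kahn's algorithm)
        nxt = []
        for a in frontier:
            priority[a] = level
            for s in succs[a]:
                remaining[s] -= 1
                if remaining[s] == 0:  # every predecessor now has a level
                    nxt.append(s)
        frontier = sorted(nxt)
        level += 1
    if len(priority) != len(acts):
        raise ValueError("activity dependency graph contains a cycle")
    return priority


def group_by_priority(priority: Dict[str, int]) -> List[List[str]]:
    """Collect activities into groups of equal priority, lowest level first."""
    groups: Dict[int, List[str]] = {}
    for act, lv in priority.items():
        groups.setdefault(lv, []).append(act)
    return [sorted(groups[lv]) for lv in sorted(groups)]


def run_parallel(deps: Dict[str, List[str]],
                 tasks: Dict[str, Callable[[], None]],
                 max_workers: int = 4) -> List[List[str]]:
    """Run each equal-priority group concurrently and wait for the whole
    group before starting the next one. Returns the executed schedule."""
    schedule = []
    # Thread pool used here in place of the paper's agents (assumption).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in group_by_priority(compute_priorities(deps)):
            # list() blocks until every activity in the group has finished
            # and re-raises an activity's exception, if one occurred.
            list(pool.map(lambda a: tasks[a](), group))
            schedule.append(group)
    return schedule


if __name__ == "__main__":
    noop = {a: (lambda: None) for a in compute_priorities(EXAMPLE_DEPS)}
    print(run_parallel(EXAMPLE_DEPS, noop))
    # [['extract_customers', 'extract_orders'], ['clean_customers', 'clean_orders'], ['join'], ['load_dw']]
```

The barrier after each group is what preserves dependencies, at the cost of idle workers whenever a group holds a single activity. Because CPython threads share the global interpreter lock, this thread-based version mainly pays off for I/O-bound activities such as database reads and writes; CPU-heavy transformations would typically need `ProcessPoolExecutor` instead. Speedup, in the abstract's sense, is the serial running time divided by the parallel running time.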
Authors: CHEN Gang (陈刚), DU Xin-lin (杜鑫霖), ZENG Si-feng (曾司凤), AN Bao-ran (安宝冉) (Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, China)
Source: Computer Engineering & Science (《计算机工程与科学》), 2017, no. 9, pp. 1594-1601 (8 pages). Indexed in CSCD and the Peking University core journal list.
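Funding: National High Technology Research and Development Program of China (863 Program) (2007AA1236); Development Foundation of China Academy of Engineering Physics (14-FZJJ-0442)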
Keywords: agent; activity priority; ETL; parallel