Design and implementation of a DW2.0-based massive tobacco data analysis system

Cited by: 10
Abstract: To address the limitations of traditional data warehouse systems in data storage, processing, and presentation, a massive tobacco data analysis system was designed around the practical application needs of China Tobacco Jiangsu Industrial Limited Corporation. Drawing on DW2.0 theory and big data technology, the design introduces a distributed processing architecture and fuses the traditional data warehouse with Hadoop to form an integrated, coordinated data warehouse architecture. Data lifecycle management is introduced to improve the system's responsiveness. Unstructured data are handled by Hadoop HBase, and the Hadoop MapReduce parallel computing framework serves as the communication layer that schedules and coordinates computation and communication among the cluster nodes. Test results showed that, compared with the traditional approach, the response time of the new system improved by 30% at 100 million records and by 80% at 1 billion records. The system effectively improved data query efficiency and raised the application level of the enterprise data warehouse system.
Source: Tobacco Science & Technology (《烟草科技》), 2016, No. 4, pp. 96-102 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: Tobacco; Massive data; Data warehouse; DW2.0 theory; Hadoop architecture; Data lifecycle
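
The abstract names data lifecycle management and a fused warehouse/Hadoop architecture but gives no implementation details. As a rough illustration only, the sketch below assigns a record to one of the four sectors described in the general DW2.0 literature (interactive, integrated, near-line, archival) by its age, and then picks a storage backend. The age thresholds, the function names `assign_sector` and `choose_backend`, and the sector-to-backend mapping are all assumptions made for this example and are not taken from the paper.

```python
from datetime import date

# Hypothetical age limits (in days) for each sector; the abstract gives none.
SECTOR_MAX_AGE_DAYS = [
    ("interactive", 30),
    ("integrated", 365),
    ("near_line", 5 * 365),
]


def assign_sector(record_date: date, today: date) -> str:
    """Return the lifecycle sector for a record based on its age in days."""
    age_days = (today - record_date).days
    for sector, max_age in SECTOR_MAX_AGE_DAYS:
        if age_days <= max_age:
            return sector
    # Anything older than the last limit is treated as archival data.
    return "archival"


def choose_backend(sector: str) -> str:
    """Assumed routing: recent sectors stay in the relational warehouse,
    older sectors are moved to the Hadoop cluster."""
    return "warehouse" if sector in ("interactive", "integrated") else "hadoop"


if __name__ == "__main__":
    today = date(2016, 1, 15)
    for d in (date(2016, 1, 1), date(2015, 6, 1), date(2013, 1, 1), date(2005, 1, 1)):
        sector = assign_sector(d, today)
        print(d.isoformat(), sector, choose_backend(sector))
```

Keeping the thresholds in a single ordered list means the tiering policy can be tuned without touching the routing logic, which is one plausible way a lifecycle policy could keep frequently queried data on the faster store.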