Research on Data Skew in Tobacco Business Big Data
Abstract: Shanghai's tobacco business has built a modern, consumer-oriented marketing system based on "Internet+". As a result, its business complexity keeps rising and its business data volume has grown explosively. Distributed, multi-node parallel processing can greatly improve computational efficiency, but it also makes data skew in distributed computing systems a problem that big data platforms cannot avoid. To address this, the article proposes several ways to resolve data skew: data preprocessing, increasing the degree of distributed parallelism, and algorithmic handling of aggregation and join scenarios.
Authors: ZHU Wenjing; SHEN Lujie; ZHANG Kanhong (Information Centre, Shanghai Tobacco Group Co., Ltd., Shanghai 200082, China)
Source: Information & Computer (《信息与电脑》), 2023, No. 17, pp. 86-89 (4 pages)
Keywords: tobacco business; big data; distributed; data skew
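
The abstract names algorithmic handling of aggregation scenarios as one remedy for data skew but does not give the algorithm. One widely used technique that fits that description is two-stage aggregation with key salting, sketched below in plain Python as an illustration only; it is not taken from the article. The function names, the `hot_retailer` key, the partition count, and the salt count are all hypothetical.

```python
import random
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    """Map a key to a partition with a stable hash (crc32) so runs are reproducible."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def two_stage_sum(records, num_partitions, num_salts, seed=0):
    """Sum values per key using salted keys; num_salts=1 behaves like plain hash partitioning.

    Returns (totals per original key, number of records handled by each partition).
    """
    rng = random.Random(seed)
    partials = [defaultdict(int) for _ in range(num_partitions)]
    loads = [0] * num_partitions

    # Stage 1: append a random salt to each key, so one hot key is spread over
    # up to num_salts salted keys, and pre-aggregate inside each partition.
    for key, value in records:
        salted = f"{key}#{rng.randrange(num_salts)}"
        p = partition_of(salted, num_partitions)
        partials[p][salted] += value
        loads[p] += 1

    # Stage 2: strip the salt and merge the much smaller partial sums.
    totals = defaultdict(int)
    for part in partials:
        for salted, subtotal in part.items():
            original_key = salted.rsplit("#", 1)[0]
            totals[original_key] += subtotal
    return dict(totals), loads


if __name__ == "__main__":
    # Hypothetical skewed input: one key accounts for 90% of all records.
    records = [("hot_retailer", 1)] * 9000
    records += [(f"retailer_{i % 100}", 1) for i in range(1000)]

    _, plain_loads = two_stage_sum(records, num_partitions=8, num_salts=1)
    totals, salted_loads = two_stage_sum(records, num_partitions=8, num_salts=16)

    print("max partition load without salting:", max(plain_loads))
    print("max partition load with salting:   ", max(salted_loads))
    print("total for hot key:", totals["hot_retailer"])
```

Salting trades one extra merge step for a more even first-stage load. It applies only to aggregations that can be merged from partial results, such as sums and counts.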