Abstract
Having built a modern, consumer-oriented marketing system based on "Internet+", Shanghai's tobacco commerce business has seen its business complexity rise steadily and its data volume grow explosively. Distributed multi-node parallel processing can greatly improve computational efficiency, but it also makes data skew in distributed computing systems an unavoidable problem for big data platforms. To address this, the article proposes several ways to resolve data skew: data preprocessing, increasing distributed parallelism, and algorithmic handling of aggregation and join scenarios.
Authors
ZHU Wenjing (朱文静); SHEN Lujie (沈璐婕); ZHANG Kanhong (张侃弘)
Information Centre, Shanghai Tobacco Group Co., Ltd., Shanghai 200082, China
Source
Information & Computer (《信息与电脑》)
2023, Issue 17, pp. 86-89 (4 pages)