Abstract
DNS logs are an important class of Internet access logs. They are huge in volume and carry a large amount of information, so they must be processed and analyzed with big data technologies. In production networks, DNS log volume is large and the data are severely skewed, which seriously degrades MapReduce performance. To address these problems, a small-file merging method is used to optimize input splitting and alleviate data skew on the Map side, and the split size is set dynamically to improve the execution efficiency of MapReduce jobs. The method effectively balances the load across Map tasks, which improves both the execution efficiency and the resource utilization of MapReduce jobs under data skew. Experiments show that the method effectively shortens the execution time of MapReduce jobs.
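The abstract does not spell out the paper's merging rule, so the following is only a minimal planning sketch under stated assumptions, not the authors' algorithm. It assumes a hypothetical split-size rule (roughly one split per available map slot, clamped to a 128 MB ceiling) and packs file pieces into splits with first-fit decreasing, a standard bin-packing heuristic swapped in here for illustration. The names `dynamic_split_size`, `combine_files`, and `map_slots` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A piece of an input file: (file name, byte offset, length in bytes).
Piece = Tuple[str, int, int]


def dynamic_split_size(total_bytes: int, map_slots: int,
                       min_size: int = 1,
                       max_size: int = 128 * 1024 * 1024) -> int:
    """Hypothetical rule: aim for about one split per map slot,
    clamped to the range [min_size, max_size]."""
    target = math.ceil(total_bytes / map_slots)
    return max(min_size, min(target, max_size))


def combine_files(file_sizes: Dict[str, int], split_size: int) -> List[List[Piece]]:
    """Merge small files (and cut large ones) into splits of at most split_size bytes."""
    # 1. Cut every file into pieces no larger than split_size.
    #    Zero-byte files yield no pieces and are skipped.
    pieces: List[Piece] = []
    for name, size in file_sizes.items():
        for offset in range(0, size, split_size):
            pieces.append((name, offset, min(split_size, size - offset)))

    # 2. First-fit decreasing: place the largest pieces first, each into the
    #    first split that still has room; open a new split when none does.
    pieces.sort(key=lambda p: p[2], reverse=True)
    splits: List[List[Piece]] = []
    loads: List[int] = []
    for piece in pieces:
        for i, load in enumerate(loads):
            if load + piece[2] <= split_size:
                splits[i].append(piece)
                loads[i] += piece[2]
                break
        else:
            splits.append([piece])
            loads.append(piece[2])
    return splits


if __name__ == "__main__":
    # One large log file plus several small ones (sizes in bytes, illustrative only).
    sizes = {"a.log": 900, "b.log": 40, "c.log": 30, "d.log": 20, "e.log": 10}
    size = dynamic_split_size(sum(sizes.values()), map_slots=4)
    splits = combine_files(sizes, size)
    print(size, [sum(p[2] for p in s) for s in splits])  # 250 [250, 250, 250, 250]
```

With four map slots, the 1000 bytes are packed into four splits of exactly 250 bytes each: the large file is cut into pieces, and the four small files are merged into the split that holds the 150-byte tail of `a.log`. This gives every Map task the same input volume. In a real Hadoop job, the built-in `CombineFileInputFormat` addresses the same small-file problem; this sketch only models the split-planning step in plain Python.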
Authors
刘鹤煜
张棪
杨兴华
崔华俊
谭倩
LIU Heyu; ZHANG Yan; YANG Xinghua; CUI Huajun; TAN Qian (Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2018, Issue 2, pp. 73-77 (5 pages)