Abstract
Traditional log analysis techniques face a computational bottleneck when processing massive data. To address this problem, a log analysis solution based on big data technology was studied: the storage, analysis, and mining of log files are distributed across multiple computers. A parallel network log analysis engine was built on the open-source Hadoop framework, and the IP statistics algorithm and the anomaly detection algorithm were re-implemented under the MapReduce model. Experiments show that applying big data technology to data-intensive computing significantly improves the execution efficiency of the algorithms and the scalability of the system.
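The abstract states that the IP statistics algorithm was re-implemented under the MapReduce model, but the record itself contains no code. The following is only a minimal sketch, not the authors' implementation, of what such a job typically looks like on Hadoop (Java, the new mapreduce API): the mapper emits (clientIP, 1) per log line and the reducer sums the counts. The class names, the assumption that the client IP is the first whitespace-delimited field (as in Apache/Nginx combined log format), and the input/output paths are all assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch of a per-IP request counter over web-server logs stored in HDFS.
public class IpCount {

    // Map: emit (clientIP, 1) for every non-empty log line; the IP is assumed to be
    // the first whitespace-delimited field of the line.
    public static class IpMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text ip = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return;
            }
            ip.set(line.split("\\s+")[0]);
            context.write(ip, ONE);
        }
    }

    // Reduce: sum the per-IP counts; the same class also serves as a combiner.
    public static class IpReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "ip count");
        job.setJarByClass(IpCount.class);
        job.setMapperClass(IpMapper.class);
        job.setCombinerClass(IpReducer.class);
        job.setReducerClass(IpReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS directory of raw log files
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the reducer is associative and commutative, it can be reused as a combiner to cut shuffle traffic, which is part of why such counting jobs scale well across nodes; the anomaly detection algorithm mentioned in the abstract would follow the same map/shuffle/reduce pattern but with a different aggregation step.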
Authors
YING Yi (应毅), REN Kai (任凯), LIU Ya-jun (刘亚军)
College of Computer Science and Technology, Sanjiang University, Nanjing 210012, China; Jinling College, Nanjing University, Nanjing 210089, China; School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Source
《计算机科学》 (Computer Science), indexed in CSCD and the Peking University Core Journal list (北大核心)
2018, No. B11, pp. 353-355 (3 pages)
Funding
Supported by the Natural Science Research General Program of Jiangsu Higher Education Institutions (17KJB520033).