Abstract
Traditional network log analysis systems face challenges such as diverse data sources, hierarchical file storage optimization, low processing efficiency, and long system latency. This paper designs a prototype real-time network log analysis system based on the distributed platform Hadoop. First, the Filebeat component collects log files from each node, and the data are filtered and pruned according to a hierarchical file archiving and management mechanism before being sent to Kafka. Next, Kafka groups the log streams by topic to realize hierarchically optimized storage. Finally, the Kibana tool provides efficient search, visual analysis, and other operations on the log files. Experimental results show that (1) the modular design reduces coupling between system components; (2) hierarchically optimized file storage shortens data processing latency; and (3) the Kafka service cluster improves the log system's capacity for highly concurrent processing, satisfying the performance requirements of real-time log analysis.
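The filter-prune-and-route step of the pipeline described above can be sketched in miniature. The snippet below is a hypothetical illustration only, not the paper's implementation: it assumes a simple `LEVEL message` log-line format, and the per-level buckets stand in for Kafka topics used for hierarchical storage. All names (`parse_line`, `route_by_level`, the topic naming scheme) are invented for this sketch.

```python
# Minimal sketch of filtering/pruning log records and grouping them into
# per-severity "topics", mimicking the abstract's hierarchical storage idea.
# Hypothetical log format and function names; not the paper's actual code.
from collections import defaultdict

def parse_line(line: str):
    """Parse a 'LEVEL message' line; return (level, message) or None."""
    parts = line.strip().split(" ", 1)
    if len(parts) != 2 or parts[0] not in {"INFO", "WARN", "ERROR"}:
        return None  # prune malformed records before forwarding
    return parts[0], parts[1]

def route_by_level(lines):
    """Group valid records into per-level buckets (stand-ins for Kafka topics)."""
    topics = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # dropped by the pruning step
        level, msg = parsed
        topics[f"logs-{level.lower()}"].append(msg)
    return dict(topics)

logs = [
    "INFO service started",
    "garbled###",          # malformed record, pruned
    "ERROR disk full",
    "WARN high latency",
]
print(route_by_level(logs))
# → {'logs-info': ['service started'], 'logs-error': ['disk full'],
#    'logs-warn': ['high latency']}
```

In a real deployment, the routing decision would instead be expressed as Filebeat processors plus a Kafka producer writing to separate topics, so that downstream consumers can subscribe to each severity tier independently.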
Author
He Changpeng (School of Public Security Technology, Gansu University of Political Science and Law, Lanzhou 730070, China)
Source
Jiangsu Science and Technology Information, 2020, No. 27, pp. 63-66 (4 pages)
Funding
Innovation Fund Project for Higher Education Institutions of Gansu Province (Project No. 2020B-164); University-level Research Project of Gansu University of Political Science and Law (Project No. GZFXQNLW003).