Design on High Availability Log Analysis System Based on Distributed Cluster

Cited by: 13
Abstract Traditional access log analysis systems collect data inefficiently, cannot recursively monitor the log collection directory, and lack high availability in their storage and computing systems. This paper builds a highly available log analysis system based on a distributed cluster. Data for real-time analysis is collected by connecting Nginx directly to Kafka, and data for offline analysis is collected by Flume with a custom Source component. The highly available Hadoop Distributed File System (HDFS) and the Spark computing system provide persistent storage and the computing engine respectively, while MySQL and HBase store the aggregated data and the detailed data respectively. Experimental results show that the system's functions meet expectations: the direct Nginx-Kafka collection path and the Flume custom Source component significantly improve collection efficiency, and both HDFS and Spark, coordinated by Zookeeper, achieve high availability. The ALS algorithm was used to test the functions of the storage and computing systems.
Authors CHEN Le; YU Su; WANG Meng (Shanghai University of Engineering Science, Shanghai 201620, China)
Source Journal of China Academy of Electronics and Information Technology (Peking University Core Journal), 2020, No. 5, pp. 420-426 (7 pages)
Funding Project funded by the Science and Technology Commission of Shanghai Municipality (175111110204).
Keywords distributed cluster; Flume; Hadoop Distributed File System (HDFS); Spark; high availability; Zookeeper