Abstract
The volume of experimental data produced by the EAST device keeps growing, which makes effective monitoring of the MDSplus data storage servers on EAST necessary. To help experiment staff manage users on the MDSplus servers, an offline and real-time MDSplus log analysis system was designed. The system builds on two big data processing frameworks: the MapReduce offline computing model from the Hadoop ecosystem and the Spark Streaming real-time computing model from the Spark ecosystem. It also uses Flume and Kafka for log monitoring, aggregation, and distribution, which makes processing massive volumes of MDSplus log data feasible. The system can process tens of millions of raw MDSplus log records within seconds and presents the results of both offline and real-time processing on a web front end. Tests show that the system meets its design requirements and is of practical value for managing fusion experiment data.
Authors
Zhang Qihao (章琦皓)
Wang Feng (王枫)
Wang Yueting (王月婷)
Institute of Plasma Physics, Chinese Academy of Sciences, Hefei 230031, Anhui, China; University of Science and Technology of China, Hefei 230026, Anhui, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2018, No. 9, pp. 50-55 (6 pages)
Funding
National Key Research and Development Program of China (2017YFE0300500, 2017YFE0300505)