Funding: Supported by the National Natural Science Foundation of China (No. 51677072).
Abstract: With the integration of distributed generation and the construction of cross-regional long-distance power grids, power systems are becoming larger and more complex. Power flow calculations that support unit dispatch therefore require faster computation and better scalability. Based on an analysis of several parallelization methods, this paper deploys large-scale power flow calculation tasks on a cloud computing platform using resilient distributed datasets (RDDs). It optimizes the directed acyclic graph (DAG) stored in the RDDs to overcome the low performance of the MapReduce model. This paper constructs and simulates power flow calculations for a large-scale power system based on standard IEEE test data. Experiments are conducted on a Spark cluster deployed as a cloud computing platform. They show that the method offers no clear advantage at small scale, but outperforms both the stand-alone model and the MapReduce model for large-scale calculations. In addition, running time decreases as cluster nodes are added. Although the method has not been tested under practical conditions, this paper offers a new approach to parallel power flow calculation in large-scale power systems.
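The abstract does not spell out the paper's DAG optimization or its power flow solver. The sketch below illustrates the general idea behind RDD-based execution under stated assumptions. Transformations are recorded lazily as a lineage and pipelined within each partition, rather than materializing intermediate results between MapReduce jobs. It applies this idea to independent load scenarios spread across partitions. `ToyRDD`, the reactances `X01`, `X02`, `X12`, and the flow limit `LIMIT` are hypothetical, not Spark's API or the paper's model. A linearized DC power flow on a made-up 3-bus network stands in for the full AC calculation.

```python
from typing import Callable, List, Tuple

# Hypothetical 3-bus DC network (not an IEEE test case): bus 0 is the slack bus,
# line reactances are in per unit, and LIMIT is an assumed line-flow limit.
X01, X02, X12 = 0.1, 0.2, 0.25
LIMIT = 1.5


def dc_power_flow(p: Tuple[float, float]) -> Tuple[float, float]:
    """Solve B' * theta = P for buses 1 and 2 (theta_0 = 0) by Cramer's rule."""
    b11 = 1 / X01 + 1 / X12
    b22 = 1 / X02 + 1 / X12
    b12 = -1 / X12
    det = b11 * b22 - b12 * b12
    p1, p2 = p
    return (b22 * p1 - b12 * p2) / det, (b11 * p2 - b12 * p1) / det


def line_flows(theta: Tuple[float, float]) -> Tuple[float, float, float]:
    """DC flows on lines 0-1, 0-2, 1-2: f_ij = (theta_i - theta_j) / x_ij."""
    t1, t2 = theta
    return (0.0 - t1) / X01, (0.0 - t2) / X02, (t1 - t2) / X12


class ToyRDD:
    """Hypothetical RDD stand-in: fixed partitions plus a lineage of narrow steps."""

    def __init__(self, partitions: List[list], lineage: tuple = ()):
        self.partitions = partitions
        self.lineage = lineage  # ordered (kind, function) pairs: a linear DAG

    def map(self, f: Callable) -> "ToyRDD":
        # Transformations are lazy: they only extend the lineage.
        return ToyRDD(self.partitions, self.lineage + (("map", f),))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self.partitions, self.lineage + (("filter", pred),))

    def _run_partition(self, part: list):
        # Pipelined execution: each record passes through all steps in one pass,
        # with no intermediate dataset written between steps.
        for rec in part:
            for kind, fn in self.lineage:
                if kind == "map":
                    rec = fn(rec)
                elif not fn(rec):
                    break  # record rejected by a filter step
            else:
                yield rec

    def collect(self) -> list:
        # The action triggers execution; each partition is independent and
        # could be processed by a different worker.
        return [rec for part in self.partitions for rec in self._run_partition(part)]


# Load scenarios (net injections at buses 1 and 2, per unit) split into 2 partitions.
scenarios = ToyRDD([[(-1.0, -0.5), (-2.0, -1.0)], [(-1.5, -2.5)]])
overloaded = (
    scenarios.map(lambda p: (p, dc_power_flow(p)))
    .map(lambda r: (r[0], line_flows(r[1])))
    .filter(lambda r: max(abs(f) for f in r[1]) > LIMIT)
)
print(overloaded.collect())
```

Take the first scenario (loads of 1.0 and 0.5 per unit) as an example. It yields flows of 1.0, 0.5, and 0.0 per unit, which stay under the assumed limit. Only the second and third scenarios are therefore returned. In Spark itself, `map` and `filter` are narrow transformations that are pipelined within a stage, while shuffles (wide dependencies) split a job into stages.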
Funding: Supported by the National Natural Science Foundation of China (Nos. 61379144, 61572026, 61672195, and 61501482) and the Open Foundation of State Key Laboratory of Cryptology.
Abstract: Advanced Persistent Threat (APT) attacks, an attack approach that has emerged in recent years, pose serious threats to the data security of governments and enterprises because of their advanced and persistent characteristics. To address this issue, a big data analysis security policy has been proposed that analyzes the log data of servers and terminals in Spark. In practice, however, Spark cannot efficiently analyze very large volumes of log data. To address this problem, we propose a scheduling optimization technique based on dataset reuse to improve Spark performance. In this technique, we define and formulate the reuse degree of Directed Acyclic Graphs (DAGs) in Spark based on Resilient Distributed Datasets (RDDs). We then define a global optimization function to obtain the optimal DAG sequence, that is, the sequence with the least execution time. To implement this global optimization function, we further propose a novel cost optimization algorithm based on the traditional Genetic Algorithm (GA). Our experiments demonstrate that this scheduling optimization technique can greatly decrease the time overhead of analyzing log data in Spark for detecting APT attacks.
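The abstract names a reuse degree and a GA-based search for the DAG sequence with the least execution time, but it defines neither precisely. The sketch below illustrates both under loudly stated assumptions:

- Each job (DAG) is modeled as a set of RDDs with made-up compute costs.
- `reuse_degree` is taken to be the cost share of a job's RDDs already produced by the preceding job.
- Only that preceding job's RDDs are assumed to stay cached.

`RDD_COST`, `JOBS`, and every function name are hypothetical. The GA uses a common permutation encoding with a simple order-crossover variant, swap mutation, tournament selection, and elitism. These operators are not necessarily the authors'.

```python
import random
from typing import Dict, List, Sequence, Set

# Hypothetical workload: assumed compute cost (time units) of each RDD,
# and the RDDs that each job's DAG needs.
RDD_COST: Dict[str, float] = {
    "logs": 8.0, "parsed": 5.0, "sessions": 4.0,
    "alerts": 3.0, "ips": 2.0, "geo": 6.0,
}
JOBS: List[Set[str]] = [
    {"logs", "parsed", "alerts"},
    {"geo", "ips"},
    {"logs", "parsed", "sessions"},
    {"geo", "ips", "alerts"},
    {"logs", "parsed", "ips"},
]


def cost(rdds: Set[str]) -> float:
    return sum(RDD_COST[r] for r in rdds)


def reuse_degree(prev: Set[str], cur: Set[str]) -> float:
    """Assumed definition: share of cur's cost already computed by prev."""
    total = cost(cur)
    return cost(cur & prev) / total if total else 0.0


def sequence_time(order: Sequence[int]) -> float:
    """Toy cost model: only the previous job's RDDs remain cached, so each job
    costs cost(job) * (1 - reuse_degree(previous job, job))."""
    time, cached = 0.0, set()
    for j in order:
        time += cost(JOBS[j] - cached)
        cached = JOBS[j]
    return time


def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """Copy a random slice of p1, then fill the gaps with p2's genes in order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    rest = [g for g in p2 if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child


def swap_mutation(perm: List[int], rng: random.Random, rate: float = 0.2) -> List[int]:
    perm = perm[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def tournament(pop: List[List[int]], rng: random.Random) -> List[int]:
    a, b = rng.sample(pop, 2)  # size-2 tournament: the cheaper order wins
    return min(a, b, key=sequence_time)


def genetic_schedule(pop_size: int = 30, generations: int = 60, seed: int = 7) -> List[int]:
    rng = random.Random(seed)
    n = len(JOBS)
    identity = list(range(n))
    # Seed the population with the given order so the result is never worse.
    pop = [identity] + [rng.sample(identity, n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=sequence_time)
        elite = pop[:2]  # elitism: the two best orders survive unchanged
        children = [
            swap_mutation(order_crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=sequence_time)


best = genetic_schedule()
print("best order:", best, "time:", sequence_time(best))
print("given order time:", sequence_time(range(len(JOBS))))
```

On this toy instance, the given order costs 65 time units. Exhaustive search over all 120 orders gives a minimum of 30 time units, for example with jobs 1, 3, 0, 2, 4. Exhaustive search quickly becomes impractical as the number of jobs grows, which is where a heuristic search such as a GA becomes attractive.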