Abstract
Existing performance anomaly detection methods for data centers in cloud computing environments have problems with real-time performance and scalability. To address them, this paper proposes Spark-ADOPD (Spark-based Anomaly Detection Over Performance Data In Real-time), a Spark-based method for detecting performance anomalies in cloud data centers in real time. The method uses a distributed, scalable, Spark-based stream data clustering algorithm to automatically classify the performance data collected from the cloud data center and to build a performance anomaly prediction model. It then defines a similarity function and computes the similarity between continuously arriving performance data and the prediction model. This identifies anomalous performance behavior so that resource allocation can be adjusted dynamically. Experimental results show that Spark-ADOPD achieves good real-time performance and accuracy.
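The abstract does not specify the clustering algorithm or the similarity function. The sketch below is a rough, single-machine illustration of the two-stage idea: cluster incoming performance data into a model of normal behavior, then score each new sample by its similarity to that model. It uses sequential (online) k-means and a Gaussian similarity on the distance to the nearest centroid. Both choices are assumptions, not the Spark-ADOPD algorithm, and so are the hypothetical names `OnlineKMeans`, `similarity`, `is_anomalous`, `scale` and `threshold`. The two metrics are also assumed to be CPU and memory utilization normalized to [0, 1]. For reference, Spark MLlib's `StreamingKMeans` applies a comparable centroid update to mini-batches of a stream.

```python
import math
from typing import List, Sequence, Tuple


class OnlineKMeans:
    """Sequential (online) k-means, a stand-in for the paper's unspecified
    Spark clustering step. Each arriving sample moves its nearest centroid
    toward itself by 1/n, so every centroid stays the running mean of the
    samples assigned to it (the seed centroid counts as one sample)."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]]):
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        self.counts: List[int] = [1] * len(self.centroids)

    def nearest(self, x: Sequence[float]) -> Tuple[int, float]:
        """Return (index, Euclidean distance) of the centroid closest to x."""
        dists = [math.dist(x, c) for c in self.centroids]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def update(self, x: Sequence[float]) -> int:
        """Assign x to its nearest cluster and update that centroid in place."""
        idx, _ = self.nearest(x)
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]
        centroid = self.centroids[idx]
        for j, value in enumerate(x):
            centroid[j] += rate * (value - centroid[j])
        return idx


def similarity(x: Sequence[float], model: OnlineKMeans, scale: float = 1.0) -> float:
    """Assumed Gaussian similarity in (0, 1]: 1 at a centroid, decaying with distance."""
    _, d = model.nearest(x)
    return math.exp(-((d / scale) ** 2))


def is_anomalous(x: Sequence[float], model: OnlineKMeans,
                 threshold: float = 0.5, scale: float = 1.0) -> bool:
    """Flag x when it is not similar enough to any learned cluster."""
    return similarity(x, model, scale) < threshold


# Hypothetical samples: (CPU utilization, memory utilization), scaled to [0, 1].
model = OnlineKMeans([[0.2, 0.3], [0.7, 0.6]])  # low-load and high-load seeds
for sample in [[0.22, 0.31], [0.18, 0.29], [0.72, 0.58], [0.69, 0.62]]:
    model.update(sample)

print(is_anomalous([0.21, 0.29], model, threshold=0.5, scale=0.1))  # False
print(is_anomalous([0.95, 0.05], model, threshold=0.5, scale=0.1))  # True
```

Training on a short stream of normal-looking samples keeps the centroids at the two load levels. A sample close to either centroid scores near 1, and one far from both scores near 0 and is flagged. In practice, `threshold` and `scale` would need tuning on historical data. A streaming engine would also apply the update per micro-batch rather than per sample.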
Source
Journal of Xi'an Vocational and Technical College (西安职业技术学院学报)
2016, No. 3, pp. 1-5, 19 (6 pages in total)
Research on Vocational Education in Xi'an Vocational and Technical College
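Funding
This paper is an interim result of the 2014 Shandong Province Science and Technology Development Plan project "Research and Application of Key Technologies for Real-Time Computation of Intelligent Transportation Big Data" (project no. 2014GGX101013) and the 2015 Shandong Province Key Research and Development Plan project "Research on Key Technologies for Online Knowledge Discovery of Traffic Flow Based on Real-Time Big Data Computation Methods" (project no. 2015GGX101032).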
Keywords
anomaly detection
stream data clustering
Spark
resource scheduling
cloud data center