Funding: Supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000601.
Abstract: With rapidly growing interest in Big Data, improving the performance of Big Data systems is becoming more and more important. The first of many steps is to analyze and diagnose the performance bottlenecks of these systems. Currently, there are two major solutions. One is the purely data-driven diagnosis approach, which may be very time-consuming; the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications such as Spark workloads, we observe that the tasks in the same stage normally execute the same or similar code on each data partition. On the basis of this stage similarity and the distributed characteristics of Big Data systems, we analyze the behavior of Big Data applications in terms of both the system and the micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies such as straggler tasks, task assignment imbalance, data skew, abnormal nodes, and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool named HybridTune, and we measure its overhead and anomaly detection effectiveness using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5% and that the accuracy of the outlier detection algorithm reaches up to 93%. Finally, we report several use cases of diagnosing Spark and Hadoop workloads using BigDataBench, which demonstrate the potential use of HybridTune.
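To make the idea of combining a prior rule with a data-driven check concrete, here is a minimal Python sketch of straggler detection among the tasks of one stage. It is not HybridTune's algorithm, which the abstract does not describe in detail. The function name `find_stragglers`, the 1.5x-median rule, and the robust z-score cut-off of 3.5 are all assumptions chosen for illustration.

```python
from statistics import median


def find_stragglers(durations, ratio=1.5, z_cut=3.5):
    """Return indices of tasks flagged as stragglers within one stage.

    durations: run times (seconds) of tasks that belong to the same stage.
    ratio:     hypothetical prior rule, suspicious if duration > ratio * median.
    z_cut:     hypothetical cut-off on the MAD-based robust z-score.
    """
    med = median(durations)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(d - med) for d in durations)
    flagged = []
    for i, d in enumerate(durations):
        rule_hit = d > ratio * med
        if mad > 0:
            # 0.6745 rescales the MAD so the score is comparable to a z-score.
            stat_hit = 0.6745 * (d - med) / mad > z_cut
        else:
            # Degenerate spread: fall back to "slower than the median".
            stat_hit = d > med
        # A task is reported only when the rule and the statistic agree.
        if rule_hit and stat_hit:
            flagged.append(i)
    return flagged


stage_durations = [10.2, 9.8, 10.5, 10.1, 31.0, 9.9]
print(find_stragglers(stage_durations))  # prints [4]
```

Comparing each task only with the other tasks of its own stage relies on the stage-similarity observation in the abstract: tasks that run the same code on similar partitions should have similar run times, so a large deviation is informative.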
Funding: National Natural Science Foundation of China (41461086); National Natural Science Foundation of China (41761108).
Abstract: Research on the spatio-temporal correlation between the intensity of human activities and land surface temperature is of great significance in many respects, including fully understanding the causes and mechanisms of climate change, actively adapting to climate change, pursuing rational development, and protecting the ecological environment. Taking the north slope of the Tianshan Mountains, located in the arid area of northwestern China and extremely sensitive to climate change, as the study area, this study retrieves the surface temperature of the area from MODIS data and characterizes the intensity of human activities using data on night-time light, population distribution, and land use. The evolution of human activity intensity and surface temperature in the study area from 2000 to 2018 was analyzed, and the spatio-temporal correlation between them was further explored. The findings are as follows. (1) The average human activity intensity (0.11) in the study area has remained relatively low since the start of this century, and it has risen slowly in a stepwise manner (0.0024·a⁻¹); in addition, the increase in human activity intensity has lagged behind that in construction land and population by 1-2 years. (2) The annual average surface temperature in the area is 7.18 ℃ and shows a pronounced increase. The rate of change (0.02 ℃·a⁻¹) is about 2.33 times the global rate. The striking rise in spring (0.068 ℃·a⁻¹) contributes the most to the overall warming trend. Spatially, the surface temperature is low in the south and high in the north, owing to the prominent influence of underlying surface characteristics such as elevation and vegetation coverage. (3) The intensity of human activity and the surface temperature are significantly positively correlated in the areas of human activity, with the correlation strong in the eastern section and weak in the western section. The spatial differentiation of this correlation is jointly affected by factors such as the scope of human activities, their forms, and land-use changes. Vegetation-related human interventions, such as agricultural and forestry planting, urban greening, and afforestation, can effectively reduce the surface warming caused by human activities. This study not only puts forward new ideas for finely portraying the intensity of human activities but also offers a scientific reference for regional human-land coordination and overall development.
Funding: This work was supported by the Science and Technology Project of China Southern Power Grid Corporation (ZBKJXM20180157) and the National Natural Science Foundation of China (Grant Nos. 61772456, 61761136020).
Abstract: Because electricity consumption is closely related to the economy, its analysis and management have been widely studied. Conventional approaches mainly focus on the prediction and anomaly detection of electricity consumption, and they fail to reveal the in-depth relationships between electricity consumption and factors such as industry and weather. Meanwhile, the lack of analysis tools has made analytical tasks such as correlation analysis and comparative analysis more difficult. In this paper, we introduce EcoVis, a visual analysis system that supports industry-level spatio-temporal correlation analysis of electricity consumption data. We not only propose a novel approach that models spatio-temporal data as a graph structure for easier correlation analysis, but also introduce a novel visual representation that displays the distributions of multiple instances in a single map. We implemented the system in cooperation with domain experts. Experiments are conducted to demonstrate the effectiveness of our method.
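Abstract: Objective: To examine gender differences and the spatio-temporal distribution of healthy life expectancy among China's population aged 60 and above from 2010 to 2020, and to provide empirical evidence for promoting gender equality and regional balance in healthy life expectancy. Methods: Based on data from China's sixth and seventh national population censuses, the Sullivan method was used to calculate healthy life expectancy for the population aged 60 and above and to compare its gender differences and changes over time; spatial autocorrelation (Moran's I) was used to analyze the spatial distribution of the proportion of remaining life spent in good health. Results: The healthy rate declined with age. It was higher among older men than among older women, with the gender gap concentrated in the oldest age groups and narrowing over time. Women's life expectancy and healthy life expectancy were both higher than men's and improved by a larger margin. The proportion of remaining life spent in good health increased over time and was higher for men than for women, although the gender gap tended to narrow; spatially, eastern regions fared better than western regions, and the degree of clustering between regions increased. Conclusions: Over the decade, the health of China's older population improved. Women have an advantage in longevity, but their quality of life still lags behind that of men to some extent. The proportion of remaining life spent in good health expanded, which is consistent with the "compression of morbidity" hypothesis. Meanwhile, gender differences in health continued to narrow, while regional imbalance deepened.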
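The Sullivan method named in the Methods can be illustrated with a short Python sketch. It weights the person-years lived in each age group by the share of that group in good health, then divides by the number of survivors at the starting age. The function name `sullivan`, the five age groups, and every number in the example are hypothetical and are not taken from the census data used in the study.

```python
def sullivan(survivors, person_years, healthy_share):
    """Sullivan-method estimates at a starting age.

    survivors:     life-table survivors at the starting age (l_x).
    person_years:  person-years lived in each age group from that age on (L_i).
    healthy_share: proportion in good health in each age group (pi_i).
    Returns (life expectancy, healthy life expectancy, healthy proportion).
    """
    if len(person_years) != len(healthy_share):
        raise ValueError("one healthy share is needed per age group")
    le = sum(person_years) / survivors
    hle = sum(p * y for p, y in zip(healthy_share, person_years)) / survivors
    return le, hle, hle / le


# Hypothetical abridged life table for ages 60-64, 65-69, 70-74, 75-79, 80+.
L_i = [480_000, 440_000, 380_000, 300_000, 400_000]
pi_i = [0.90, 0.85, 0.80, 0.70, 0.50]
le, hle, share = sullivan(100_000, L_i, pi_i)
print(f"LE={le:.1f}  HLE={hle:.1f}  share={share:.2f}")
# prints LE=20.0  HLE=15.2  share=0.76
```

The third returned value corresponds to the proportion of remaining life spent in good health that the abstract compares across genders, periods, and regions.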
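Abstract: Abnormal-data identification and data quality in dispatching automation systems are the foundation of accurate power system dispatching and forecasting, and they provide a reliable basis for data analysis. To address the impact of abnormal data on dispatching automation systems, a model is proposed for identifying abnormal data in the channels between the master station and the substations of a dispatching automation system. First, a parameter-adaptive density-based spatial clustering algorithm (Parameter Adaptation-Density Based Spatial Clustering of Applications With Noise, PADBSCAN) is constructed to identify abnormal data. In addition, based on autocorrelation theory and the periodicity of data transmission, latent patterns in the transmitted data and in electricity consumption behavior are analyzed and mined, and an electricity-consumption similarity criterion is used to eliminate the effect of time shifts and thereby identify pseudo-abnormal data. The method is validated on measured master-substation channel data from the dispatching automation system of a city in Shandong Province. The results show that the proposed method identifies abnormal data effectively and meets practical engineering needs.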
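As an illustration of density-based identification of abnormal data, the following Python sketch returns the points that plain DBSCAN would label as noise in a one-dimensional series. It is not PADBSCAN: the abstract does not say how the parameters are adapted, so the rule used here (twice the median distance to the `min_pts`-th nearest neighbour) is a stand-in assumption, as are the function name `dbscan_noise` and the sample readings. The periodicity-based handling of pseudo-abnormal data is not covered.

```python
from statistics import median


def dbscan_noise(values, min_pts=3, eps=None):
    """Return indices of values that DBSCAN would label as noise (1-D data)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    if eps is None:
        # Stand-in adaptive rule (an assumption, not the paper's): twice the
        # median distance to the min_pts-th nearest neighbour, where each
        # point counts itself at distance 0.
        kth = [sorted(row)[min_pts - 1] for row in dist]
        eps = 2 * median(kth)
    # Neighbourhood of each point, the point itself included.
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    # Core point: at least min_pts points fall inside its eps-neighbourhood.
    core = [len(nb) >= min_pts for nb in neighbours]
    # Noise: neither a core point nor within eps of any core point.
    return [
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbours[i])
    ]


# Hypothetical channel readings; the value 750 is the injected anomaly.
readings = [501, 503, 498, 500, 502, 750, 499, 504]
print(dbscan_noise(readings))  # prints [5]
```

Identifying noise does not require building the clusters themselves, because a point is noise exactly when it is not a core point and has no core point within `eps`.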