Funding: This work was partially supported by the National Natural Science Foundation of China (61876089).
Abstract: With the rapid development and spread of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate, rapid reading and classification of residential users' electricity consumption gives deeper insight into residents' actual power use, which is essential for the normal operation of the power system and for energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest method for classifying residential electricity consumption. It uses the out-of-bag error of the random forest together with the Drosophila (fruit fly) optimization algorithm to tune the forest's internal parameters, thereby improving the performance of the random forest algorithm. The method uses MapReduce to train the improved random forest model on a cloud computing platform, then applies the trained model to a residential electricity consumption data set, divides all residents into 5 categories, and verifies the effectiveness and feasibility of the model through experiments.
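As a rough illustration of tuning forest parameters against out-of-bag (OOB) error, the following is a minimal sketch and not the paper's implementation. It makes several assumptions: one-split decision stumps stand in for full trees, a greedy random neighborhood search stands in for the Drosophila algorithm, training is single-process rather than MapReduce, and the data are synthetic. The names fit_stump, oob_error, and tune are hypothetical.

```python
import random

def fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) with fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = max(set(left), key=left.count)
            r_lab = max(set(right), key=right.count)
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        maj = max(set(y), key=y.count)
        return (0, float("inf"), maj, maj)
    return best[1:]

def oob_error(X, y, n_trees, n_feats, seed=0):
    """Error rate when each sample is voted on only by trees that did not train on it."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    votes = [{} for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), n_feats)        # random feature subset
        f, t, l_lab, r_lab = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        for i in set(range(n)) - set(idx):           # out-of-bag samples only
            pred = l_lab if X[i][f] <= t else r_lab
            votes[i][pred] = votes[i].get(pred, 0) + 1
    scored = [(v, lab) for v, lab in zip(votes, y) if v]
    if not scored:
        return 1.0
    return sum(max(v, key=v.get) != lab for v, lab in scored) / len(scored)

def tune(X, y, rounds=5, swarm=4, seed=1):
    """Greedy search: try random neighbors of the best (n_trees, n_feats), keep improvements."""
    rng = random.Random(seed)
    d = len(X[0])
    best = (10, 1)
    best_err = oob_error(X, y, *best)
    for _ in range(rounds):
        for _ in range(swarm):
            cand = (max(1, best[0] + rng.randint(-5, 5)),
                    min(d, max(1, best[1] + rng.randint(-1, 1))))
            err = oob_error(X, y, *cand)
            if err < best_err:
                best, best_err = cand, err
    return best, best_err

# Synthetic two-feature data; the label depends only on the first feature.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
y = [1 if row[0] > 0.5 else 0 for row in X]
start_err = oob_error(X, y, 10, 1)
best, best_err = tune(X, y)
print("start OOB error:", start_err, "tuned:", best, best_err)
```

Because OOB error is computed from samples each tree never saw, it can serve as the search objective without holding out a separate validation set, which is presumably why the abstract pairs it with a parameter optimizer.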
Funding: Supported by the cooperation project "Research on Green Cloud IDC Resource Scheduling" with ZTE Corporation.
Abstract: MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system's configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS reaches 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
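To make the last sentence concrete, here is a minimal sketch of how a performance predictor could drive configuration selection: enumerate candidate configurations and keep the one with the lowest predicted runtime. It is an illustration under stated assumptions, not RFMS. The function predict_runtime is a hypothetical toy formula standing in for a trained model, the candidate values are arbitrary, and best_config is a hypothetical helper; only the two property names are real Hadoop settings.

```python
from itertools import product

# Candidate values for two Hadoop properties; the value ranges are illustrative only.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": [100, 200, 400],
    "mapreduce.job.reduces": [1, 4, 8, 16],
}

def predict_runtime(config):
    """Hypothetical stand-in for a trained performance model (not RFMS itself)."""
    sort_mb = config["mapreduce.task.io.sort.mb"]
    reduces = config["mapreduce.job.reduces"]
    # Toy shape: more reducers help until per-reducer overhead dominates.
    return 600.0 / reduces + 5.0 * reduces + 2000.0 / sort_mb

def best_config(space, predictor):
    """Exhaustively score every combination and return the lowest-scoring one."""
    names = list(space)
    candidates = (dict(zip(names, values)) for values in product(*space.values()))
    return min(candidates, key=predictor)

chosen = best_config(SEARCH_SPACE, predict_runtime)
print(chosen, predict_runtime(chosen))
```

Exhaustive enumeration works only for a handful of parameters. With the roughly 190 parameters mentioned in the abstract, a sampled or heuristic search over the predictor would be needed instead.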
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42101403) and the National Key Research and Development Program of China (Grant No. 2017YFD0600404).
Abstract: Although airborne hyperspectral data with detailed spatial and spectral information has demonstrated significant potential for tree species classification, it has not been widely used over large areas. A comprehensive process based on multi-flightline airborne hyperspectral data is lacking for large forested areas affected by both bidirectional reflectance distribution function (BRDF) effects and cloud shadow contamination. In this study, hyperspectral data were collected over the Mengjiagang Forest Farm in Northeast China in the summer of 2017 using the Chinese Academy of Forestry's LiDAR, CCD, and hyperspectral systems (CAF-LiCHy). After BRDF correction and cloud shadow detection, a tree species classification workflow was developed for sunlit and cloud-shaded forest areas, with minimum noise fraction reduced bands, spectral vegetation indices, and texture information as input features. Results indicate that BRDF-corrected sunlit hyperspectral data can provide stable, high classification accuracy when representative training data are used. Cloud-shaded pixels also have good spectral separability for species classification. Red-edge spectral information and ratio-based spectral indices, which received high importance scores, are recommended as input features for species classification under varying light conditions. Classification accuracies assessed against field survey data at multiple spatial scales show that species classification over an extensive forest area using airborne hyperspectral data under various illumination conditions can be carried out successfully with an effective radiometric consistency process and feature selection strategy.
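For readers unfamiliar with ratio-based spectral indices, the sketch below computes two common forms from per-pixel reflectance: a normalized difference and a simple ratio. It is illustrative only. The abstract does not list which indices the study used, the red-edge and near-infrared reflectance values are made up, and the function names are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b); returns 0.0 when both reflectances are zero."""
    total = band_a + band_b
    return 0.0 if total == 0 else (band_a - band_b) / total

def simple_ratio(band_a, band_b):
    """a / b; the caller must ensure band_b is nonzero."""
    return band_a / band_b

# Made-up reflectances for one pixel (fractions of incident light).
nir, red_edge = 0.5, 0.3

ndre = normalized_difference(nir, red_edge)   # normalized difference red-edge form
ratio = simple_ratio(nir, red_edge)
print(round(ndre, 3), round(ratio, 3))
```

A ratio of two bands cancels much of any brightness change that scales both bands by a similar factor, which is one plausible reason such indices remain useful under the varying light conditions the abstract describes.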