Funding: Supported by the National Natural Science Foundation of China (Nos. 61472072, 61528202, 61501105, 61472169) and the Foundation of Science Public Welfare of Liaoning Province in China (No. 2015003003).
Abstract: State-of-the-art query techniques in power grid monitoring systems focus on querying historical data, which typically introduces an unwanted lag when the systems try to detect emergency situations. The monitoring data of large-scale smart grids are massive, dynamic, and high-dimensional, so global query, the method widely adopted for continuous queries in Wireless Sensor Networks (WSN), is unsuitable because of its high energy consumption. The situation worsens as application complexity increases. We propose an energy-efficient query technique for large-scale smart grids based on variable regions. This method can query an arbitrary region based on variable physical windows, and it optimizes data retrieval paths through a key-node selection strategy. According to the characteristics of the sensing data, we introduce an efficient filter into each query subtree to keep non-skyline data from being retrieved. Experimental results show that our method can efficiently return an overview of the situation in any query region. Compared with TAG and ESA, the average query efficiency of our approach is improved by 79% and 46%, respectively, and the total energy consumption of regional queries is decreased by 82% and 50%, respectively.
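The per-subtree filter mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dominates`, `skyline`, `subtree_filter`) are hypothetical, and the sketch assumes that every reading is a tuple of numbers in which smaller values are preferred. Each node merges its own readings with the partial skylines reported by its children and forwards only the non-dominated tuples.

```python
from typing import List, Sequence, Tuple

Reading = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumption for this sketch: smaller values are preferred in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Reading]) -> List[Reading]:
    """Compute the skyline (non-dominated set) with a simple window scan."""
    window: List[Reading] = []
    for p in points:
        if any(dominates(q, p) for q in window):
            continue  # p is dominated, so it is never forwarded
        # p survives: drop window entries that p dominates, then keep p
        window = [q for q in window if not dominates(p, q)]
        window.append(p)
    return window


def subtree_filter(local_readings: Sequence[Reading],
                   child_skylines: Sequence[Sequence[Reading]]) -> List[Reading]:
    """Hypothetical in-network filter run at the root of one query subtree.

    Merges the node's own readings with its children's partial skylines and
    returns only the tuples that still need to travel toward the sink.
    """
    merged: List[Reading] = list(local_readings)
    for partial in child_skylines:
        merged.extend(partial)
    return skyline(merged)
```

Because ordinary dominance is transitive, discarding a dominated tuple at a subtree never changes the final skyline at the sink, which is why filtering early can save transmissions.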
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 60873068 and 61472169, and by the Program for Excellent Talents in Liaoning Province under Grant No. LR201017.
Abstract: Multi-way join is critical for many big data applications such as data mining and knowledge discovery. Although much research has been devoted to processing multi-way joins using MapReduce, several problems remain to be addressed, such as the transfer of large amounts of unpromising intermediate data and the lack of good coordination mechanisms. This work proposes an efficient multi-way join processing model using MapReduce, named Sharing-Coordination-MapReduce (SC-MapReduce), which provides sharing and coordination functions. The SC-MapReduce model can filter out most unpromising intermediate data by using the sharing mechanism, and it optimizes the coordination of the multiple tasks in multi-way joins. Extensive experiments show that the proposed model is efficient, robust, and scalable.
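The abstract does not describe how SC-MapReduce shares information, so the following is only a generic illustration of the underlying idea: dropping tuples that cannot contribute to the join before they are shipped. It is a single-process simulation of a chain join R(a, b) ⋈ S(b, c) ⋈ T(c, d), not the authors' model. The relation layout, the function names, and the use of shared join-key sets as the filter are all assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def hash_join(left: Sequence[Row], right: Sequence[Row],
              left_key: int, right_key: int) -> List[Row]:
    """Join two relations on the given column positions using a hash index."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [l + r for l in left for r in index.get(l[left_key], [])]


def chain_join_with_shared_keys(R: Sequence[Row], S: Sequence[Row],
                                T: Sequence[Row]) -> Tuple[List[Row], int]:
    """Join R(a, b), S(b, c), T(c, d) after filtering with shared key sets.

    Returns the joined rows (a, b, c, d) and the number of input tuples that
    survive the filter, i.e. the tuples that would still have to be shuffled.
    """
    # Key summaries that the tasks are assumed to share with each other.
    r_b = {r[1] for r in R}
    t_c = {t[0] for t in T}
    # Keep only S tuples that can match on both sides.
    S_f = [s for s in S if s[0] in r_b and s[1] in t_c]
    s_b = {s[0] for s in S_f}
    s_c = {s[1] for s in S_f}
    # Keep only R and T tuples that can match a surviving S tuple.
    R_f = [r for r in R if r[1] in s_b]
    T_f = [t for t in T if t[0] in s_c]

    rs = hash_join(R_f, S_f, 1, 0)    # rows shaped (a, b, b, c)
    rst = hash_join(rs, T_f, 3, 0)    # rows shaped (a, b, b, c, c, d)
    result = [(row[0], row[1], row[3], row[5]) for row in rst]
    return result, len(R_f) + len(S_f) + len(T_f)
```

In a real MapReduce job the key sets would typically be approximated with compact structures and distributed to the mappers; exact Python sets are used here only to keep the sketch short.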
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62072220, 61802160, 61502215), the China Postdoctoral Science Foundation Funded Project (2020M672134), the Science Research Fund of Liaoning Province Education Department (LJC201913), and the Doctor Research Start-up Fund of Liaoning Province (20180540106).
Abstract: Skyline queries are widely used in real-life applications to filter out uninteresting data objects. A skyline query may nevertheless return too many results, especially for high-dimensional datasets, because it cannot control the retrieval conditions. As an extension of the skyline query, the k-dominant skyline query relaxes the constraint on dimensions through the parameter k and thereby reduces the number of retrieved objects. In addition, with the continued growth of big data applications, the acquired data may be missing part of the desired content for practical reasons such as delivery failure, battery depletion, or accidental loss, so the data may be incomplete, with missing values in some attributes. The k-dominant skyline query algorithms for incomplete data depend to some degree on user definitions, and their results cannot be shared. Meanwhile, the existing algorithms cannot be applied directly to incomplete big data. This paper therefore studies the k-dominant skyline query problem over incomplete datasets and combines it with a distributed framework such as the MapReduce environment. First, we propose an index structure over incomplete data, named incomplete data index based on dominate hierarchical tree (ID-DHT). Applying the bucket strategy, the incomplete data are divided into different buckets according to the dimensions of the missing attributes. Second, we put forward a query algorithm for incomplete data in the MapReduce environment, named MapReduce incomplete data based on dominant hierarchical tree algorithm (MR-ID-DHTA). The Map function allocates the data in each bucket to subspaces according to the dominance condition. The Reduce function processes the data according to the key value and returns the k-dominant skyline query result. Experiments demonstrate the validity and usability of our index structure and algorithm.
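The bucket strategy and the k-dominance test can be sketched as follows. This is an illustration under stated assumptions, not the ID-DHT index or the MR-ID-DHTA algorithm: missing values are represented as `None`, smaller values are preferred, points are compared only on dimensions observed in both, and all function names are hypothetical. The sketch covers bucketing and the local k-dominant skyline inside one bucket; merging results across buckets is left out because the abstract does not specify the rule.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[Optional[float], ...]
Mask = Tuple[bool, ...]


def missing_mask(p: Point) -> Mask:
    """Mark which dimensions of p are missing (True means missing)."""
    return tuple(v is None for v in p)


def bucketize(points: Sequence[Point]) -> Dict[Mask, List[Point]]:
    """Group points by missing-attribute pattern (the bucket strategy)."""
    buckets: Dict[Mask, List[Point]] = defaultdict(list)
    for p in points:
        buckets[missing_mask(p)].append(p)
    return dict(buckets)


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """p k-dominates q if p is no worse on at least k dimensions observed in
    both points and strictly better on at least one of them.

    Assumption for this sketch: smaller values are preferred.
    """
    pairs = [(x, y) for x, y in zip(p, q) if x is not None and y is not None]
    no_worse = sum(1 for x, y in pairs if x <= y)
    strictly_better = any(x < y for x, y in pairs)
    return no_worse >= k and strictly_better


def local_k_dominant_skyline(bucket: Sequence[Point], k: int) -> List[Point]:
    """Keep the points of one bucket that no other point k-dominates.

    k-dominance is not transitive and can be cyclic, so every point is
    checked against all others instead of using a one-pass window.
    """
    return [p for i, p in enumerate(bucket)
            if not any(k_dominates(q, p, k)
                       for j, q in enumerate(bucket) if j != i)]
```

In a MapReduce setting, the missing-attribute mask could serve as the key emitted by the Map step so that each bucket reaches a single reducer. Note that when k is smaller than the number of shared dimensions, two points can k-dominate each other, and the local result may then be empty.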