Abstract: The skyline-join operator, as an important variant of the skyline, plays an important role in multi-criteria decision making problems. However, as the data scale increases, previous methods for skyline-join queries cannot be applied to new applications. Therefore, this paper makes the first attempt to propose a scalable method for processing skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according to a partition function, which guarantees that tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan ensures that calculations are assigned equally to all the data nodes and that the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.
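The partitioning step described above requires only that both input tables route tuples by their join attribute so that matching tuples end up co-located. The abstract does not specify the exact partition function used by DSJQ, so the following is a minimal sketch assuming a hash-based scheme; the function name and parameters are illustrative.

```python
# Minimal sketch of a join-value-based partition function (assumed hash-based;
# DSJQ's actual partition function is not specified in the abstract).
import hashlib

def partition(join_value, num_nodes):
    """Map a tuple's join value to a data node id in [0, num_nodes).

    Hashing only the join attribute guarantees that tuples from both
    tables sharing the same join value land on the same node, so each
    node can evaluate its part of the join locally.
    """
    digest = hashlib.sha1(str(join_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Example: tuples from tables R and S with join value "user_42"
# are routed to the same node.
node_r = partition("user_42", num_nodes=8)
node_s = partition("user_42", num_nodes=8)
assert node_r == node_s
```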
Funding: This work is sponsored by the National Natural Science Foundation of China under Grant Nos. 61471217 and 61472266, the China Postdoctoral Science Foundation under Grant No. 2014M550735, and the CCF-Tencent Open Fund under Grant No. AGR20150201.
Abstract: The cloud downloading scheme, first proposed by us in 2011, has effectively optimized hundreds of millions of users' downloading experiences. People have also started to build a variety of useful Internet services on top of cloud downloading. In brief, by using cloud facilities to download (and cache) the requested file from the "best-effort" Internet on behalf of the user, cloud downloading ensures data availability and remarkably enhances data delivery speed. Although this scheme seems simple and straightforward, designing a real-world cloud downloading system involves complicated and subtle trade-offs (between deployment cost and user experience) when serving a large number of users: 1) how to plan the cloud cache capacity to achieve a high yet affordable cache hit ratio, 2) how to accelerate data delivery from the cloud to numerous users, 3) how to handle dense user requests for highly popular files, and 4) how to judge a potential downloading failure of the cloud. This paper addresses these design trade-offs from a practical perspective, based on big data from a nationwide commercial cloud downloading system, Tencent QQXuanfeng. Its running traces help us find reasonable design strategies and parameters, and its real-world performance confirms the efficacy of our design. Our study provides solid experience and valuable heuristics for the developers of similar and related systems.
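The first trade-off above, cache capacity planning, is typically approached by replaying request traces against candidate cache sizes and observing the resulting hit ratio. The sketch below shows only this generic trace-replay approach; the LRU policy, trace format, and capacity unit are assumptions for illustration, not the model actually derived from the QQXuanfeng traces in the paper.

```python
# Minimal sketch of trace-driven cache capacity planning: replay a request
# trace against an LRU cache of a candidate capacity and measure hit ratio.
# Policy and trace format here are illustrative assumptions.
from collections import OrderedDict

def hit_ratio(trace, capacity):
    """Return the cache hit ratio for a given capacity (in number of files)."""
    cache = OrderedDict()   # file_id -> None, ordered by recency
    hits = 0
    for file_id in trace:
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)      # refresh recency on a hit
        else:
            cache[file_id] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(trace) if trace else 0.0

# Example: compare candidate capacities on a toy trace.
trace = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "c"]
for cap in (1, 2, 3):
    print(cap, round(hit_ratio(trace, cap), 2))
```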
Funding: This work is supported by the NSF of USA under Grant Nos. IIS-0534761, CNS-0831502, and CNS-0855251, and by the NUS AcRF under Grant No. WBS R-252-050-280-101/133.
Abstract: The significant overhead related to frequent location updates from moving objects often results in poor performance. As most of the location updates do not affect query results, the network bandwidth and the battery life of moving objects are wasted. Existing solutions propose lazy updates, but such techniques generally avoid only a small fraction of all unnecessary location updates because of their basic approach (e.g., safe regions or time/distance thresholds). Furthermore, most prior work focuses on a simplified scenario where queries are either static or rarely change their positions. In this study, two novel and efficient location update strategies are proposed, for a trajectory movement model and an arbitrary movement model, respectively. The first strategy, for a trajectory movement environment, is the Adaptive Safe Region (ASR) technique, which retrieves an adjustable safe region that is continuously reconciled with the surrounding dynamic queries. The communication overhead is reduced in a highly dynamic environment where both queries and data objects change their positions frequently. In addition, we design a framework that supports multiple query types (e.g., range and c-kNN queries). In this framework, our query re-evaluation algorithms take advantage of ASRs and issue location probes only to the affected data objects, without flooding the system with unnecessary location update requests. The second strategy, for an arbitrary movement environment, is the Partition-based Lazy Update (PLU) algorithm, which takes this idea further by adopting Location Information Tables (LITs) that (a) allow each moving object to estimate possible query movements and issue a location update only when it may affect any query results and (b) enable smart server probing that results in fewer messages. We first define the data structure of an LIT, which is essentially packed with a set of surrounding query locations across the terrain, and then discuss the mobile-side and server-side processes corresponding to the utilization of LITs. Simulation results confirm that both the ASR and PLU concepts improve scalability and efficiency over existing methods.
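The core lazy-update mechanism that both ASR and PLU build on is that a moving object stays silent while it remains inside its current safe region and reports its position only upon leaving it. The sketch below illustrates this baseline idea with a fixed-radius circular region as an assumption; ASR's actual regions are adaptive and continuously reconciled with surrounding dynamic queries, and the class and parameter names here are hypothetical.

```python
# Minimal sketch of safe-region-based lazy location updates: an object sends
# an update only when it exits its current safe region. A circular region of
# fixed radius is assumed here purely for illustration.
import math
from dataclasses import dataclass

@dataclass
class SafeRegion:
    center_x: float
    center_y: float
    radius: float

    def contains(self, x, y):
        return math.hypot(x - self.center_x, y - self.center_y) <= self.radius

class MovingObject:
    def __init__(self, x, y, radius):
        self.region = SafeRegion(x, y, radius)

    def move(self, x, y):
        """Return a location update message only when the safe region is left."""
        if self.region.contains(x, y):
            return None                        # update suppressed: no message sent
        # Leaving the region: report to the server, which would compute an
        # adjusted safe region (a fixed radius is kept here for simplicity).
        self.region = SafeRegion(x, y, self.region.radius)
        return (x, y)

obj = MovingObject(0.0, 0.0, radius=5.0)
print(obj.move(1.0, 1.0))   # None -> update suppressed
print(obj.move(7.0, 2.0))   # (7.0, 2.0) -> update sent, new region installed
```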