As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. Most skyline query processing algorithms assume that query processing is done for all attributes in a static dataset with deterministic attribute values. Some recent work relaxes part of this strong assumption in order to process skyline queries for real-life applications: to deal with data with multi-valued attributes (known as data uncertainty), to support skyline queries in a subspace, which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams. That is, the goal is to retrieve all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm.
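To make the notion of a subspace skyline concrete, the following is a minimal sketch of the deterministic case only. It ignores uncertainty, skyline probabilities, and the grid index described in the abstract; the function names and the "smaller is better" convention are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point, dims: Sequence[int]) -> bool:
    """Return True if a dominates b on the selected dimensions.

    Assumption: smaller values are preferred on every dimension.
    """
    no_worse = all(a[d] <= b[d] for d in dims)
    strictly_better = any(a[d] < b[d] for d in dims)
    return no_worse and strictly_better


def subspace_skyline(window: List[Point], dims: Sequence[int]) -> List[Point]:
    """Naive O(n^2) skyline of the current window, restricted to dims."""
    return [
        p for p in window
        if not any(dominates(q, p, dims) for q in window)
    ]


if __name__ == "__main__":
    window = [(1, 5, 9), (2, 2, 1), (3, 3, 0), (4, 1, 7)]
    print(subspace_skyline(window, dims=(0, 1)))
    print(subspace_skyline(window, dims=(0, 1, 2)))
```

In this toy window, the point (3, 3, 0) is dominated in the subspace of the first two attributes but belongs to the full-space skyline, which illustrates why subspace and full-space results have to be related carefully.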
To address the problem that traditional anonymity models for location-based services are not applicable to continuous queries, a quasi real-time cloak algorithm (QR-TCA) is proposed and a non-delay cloak model (N-DCM) is established. After a comprehensive analysis of privacy protection models for location-based continuous queries, a model that solves the user service delay problem is proposed, which can provide users with quasi-real-time location-based services. The experiment evaluates the N-DCM model along multiple dimensions, such as service response time and service quality on standard datasets. The experimental results show that the method is suitable for location privacy protection in continuous queries and can effectively protect the user's location privacy.
The continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms to tackle this type of query, whose key idea is to maintain a subset of objects in the window and try to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirements. In this paper, we define a novel query named the (ε, δ)-approximate continuous top-k query, which returns approximate answers for the top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework) for approximate top-k queries over sliding windows. We first maintain a self-adaptive pruning value, which filters out newly arrived objects that have a probability less than 1 − δ of being a query result. Objects that are not filtered are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also includes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, only logarithmic complexity is required for maintaining each candidate.
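For reference, the query semantics can be sketched with a naive exact baseline that rescans the window on every slide. This is not PABF and contains none of its pruning or merging steps; the function name and the count-based window are assumptions for illustration.

```python
import heapq
from collections import deque
from typing import Iterable, List


def sliding_top_k(scores: Iterable[float], window: int, k: int) -> List[List[float]]:
    """Return the top-k scores for every full window position.

    Naive exact baseline: each slide costs O(window * log k),
    which is the kind of overhead approximate schemes aim to avoid.
    """
    current = deque(maxlen=window)  # oldest score is evicted automatically
    results = []
    for score in scores:
        current.append(score)
        if len(current) == window:
            results.append(heapq.nlargest(k, current))
    return results


if __name__ == "__main__":
    print(sliding_top_k([5, 1, 9, 3, 7], window=3, k=2))
```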
Data uncertainty widely exists in many web applications, financial applications, and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly ranked tuples. The main challenge comes not only from the fact that the possible world space grows exponentially as new tuples arrive, but also from the requirement for low space and time complexity in streaming environments. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Theoretical analysis and detailed experimental results evaluate the proposed methods.
This paper investigates the problem of inconsistent states of radio frequency identification (RFID) tag data caused by incomplete execution of read/write operations during access to RFID tag memory. Passive RFID tags require RF communication to access memory data. This study is motivated by the volatility of RF communication, where instability is caused by intermittent connections and uncertain communication. If a given tag disappears from the communication area of the reader during the reading or writing of tag data, the operation is incomplete, resulting in an inconsistent state of tag data. To avoid this inconsistency, it is necessary to ensure that any operations on tag memory are completed. In this paper, we propose an asynchronous reprocessing model for finalizing any incomplete execution of read/write operations to remove inconsistent states. The basic idea is to resume incomplete operations autonomously by detecting a tag's re-observation from any reader. To achieve this, we present a concurrency control mechanism based on continuous query processing that enables the suspended tag operations to be re-executed. The performance study shows that our model improves the number of successful operations considerably in addition to suppressing inconsistent data access completely.
Funding: Supported by the NSF of USA under Grant Nos. IIS-0534761, CNS-0831502, and CNS-0855251, and by NUS AcRF under Grant No. WBS R-252-050-280-101/133.
Abstract: The significant overhead of frequent location updates from moving objects often results in poor performance. As most location updates do not affect the query results, the network bandwidth and the battery life of moving objects are wasted. Existing solutions propose lazy updates, but such techniques generally avoid only a small fraction of all unnecessary location updates because of their basic approach (e.g., safe regions, time or distance thresholds). Furthermore, most prior work focuses on a simplified scenario where queries are either static or rarely change their positions. In this study, two novel efficient location update strategies are proposed, one for a trajectory movement model and one for an arbitrary movement model. The first strategy, for a trajectory movement environment, is the Adaptive Safe Region (ASR) technique, which retrieves an adjustable safe region that is continuously reconciled with the surrounding dynamic queries. The communication overhead is reduced in a highly dynamic environment where both queries and data objects change their positions frequently. In addition, we design a framework that supports multiple query types (e.g., range and c-kNN queries). In this framework, our query re-evaluation algorithms take advantage of ASRs and issue location probes only to the affected data objects, without flooding the system with unnecessary location update requests. The second strategy, for an arbitrary movement environment, is the Partition-based Lazy Update (PLU) algorithm, which takes this idea further by adopting Location Information Tables (LITs) that (a) allow each moving object to estimate possible query movements and issue a location update only when it may affect a query result and (b) enable smart server probing that results in fewer messages.
We first define the data structure of an LIT, which is essentially packed with a set of surrounding query locations across the terrain, and then discuss the mobile-side and server-side processes that use LITs. Simulation results confirm that both the ASR and PLU concepts improve scalability and efficiency over existing methods.
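The basic safe-region idea that ASR builds on can be sketched as follows. This is a simplified illustration, not the ASR or PLU algorithm: the circular region, the fixed radius, and the rule that the server re-centers the region on each reported position are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[float, float]


@dataclass
class SafeRegion:
    """Hypothetical circular safe region assigned to one moving object."""
    cx: float
    cy: float
    radius: float

    def contains(self, pos: Position) -> bool:
        return math.hypot(pos[0] - self.cx, pos[1] - self.cy) <= self.radius


def simulate_updates(path: List[Position], radius: float) -> List[Position]:
    """Return the positions at which the object would send a location update.

    The object stays silent while it is inside its safe region. When it
    leaves, it reports, and (by assumption) receives a new region of the
    same radius centered on the reported position.
    """
    region = SafeRegion(path[0][0], path[0][1], radius)
    reports = []
    for pos in path[1:]:
        if not region.contains(pos):
            reports.append(pos)
            region = SafeRegion(pos[0], pos[1], radius)
    return reports


if __name__ == "__main__":
    path = [(float(x), 0.0) for x in range(7)]
    print(simulate_updates(path, radius=2.5))
```

In this toy run, only two of the six movements trigger a message. An adaptive scheme would additionally resize the region as surrounding queries move, which this sketch does not attempt.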
Funding: The Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (The Regional Research Universities Program/Research Center for Logistics Information Technology).
Abstract: RFID middleware collects and filters RFID streaming data to process applications' requests, which are called continuous queries because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem with using any of the existing query indexes on these continuous queries is that building the index takes a long time, because a large number of segments must be inserted into it. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Compared with existing query indexes, the proposed index performs better on various datasets.
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61173160, the National Basic Research 973 Program of China under Grant No. 2015CB358800, and the Scientific Research Program of the Higher Education Institution of Xinjiang Uygur Autonomous Region of China under Grant No. XJEDU2014S087.
Abstract: As stream data is collected and analyzed more frequently, stream processing systems face more design challenges. One challenge is to perform continuous window aggregation, which involves intensive computation. When there are a large number of aggregation queries, the system may suffer from scalability problems. The queries are usually similar and differ only in their window specifications. In this paper, we propose collaborative aggregation, which promotes aggregate sharing among the windows so that repeated aggregate operations can be avoided. Unlike previous approaches, in which aggregate sharing is restricted by the window pace, we generalize the aggregation over multiple values as a series of reductions, so that the results generated by each reduction step can be shared. The sharing process is formalized in the feed semantics, and we present the compose-and-declare framework to determine the data sharing logic at a very low cost. Experimental results show that our approach offers an order-of-magnitude performance improvement over state-of-the-art approaches and has a small memory footprint.
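The general benefit of sharing partial reductions among windows can be illustrated with a deliberately simple slice-based sketch. This is the older, pace-restricted style of sharing that the abstract contrasts itself with, not the collaborative aggregation or compose-and-declare framework; the function name and the requirement that all sizes be multiples of the slice size are assumptions.

```python
from typing import Dict, List, Sequence


def shared_window_max(
    values: Sequence[float], slice_size: int, window_sizes: Sequence[int]
) -> Dict[int, float]:
    """Compute max over the most recent w values for several window sizes w.

    Assumptions: len(values) and every window size are multiples of
    slice_size, and no window is longer than the stream seen so far.
    Each slice is reduced once; every window reuses those partial results.
    """
    partials: List[float] = [
        max(values[i:i + slice_size])
        for i in range(0, len(values), slice_size)
    ]
    results = {}
    for w in window_sizes:
        slices_needed = w // slice_size
        results[w] = max(partials[-slices_needed:])
    return results


if __name__ == "__main__":
    stream = [7, 1, 4, 1, 5, 3, 2, 6]
    print(shared_window_max(stream, slice_size=2, window_sizes=[2, 4, 8]))
```

Here three queries over eight values reuse four partial maxima instead of rescanning the raw values, which is the kind of repeated work that aggregate sharing aims to remove.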
Funding: Project (No. ABA048) supported by the Natural Science Foundation of Hubei Province, China.
Abstract: Continuously monitoring multiple K-nearest neighbor (K-NN) queries over dynamic object and query datasets is valuable for many location-based applications. A practical method is to partition the data space into grid cells, index both the object table and the query table by this grid structure, and solve the problem by periodically joining cells of objects with the queries whose influence regions intersect those cells. In the worst case, all cells of objects are accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With the object cache strategy, queries that remain static in the current processing cycle seldom incur I/O cost and can be answered quickly. The main I/O cost comes from moving queries; the query cache strategy restricts their search regions by using the current results of queries held in the main memory buffer. The queries can share not only the accesses to object pages, but also their influence regions. A theoretical analysis of the expected I/O cost is presented, and in the experimental results the I/O cost is about 40% of that of the SEA-CNN method.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61172049 and No. 61003251, the National High Technology Research and Development Program of China (863 Program) under Grant No. 2011AA040101, and the Doctoral Fund of Ministry of Education of China under Grant No. 20100006110015.
Abstract: The clustering of trajectories over huge volumes of streaming data has been recognized as critical for many modern applications. In this work, we propose a continuous clustering of trajectories of moving objects over high-speed data streams, which updates online trajectory clusters on the basis of incremental line-segment clustering. The proposed clustering algorithm obtains trajectory clusters efficiently and stores all closed trajectory clusters in a bi-tree index with efficient search capability. Next, we present two query processing methods that use three proposed pruning strategies to quickly handle two continuous spatio-temporal queries: threshold-based trajectory clustering queries and threshold-based trajectory outlier detection. Finally, comprehensive experimental studies demonstrate that our algorithm achieves excellent effectiveness and high efficiency for continuous clustering on both synthetic and real streaming data, and that the proposed query processing methods take on average 90% less time than the naive query methods.
Funding: Project (50275150) supported by the National Natural Science Foundation of China; Project (20040533035) supported by the National Research Foundation for the Doctoral Program of Higher Education of China.
Abstract: A data stream management system (DSMS) provides convenient solutions to the problem of processing continuous queries on data streams. Previous approaches for scheduling these queries and their operators assume either that each operator runs in a separate thread or that all operators are combined in one query plan and run in a single thread. Both approaches suffer from severe drawbacks concerning thread overhead and stalls due to expensive operators. To overcome these drawbacks, a novel approach called clustered operators scheduling (COS) is proposed, which adaptively clusters the operators of the query plan into a number of groups based on their selectivity and computing cost using S-mean clustering. An experimental evaluation is provided to demonstrate the potential benefits of COS over other scheduling strategies. COS can provide an adaptive, flexible, reliable, scalable, and robust design for a continuous query processor.
Funding: The High Technology Research Plan of Jiangsu Province (No. BG2004034); the Foundation of Graduate Creative Program of Jiangsu Province (No. xm04-36).
Abstract: A novel data stream partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for the power industry. The first step of this method is to sample the data in parallel, which is implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of data values is introduced to describe the distribution characteristics of data values adaptively. The second step of this method is to partition the fluxes of data streams evenly, which is implemented with two alternative equal-depth histogram generating algorithms that fit different cases: one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector. The experimental results on actual data show that the method is efficient, practical, and suitable for processing time-varying data streams.
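As background for the sampling step, the classic reservoir-sampling scheme (often called Algorithm R) can be sketched as below. It is the standard baseline only: the extended parallel variant and the skip factor mentioned in the abstract are not reproduced, and the function signature is an assumption for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
    print(sorted(sample))
```

The sample needs only O(k) memory regardless of stream length, which is what makes this family of algorithms attractive for building histograms over unbounded streams.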
Funding: Project supported by the second stage of the Brain Korea 21 Project.
Abstract: In a mobile/pervasive computing environment, one of the most important goals of monitoring continuous spatial queries is to reduce the communication cost of location updates. Existing work uses many cellular wireless connections, which can easily become the performance bottleneck of the overall system. This paper introduces a novel continuous kNN query monitoring method to reduce communication cost in a hybrid wireless network, where the moving objects in the wireless broadcasting system form an ad hoc network. Simulation results demonstrate the efficiency of the proposed method, which leverages the wireless broadcasting channel as well as the WiFi link to alleviate the burden on the cellular uplink.
Abstract: In location-based services, a density query returns the regions with high concentrations of moving objects (MOs). The use of density queries can help users identify crowded regions so as to avoid congestion. Most of the existing methods try very hard to improve the accuracy of query results, but ignore query efficiency. However, response time is also an important concern in query processing and may have an impact on user experience. In order to address this issue, we present a new definition of continuous density queries. Our approach for processing continuous density queries is based on the new notion of a safe interval, with which the states of both dense and sparse regions are dynamically maintained. Two indexing structures are also used to index candidate regions for accelerating query processing and improving the quality of results. The efficiency and accuracy of our approach are shown through an experimental comparison with snapshot density queries.
Abstract: A new distance-based update strategy is presented for Location Dependent Continuous Queries (LDCQ) under an error limit. Moving objects at different distances from the query boundary have different probabilities of crossing it, and therefore influence the query result to different degrees. We set different deviation limits for different moving objects according to these distances. A great number of unnecessary updates are avoided and the load on the system is relieved.
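One way to picture a distance-based deviation limit is shown below. This is only a sketch under stated assumptions: the query is taken to be a circular range, the limit grows linearly with the distance to the query boundary, and it is clamped between a minimum and a maximum. The functions `distance_to_boundary`, `deviation_limit` and `needs_update`, and the parameters `alpha`, `min_limit` and `max_limit`, are all hypothetical and do not come from the paper.

```python
import math


def distance_to_boundary(pos, center, radius):
    """Distance from a point to the boundary of a circular range query."""
    return abs(math.hypot(pos[0] - center[0], pos[1] - center[1]) - radius)


def deviation_limit(dist, alpha=0.5, min_limit=1.0, max_limit=50.0):
    """Assumed rule: objects far from the boundary tolerate a larger
    deviation before they must report a new location."""
    return max(min_limit, min(max_limit, alpha * dist))


def needs_update(last_reported, current, center, radius, **limits):
    """True when the object has drifted beyond its own deviation limit."""
    limit = deviation_limit(
        distance_to_boundary(last_reported, center, radius), **limits
    )
    moved = math.hypot(
        current[0] - last_reported[0], current[1] - last_reported[1]
    )
    return moved > limit
```

With a query of radius 10 centred at the origin, an object last reported at (100, 0) may drift 45 units before reporting, while one last reported at (12, 0) must report after drifting more than 1 unit.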
Funding: Financed by the French government IDEX-ISITE initiative 16-IDEX-0001 (CAP 20-25); the PhD is funded by the European Regional Development Fund (FEDER).
Abstract: Recent advances in big and streaming data systems have enabled real-time analysis of data generated by Internet of Things (IoT) systems and sensors in various domains. In this context, many applications require integrating data from several heterogeneous sources, either stream or static sources. Frameworks such as Apache Spark are able to integrate and process large datasets from different sources. However, these frameworks are hard to use when the data sources are heterogeneous and numerous. To address this issue, we propose a system based on mediation techniques for integrating stream and static data sources. The integration process of our system consists of three main steps: configuration, query expression and query execution. In the configuration step, an administrator designs a mediated schema and defines mappings between the mediated schema and the local data sources. In the query expression step, users express queries using a customized SQL grammar on the mediated schema. Finally, our system rewrites the query into an optimized Spark application and submits the application to a Spark cluster. The results are continuously returned to users. Our experiments show that our optimizations can improve query execution time by up to one order of magnitude, making complex streaming and spatial data analysis more accessible.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61073061, 61003044 and 61303019; the Natural Science Foundation of Colleges and Universities of Jiangsu Province of China under Grant No. 12KJB520017.
Abstract: As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed on the assumption that query processing is done for all attributes in a static dataset with deterministic attribute values, some advanced work has been done recently to relax parts of this strong assumption in order to process skyline queries for real-life applications, namely, to deal with data with multi-valued attributes (known as data uncertainty), to support skyline queries in a subspace which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams, that is, retrieving all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm.
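The query defined in this abstract can be made concrete with a brute-force sketch. It is not the grid-based PSS algorithm. It assumes a simplified uncertainty model in which each object has fixed attribute values and an independent existence probability, so that its skyline probability is its own probability multiplied by the probability that none of its dominators exists. Smaller attribute values are assumed to be better, and all function names are hypothetical.

```python
from collections import deque


def dominates(a, b, dims):
    """a dominates b in the subspace dims: no worse everywhere,
    strictly better in at least one dimension (smaller is better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)


def skyline_probability(obj, window, dims):
    """Assumed model: obj = (values, existence_probability), with
    independent existence across objects."""
    values, prob = obj
    result = prob
    for other_values, other_prob in window:
        if dominates(other_values, values, dims):
            # obj is on the skyline only if this dominator is absent.
            result *= 1.0 - other_prob
    return result


def probabilistic_subspace_skyline(stream, window_size, dims, threshold):
    """After each arrival, yield the objects in the current count-based
    window whose skyline probability reaches the threshold."""
    window = deque(maxlen=window_size)
    for item in stream:
        window.append(item)  # the oldest item is evicted automatically
        yield [
            o for o in window
            if skyline_probability(o, window, dims) >= threshold
        ]
```

Each step costs time quadratic in the window size, which is the kind of overhead an index such as the regular grid mentioned in the abstract is meant to avoid.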
Funding: Supported by the National Natural Science Foundation (61170035, 61272420); the Natural Science Foundation of Jiangsu Province (BK2011022, BK2011702); the Fundamental Research Funds for the Central Universities (30920130112006); Jiangsu Province Blue Project Innovation Team; Nanjing science and technology project (020142010); Nanjing University of Science and Technology 2009 Zijin Star Project Funding.
Abstract: To solve the problem that the traditional location-based services anonymity model does not apply to continuous queries, a quasi real-time cloak algorithm (QR-TCA) is proposed and a non-delay cloak model (N-DCM) is established. After a comprehensive analysis of privacy protection models for location-based continuous queries, a model that solves the user service delay problem is proposed, which can provide users with quasi real-time location-based services. The experiments evaluate the N-DCM model along multiple dimensions, such as service response time and service quality, on standard datasets. The experimental results show that the method is suitable for continuous query location privacy protection and can effectively protect the user's location privacy.
Funding: This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.
Abstract: The continuous top-k query over a sliding window is a fundamental problem in databases; it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms to tackle this type of query, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirements. In this paper, we define a novel query named the (ε, δ)-approximate continuous top-k query, which returns approximate answers for a top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework) for approximate top-k queries over a sliding window. We first maintain a self-adaptive pruning value, which filters out newly arrived objects that have a probability less than 1 - δ of being a query result. Objects that are not filtered are combined if the score difference among them is less than a threshold. To efficiently maintain these combined results, PABF also includes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, we require only logarithmic complexity for maintaining each candidate.
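The exact approach that this abstract contrasts itself with, keeping only a subset of the window from which answers are retrieved, can be sketched as follows. This is a simplified baseline, not PABF: an object is discarded once k newer objects with strictly higher scores have arrived, because those objects outlive it and it can then never re-enter the top-k. The class name `SlidingTopK` and its count-based window are assumptions made for illustration.

```python
import heapq
from collections import deque


class SlidingTopK:
    """Exact top-k over a count-based sliding window, keeping only
    objects that could still appear in some future answer."""

    def __init__(self, k, window_size):
        self.k = k
        self.window_size = window_size
        # Each entry is [arrival_index, score, newer_higher_count].
        self.candidates = deque()
        self.count = 0

    def insert(self, score):
        t = self.count
        self.count += 1
        # Expire candidates that have slid out of the window.
        while self.candidates and self.candidates[0][0] <= t - self.window_size:
            self.candidates.popleft()
        # The new object outlives every older one, so each older object
        # with a lower score gains one more permanent dominator.
        for entry in self.candidates:
            if entry[1] < score:
                entry[2] += 1
        # Drop objects already beaten by k newer objects.
        self.candidates = deque(e for e in self.candidates if e[2] < self.k)
        self.candidates.append([t, score, 0])

    def top_k(self):
        """Scores of the current top-k, highest first."""
        return heapq.nlargest(self.k, (e[1] for e in self.candidates))
```

Every insertion scans the whole candidate set, which hints at the incremental maintenance overhead the abstract attributes to exact methods.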
Abstract: Data uncertainty is widespread in web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes not only from the fact that the possible world space grows exponentially as new tuples arrive, but also from the requirement for low space and time complexity to suit streaming environments. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Theoretical analysis and detailed experimental results evaluate the proposed methods.
Funding: Supported by the Grant of the Regional Core Research Program/Institute of Logistics Information Technology of the Korean Ministry of Education, Science and Technology.
Abstract: This paper investigates the problem of inconsistent states of radio frequency identification (RFID) tag data caused by incomplete execution of read/write operations during access to RFID tag memory. Passive RFID tags require RF communication to access memory data. This study is motivated by the volatility of RF communication, where instability is caused by intermittent connections and uncertain communication. If a given tag disappears from the communication area of the reader during the reading or writing of tag data, the operation is incomplete, resulting in an inconsistent state of tag data. To avoid this inconsistency, it is necessary to ensure that any operations on tag memory are completed. In this paper, we propose an asynchronous reprocessing model for finalizing any incomplete execution of read/write operations to remove inconsistent states. The basic idea is to resume incomplete operations autonomously by detecting a tag's re-observation by any reader. To achieve this, we present a concurrency control mechanism based on continuous query processing that enables the suspended tag operations to be re-executed. The performance study shows that our model considerably increases the number of successful operations and completely suppresses inconsistent data access.