Journal articles
8 articles found
1. Continuous ranking on uncertain streams (Cited by: 3)
Authors: Cheqing JIN, Jingwei ZHANG, Aoying ZHOU. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2012, Issue 6, pp. 686-699, 14 pages.
Data uncertainty widely exists in many web applications, financial applications, and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly ranked tuples. The main challenge comes not only from the fact that the possible world space grows exponentially as new tuples arrive, but also from the requirement for low space and time complexity to suit streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, and then propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Theoretical analysis and detailed experimental reports evaluate the proposed methods.
Keywords: possible world semantics; uncertain data stream; continuous ranking query; sampling
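The exponential growth of the possible world space can be made concrete with a small sketch. It assumes tuple-level uncertainty with independent tuples and the common expected-rank definition (a tuple's rank in a world is the number of present tuples scoring higher; an absent tuple is ranked after the whole world). It contrasts exact enumeration of all 2^n worlds with plain Monte Carlo sampling. This is not the paper's exponential-sampling method, and all function names are hypothetical.

```python
import random
from itertools import product


def world_rank(scores, present, target):
    """Rank of `target` in one possible world: the number of present tuples
    with a higher score; if `target` is absent, its rank is the world size."""
    world = [s for s, here in zip(scores, present) if here]
    if not present[target]:
        return len(world)
    return sum(1 for s in world if s > scores[target])


def expected_rank_exact(tuples, target):
    """tuples: list of (score, existence_probability), assumed independent.
    Enumerates all 2^n possible worlds, so it is only feasible for tiny n."""
    scores = [s for s, _ in tuples]
    total = 0.0
    for present in product([False, True], repeat=len(tuples)):
        prob = 1.0
        for here, (_, p) in zip(present, tuples):
            prob *= p if here else 1.0 - p
        total += prob * world_rank(scores, present, target)
    return total


def expected_rank_sampled(tuples, target, samples=20000, seed=0):
    """Plain Monte Carlo estimate: draw worlds independently and average ranks."""
    rng = random.Random(seed)
    scores = [s for s, _ in tuples]
    total = 0
    for _ in range(samples):
        present = [rng.random() < p for _, p in tuples]
        total += world_rank(scores, present, target)
    return total / samples
```

For `[(10, 0.5), (5, 1.0)]`, the second tuple is ranked first when the first tuple is absent and second otherwise, so its expected rank is 0.5. The sampled estimate approaches that value while touching only a fixed number of worlds.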
2. Popular route planning with travel cost estimation from trajectories (Cited by: 1)
Authors: Huiping LIU, Cheqing JIN, Aoying ZHOU. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2020, Issue 1, pp. 191-207, 17 pages.
With the increasing number of GPS-equipped vehicles, more and more trajectories are generated continuously, which makes some urban applications, such as route planning, feasible. In general, a popular route that has been travelled frequently is a good choice, especially for people who are not familiar with the road network. Moreover, accurate estimation of the travel cost (such as travel time, travel fee, and fuel consumption) benefits a well-scheduled trip plan. In this paper, we address this issue by finding the popular route with travel cost estimation. To this end, we design a system that consists of three main components. First, we propose a novel structure, called the popular traverse graph, in which each node is a popular location and each edge is a popular route between locations, to summarize historical trajectories without road network information. Second, we propose a self-adaptive method to model the travel cost on each popular route in different time intervals, so that each time interval has a stable travel cost. Finally, based on the graph, given a query consisting of a source, a destination, and a leaving time, we devise an efficient route planning algorithm that considers optimal route concatenation to search for the popular route from the source to the destination at the leaving time with accurate travel cost estimation. Moreover, we conduct comprehensive experiments and implement our system as a mobile app; the results show that our method is both effective and efficient.
Keywords: location-based services; route planning; travel cost estimation; minimum description length; optimal road concatenation
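Querying a graph whose edge costs depend on the time interval can be sketched with a time-dependent Dijkstra search. This sketch assumes each edge stores one travel time per fixed-length interval (a hypothetical 60-minute slot, repeating cyclically) and the FIFO property (leaving later never means arriving earlier). The graph encoding and the name `earliest_arrival` are illustrative assumptions; this is not the paper's route concatenation algorithm.

```python
import heapq


def earliest_arrival(graph, source, dest, leave_time, interval=60):
    """Time-dependent Dijkstra over a graph of popular locations.

    graph: {node: [(neighbor, [travel time per interval, ...]), ...]}; the cost
    used for an edge is chosen by the interval in which travel on it starts.
    Returns (arrival_time, path), or None if dest is unreachable.
    """
    best = {source: leave_time}   # earliest known arrival time per node
    prev = {}                     # predecessor on the best path found so far
    heap = [(leave_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == dest:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return t, path[::-1]
        if t > best[u]:
            continue  # stale heap entry
        for v, costs in graph.get(u, []):
            slot = int(t // interval) % len(costs)  # cost profile repeats cyclically
            arrive = t + costs[slot]
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                prev[v] = u
                heapq.heappush(heap, (arrive, v))
    return None
```

With `{"A": [("B", [10, 30]), ("C", [25, 25])], "B": [("C", [10, 10])]}`, leaving A at minute 0 goes through B and arrives at minute 20. Leaving at minute 60, when A to B is slower, the direct edge wins and arrives at minute 85.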
3. MapReduce-based entity matching with multiple blocking functions (Cited by: 1)
Authors: Cheqing JIN, Jie CHEN, Huiping LIU. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2017, Issue 5, pp. 895-911, 17 pages.
Entity matching, which aims at finding records that belong to the same real-world objects, has been studied for decades. To avoid verifying every pair of records in a massive data set, a common method, known as the blocking-based method, selects a small proportion of record pairs for verification at a cost far lower than O(n²), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical, since many more matching records can be found this way, which significantly improves the quality of the query result. The MapReduce (MR) framework is widely used to improve the performance and scalability of complicated queries by running many map (/reduce) tasks in parallel. However, entity matching on the MapReduce framework is non-trivial due to two inevitable challenges: load balancing and pair deduplication. In this paper, we propose a novel solution, called MrEin, to handle these challenges with the support of multiple blocking functions. Although existing work can deal with load balancing and pair deduplication separately, it cannot deal with both challenges at the same time. Theoretical analysis and experimental results on real and synthetic data sets illustrate the high effectiveness and efficiency of our proposed solutions.
Keywords: entity matching; MapReduce; load balancing; pair deduplication
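The pair deduplication problem arises because two records that agree under several blocking functions land together in several blocks. One standard remedy is to compare a pair only inside the block of the smallest blocking key the two records share. The single-process sketch below simulates the map phase (emit blocking keys) and the reduce phase (enumerate pairs per block) with that rule. It does not implement MrEin or its load balancing, and the blocking functions and names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_keys(record, blocking_functions):
    """One key per blocking function, tagged with the function's index so
    that keys produced by different functions never collide."""
    return [(i, f(record)) for i, f in enumerate(blocking_functions)]


def candidate_pairs(records, blocking_functions):
    """records: {record_id: record}. Returns (deduplicated pairs, pairs enumerated)."""
    # "Map": route each record id to every block it belongs to.
    blocks = defaultdict(list)
    keys = {}
    for rid, rec in records.items():
        keys[rid] = blocking_keys(rec, blocking_functions)
        for k in keys[rid]:
            blocks[k].append(rid)
    # "Reduce": enumerate pairs per block, but emit a pair only from the
    # smallest key both records share, so each pair is produced exactly once.
    pairs, enumerated = [], 0
    for k, ids in blocks.items():
        for a, b in combinations(sorted(ids), 2):
            enumerated += 1
            if min(set(keys[a]) & set(keys[b])) == k:
                pairs.append((a, b))
    return pairs, enumerated
```

With a first-name prefix and a last-name prefix as blocking functions, "john smith" and "jon smith" share two blocks. They are enumerated twice but emitted once, because only the block of their smallest shared key keeps the pair.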
4. A framework for cloned vehicle detection (Cited by: 1)
Authors: Minxi LI, Jiali MAO, Xiaodong QI, Cheqing JIN. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2020, Issue 5, pp. 181-198, 18 pages.
Rampant cloned vehicle offenses have caused great damage to transportation management, public safety, and even the world economy. This necessitates an efficient detection mechanism that accurately identifies vehicles with fake license plates and further explores the motives by discerning the behaviors of cloned vehicles. The ubiquitous inspection spots deployed in the city have been collecting the movement information of passing vehicles, which opens up a new opportunity for cloned vehicle detection. Existing detection methods cannot detect cloned vehicles effectively because they use a fixed speed threshold. In this paper, we propose a two-phase framework, called CVDF, to detect cloned vehicles and discriminate the behavior patterns of vehicles that use the same plate number. In the detection phase, cloned vehicles are identified based on speed thresholds extracted from historical trajectories and on behavior abnormality analysis within the local neighborhood. In the behavior analysis phase, since the traces of vehicles that use the same license plate are mixed together, we differentiate the trajectories through matching-degree-based clustering and then extract frequent temporal behavior patterns. Experimental results on real-world data show that the CVDF framework achieves high detection precision and can effectively reveal the behavior of cloned vehicles. Our proposal provides a scientific basis for traffic management authorities to combat cloned vehicle crime.
Keywords: cloned vehicle detection; object identification; behavior pattern mining
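The contrast between a fixed speed threshold and thresholds learned from history can be sketched as follows. The sketch assumes each spot pair gets its own limit, computed as the fastest legitimate historical speed times a hypothetical margin of 1.2. A plate is flagged when two consecutive sightings at different spots imply a faster speed, or when the sightings are simultaneous. Function names, the margin, and the 120 km/h fallback are assumptions; CVDF's neighborhood-based abnormality analysis is not modelled.

```python
from collections import defaultdict


def learn_thresholds(history_speeds, margin=1.2):
    """Per spot-pair limit: fastest legitimate historical speed times a margin.

    history_speeds: {(spot_a, spot_b): [km/h, ...]} with spot_a < spot_b.
    """
    return {pair: max(speeds) * margin
            for pair, speeds in history_speeds.items() if speeds}


def detect_cloned(records, distance_km, thresholds, default_kmh=120.0):
    """Flag plates whose consecutive sightings imply an impossible trip.

    records: iterable of (plate, spot, time_in_hours).
    distance_km: {(spot_a, spot_b): km} with spot_a < spot_b.
    """
    by_plate = defaultdict(list)
    for plate, spot, t in records:
        by_plate[plate].append((t, spot))
    suspects = set()
    for plate, sightings in by_plate.items():
        sightings.sort()  # chronological order
        for (t1, s1), (t2, s2) in zip(sightings, sightings[1:]):
            if s1 == s2:
                continue
            pair = tuple(sorted((s1, s2)))
            limit = thresholds.get(pair, default_kmh)
            dt = t2 - t1
            # Seen at two different spots at once, or faster than this pair allows.
            if dt <= 0 or distance_km[pair] / dt > limit:
                suspects.add(plate)
                break
    return suspects
```

For two spots 50 km apart with a learned limit of 90 km/h, a plate seen at both within 15 minutes (200 km/h) is flagged, while one taking an hour is not.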
5. Distributed top-k similarity query on big trajectory streams
Authors: Zhigang ZHANG, Xiaodong QI, Yilin WANG, Cheqing JIN, Jiali MAO, Aoying ZHOU. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2019, Issue 3, pp. 647-664, 18 pages.
Recently, with the popularity of smartphones and other mobile devices, big trajectory data streams are generated in distributed environments. Distributed top-k similarity query, which finds the k trajectories that are most similar to a given query trajectory from all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost, given the limited network bandwidth. The query can be solved by sending the query trajectory to all remote sites, where the pairwise similarities are computed precisely; however, the overall cost, O(n·m), is huge when n or m is large, where n is the size of the query trajectory and m is the number of remote sites. Fortunately, there are some cheap ways to estimate pairwise similarity, which filter out some trajectories in advance without precise computation. To overcome this challenge, we devise two general frameworks into which concrete distance measures can be plugged. The former uses two bounds (the upper and lower bound), while the latter uses only the lower bound. Moreover, we introduce detailed implementations for two representative distance measures, Euclidean and DTW distance, after inferring the lower and upper bounds for the former framework and the lower bound for the latter. Theoretical analysis and extensive experiments on real-world datasets evaluate the efficiency of the proposed methods.
Keywords: top-k similarity query; trajectory stream; communication cost
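The lower-bound-only framework can be illustrated on a single machine with a deliberately simple bound. For equal-length 2-D trajectories Q and T of n points, the Cauchy-Schwarz inequality gives √n · ‖c(Q) − c(T)‖ ≤ d_Euc(Q, T), where c(·) is the centroid. Candidates are visited in increasing bound order, and the scan stops as soon as a bound cannot beat the current k-th best exact distance. This centroid bound is an assumption chosen for brevity, not the bound derived in the paper, and communication between sites is not modelled.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length lists of (x, y) points."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))


def centroid(t):
    n = len(t)
    return (sum(x for x, _ in t) / n, sum(y for _, y in t) / n)


def centroid_lower_bound(q, t):
    """sqrt(n) * |centroid(q) - centroid(t)| never exceeds euclidean(q, t)."""
    (qx, qy), (tx, ty) = centroid(q), centroid(t)
    return math.sqrt(len(q)) * math.hypot(qx - tx, qy - ty)


def topk_with_lower_bound(query, trajectories, k):
    """Return (sorted [(distance, id), ...], number of exact distance computations)."""
    bounds = sorted((centroid_lower_bound(query, t), tid)
                    for tid, t in trajectories.items())
    best = []  # max-heap of the k best so far, stored as (-distance, id)
    exact = 0
    for lb, tid in bounds:
        if len(best) == k and lb >= -best[0][0]:
            break  # all remaining bounds are at least this large: safe to stop
        d = euclidean(query, trajectories[tid])
        exact += 1
        if len(best) < k:
            heapq.heappush(best, (-d, tid))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, tid))
    return sorted((-nd, tid) for nd, tid in best), exact
```

For a query along y = 0 and candidates along y = 1, y = 2, and far away, a top-1 query needs only one exact distance computation, because the second candidate's bound already exceeds the first candidate's exact distance.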
6. Online clustering of streaming trajectories
Authors: Jiali MAO, Qiuge SONG, Cheqing JIN, Zhigang ZHANG, Aoying ZHOU. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2018, Issue 2, pp. 245-263, 19 pages.
With the increasing availability of modern mobile devices and location acquisition technologies, massive trajectory data of moving objects are collected continuously in a streaming manner. Clustering streaming trajectories facilitates finding the representative paths or common moving trends shared by different objects in real time. Although data stream clustering has been studied extensively in the past decade, little effort has been devoted to dealing with streaming trajectories. The main challenge lies in the strict space and time complexities of processing the continuously arriving trajectory data, combined with the difficulty of concept drift. To address this issue, we present two novel synopsis structures to extract the clustering characteristics of trajectories, and develop an incremental algorithm for the online clustering of streaming trajectories (called OCluST). It contains a micro-clustering component to cluster and summarize the most recent sets of trajectory line segments at each time instant, and a macro-clustering component to build large macro-clusters based on micro-clusters over a specified time horizon. Finally, we conduct extensive experiments on four real data sets to evaluate the effectiveness and efficiency of OCluST, and compare it with other algorithms of the same kind. Experimental results show that OCluST achieves superior performance in clustering streaming trajectories.
Keywords: streaming trajectory; synopsis data structure; concept drift; sliding window
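The micro-clustering step can be illustrated with a CluStream-style sketch. Each micro-cluster keeps an additive synopsis (a segment count plus sums of start and end points). A new segment joins the nearest micro-cluster if it lies within a radius, and clusters idle for longer than a horizon expire, which crudely mimics a sliding window. The synopsis, the distance function, and the `radius` and `horizon` parameters are assumptions. OCluST's actual synopsis structures and its macro-clustering component are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of line segments: count plus sums of start/end points."""
    n: int
    sum_start: list
    sum_end: list
    last_update: float

    def center(self):
        return (self.sum_start[0] / self.n, self.sum_start[1] / self.n,
                self.sum_end[0] / self.n, self.sum_end[1] / self.n)

    def absorb(self, seg, now):
        (x1, y1), (x2, y2) = seg
        self.n += 1
        self.sum_start[0] += x1
        self.sum_start[1] += y1
        self.sum_end[0] += x2
        self.sum_end[1] += y2
        self.last_update = now


def segment_distance(seg, center):
    """Start-to-start plus end-to-end Euclidean distance."""
    (x1, y1), (x2, y2) = seg
    cx1, cy1, cx2, cy2 = center
    return math.hypot(x1 - cx1, y1 - cy1) + math.hypot(x2 - cx2, y2 - cy2)


class OnlineSegmentClusterer:
    def __init__(self, radius, horizon):
        self.radius = radius    # max distance at which a segment is absorbed
        self.horizon = horizon  # micro-clusters idle longer than this expire
        self.clusters = []

    def insert(self, seg, now):
        # Expire micro-clusters that have not been updated within the horizon.
        self.clusters = [c for c in self.clusters
                         if now - c.last_update <= self.horizon]
        nearest = min(self.clusters,
                      key=lambda c: segment_distance(seg, c.center()),
                      default=None)
        if nearest is not None and segment_distance(seg, nearest.center()) <= self.radius:
            nearest.absorb(seg, now)
        else:
            (x1, y1), (x2, y2) = seg
            self.clusters.append(MicroCluster(1, [x1, y1], [x2, y2], now))
```

Because the synopsis is additive, absorbing a segment is constant time and no raw segments are stored. That property is what makes this family of micro-cluster summaries attractive for streams.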
7. Benchmarking in-memory database
Authors: Cheqing JIN, Yangxin KONG, Qiangqiang KANG, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2016, Issue 6, pp. 1067-1081, 15 pages.
We have witnessed exciting development of RAM technology in the past decade. Memory size grows rapidly and its price continues to decrease, so it is feasible to deploy large amounts of RAM in a computer system. Several companies and research institutions have devoted considerable resources to developing in-memory databases (IMDBs), which execute queries after loading data into (virtual) memory in advance. The proliferation of in-memory databases motivates us to test and evaluate their performance objectively and fairly. Although existing database benchmarks such as the Wisconsin benchmark and the TPC-X series have achieved great success, they are not suitable for in-memory databases because they do not consider the unique characteristics of an IMDB. In this study, we propose MemTest, a novel benchmark that addresses several major characteristics of an in-memory database. This benchmark constructs particular metrics, which cover processing time, compression ratio, minimal memory space, and column strength of an in-memory database. We design a data model based on inter-bank transaction applications, and a data generator that supports uniform and skewed data distributions. The MemTest workload includes a set of queries and transactions against the metrics and data model. Finally, we illustrate the efficacy of MemTest through implementations on two different in-memory databases.
Keywords: benchmark; in-memory database; memory
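A data generator supporting both uniform and skewed distributions can be sketched in a few lines. This sketch assumes a hypothetical transfer table `(txn_id, src_account, dst_account, amount)` and draws accounts with Zipf-like weights 1/r^s, where `skew = 0` gives a uniform distribution. The schema, the amount range, and the choice of a Zipf-like skew are assumptions, not MemTest's actual data model.

```python
import random


def zipf_weights(n, s):
    """Weight of the account with popularity rank r is 1 / r**s; s = 0 is uniform."""
    return [1.0 / rank ** s for rank in range(1, n + 1)]


def generate_transfers(num_rows, num_accounts, skew=0.0, seed=42):
    """Generate (txn_id, src_account, dst_account, amount) rows.

    Self-transfers (src == dst) are not filtered out in this sketch.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible benchmark data
    accounts = list(range(1, num_accounts + 1))
    weights = zipf_weights(num_accounts, skew)
    rows = []
    for txn_id in range(1, num_rows + 1):
        src, dst = rng.choices(accounts, weights=weights, k=2)
        amount = round(rng.uniform(1.0, 10_000.0), 2)
        rows.append((txn_id, src, dst, amount))
    return rows
```

Seeding a private `random.Random` instance keeps runs reproducible, so two databases under test can load identical data. With `skew=1.2`, account 1 appears as a source far more often than account 100.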
8. A privacy-enhancing scheme against contextual knowledge-based attacks in location-based services
Authors: Jiaxun HUA, Yu LIU, Yibin SHEN, Xiuxia TIAN, Yifeng LUO, Cheqing JIN. Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2020, Issue 3, pp. 225-227, 3 pages.
1 Introduction and main contributions
Location-based services (LBS) are springing up around us, while leakages of users' privacy during these services are inevitable. Even worse, adversaries may analyze intercepted service data and extract further private information, such as health and property. Therefore, privacy preservation is an indispensable guarantee of LBS security. Among previous approaches to privacy preservation, k-anonymity-based ones have drawn much research attention [1-3]. However, privacy concerns arise if these schemes are adopted directly. For instance, suppose a user U_t issues the query "Find the nearest hotel around me" in an area such as the one in Fig. 1 (privacy profile k = 4). The DLS algorithm [2] constructs anonymity set A because these four cells have similar probabilities of being queried in the past. However, experienced adversaries can exclude some cells if they have learned rich contextual knowledge (side information) from historical data, such as the features of each cell and of LBS users.
Keywords: LBS; services; knowledge
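The construction of an anonymity set from cells with similar historical query probabilities, and the effect of side information on it, can be sketched as follows. This is a simplified selection rule: take the real cell plus the k − 1 cells whose probability is closest to it, then measure the set's Shannon entropy from the adversary's point of view. It is not the exact DLS algorithm nor the paper's scheme. Cell names and probabilities are hypothetical.

```python
import math


def select_anonymity_set(query_prob, real_cell, k):
    """Real cell plus the k-1 cells with the closest historical query probability."""
    target = query_prob[real_cell]
    others = sorted((c for c in query_prob if c != real_cell),
                    key=lambda c: abs(query_prob[c] - target))
    return [real_cell] + others[:k - 1]


def anonymity_entropy(query_prob, cells):
    """Shannon entropy (bits) of the real cell's identity, assuming the adversary
    weights each candidate cell by its normalized query probability.
    The maximum is log2(len(cells))."""
    total = sum(query_prob[c] for c in cells)
    return -sum((query_prob[c] / total) * math.log2(query_prob[c] / total)
                for c in cells if query_prob[c] > 0)
```

Four equally likely cells give 2 bits of uncertainty. If contextual knowledge lets an adversary rule out two of them, the entropy drops to 1 bit. This is the kind of degradation the attack described above exploits.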