Journal Articles
27 articles found
Scalable and quantitative contention generation for performance evaluation on OLTP databases
1
Authors: Chunxi ZHANG, Yuming LI, Rong ZHANG, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2023, Issue 2, pp. 15-31 (17 pages)
Massive-scale transactions with critical requirements have become popular in emerging businesses, especially in E-commerce. One of the most representative applications is the promotional event run on Alibaba's platform on special dates, widely anticipated by global customers. Although significant progress has been achieved in improving the scalability of transactional (OLTP) database systems, the presence of contended operations in workloads is still one of the fundamental obstacles to performance improvement. The reason is that the overhead of managing conflicting transactions with concurrency control mechanisms is proportional to the amount of contention. As a consequence, generating contended workloads is essential for evaluating the performance of modern OLTP database systems. Although standard benchmarks provide some ways to simulate contention, e.g., skew distribution control of transactions, they cannot control the generation of contention quantitatively; even worse, the simulation effectiveness of these methods is affected by the scale of the data. In this paper we design a scalable, quantitative contention generation method with fine-grained control of contention granularity. We conduct a comprehensive set of experiments on popular open-source DBMSs, comparing against the latest contention simulation method, to demonstrate the effectiveness of our generation method.
Keywords: high contention, OLTP database, performance evaluation, database benchmarking
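To make the notion of quantitative contention control concrete, here is a minimal Python sketch (not the paper's algorithm): a key chooser that directs a fixed fraction of accesses to a small hot set, so the contention ratio stays the same regardless of table size. The names (`make_key_chooser`, `hot_set_size`) are illustrative assumptions.

```python
import random

def make_key_chooser(num_records, contention_ratio, hot_set_size=1, seed=42):
    """Return a function that picks a record key such that roughly
    `contention_ratio` of all picks collide on a small hot set,
    independent of how large `num_records` grows."""
    rng = random.Random(seed)
    hot_keys = list(range(hot_set_size))          # the contended rows
    cold_range = (hot_set_size, num_records - 1)  # everything else

    def next_key():
        if rng.random() < contention_ratio:
            return rng.choice(hot_keys)           # contended access
        return rng.randint(*cold_range)           # uniform, uncontended access

    return next_key

# Example: ~30% of updates hit the same row, whether the table has 1e4 or 1e8 rows.
choose = make_key_chooser(num_records=10_000_000, contention_ratio=0.3)
sample = [choose() for _ in range(100_000)]
print(sum(k == 0 for k in sample) / len(sample))  # ~0.3
```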
Accurate and efficient follower log repair for Raft-replicated database systems (Cited by 3)
2
Authors: Jinwei GUO, Peng CAI, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2021, Issue 2, pp. 91-103 (13 pages)
State machine replication has been widely used in modern cluster-based database systems. Most commonly deployed configurations adopt a Raft-like consensus protocol, in which a single strong leader replicates the log to the other followers. Since the followers can handle read requests and many real workloads are read-intensive, the recovery speed of a crashed follower may significantly impact throughput. Different from traditional database recovery, the recovering follower needs to repair its local log first. The original Raft protocol takes many network round trips to compare logs between the leader and the crashed follower. To reduce network round trips, an optimization method is to truncate the follower's uncertain log entries behind the latest local commit point, and then directly fetch all committed log entries from the leader in one round trip. However, if the commit point is not persisted, the recovering follower has to fetch the whole log from the leader. In this paper, we propose an accurate and efficient log repair (AELR) algorithm for follower recovery. AELR is more robust and resilient to follower failure, and it needs only one network round trip to fetch the minimum number of log entries for follower recovery. This approach is implemented in the open-source database system OceanBase. We experimentally show that the system adopting AELR performs well in terms of recovery time.
Keywords: Raft, high availability, log replication, log repair
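A hedged sketch of the one-round-trip repair idea described in the abstract: the follower ships the (index, term) pairs of its uncertain log tail, and the leader returns only the divergent suffix. This is a simplified illustration, not the actual AELR algorithm or OceanBase code; `leader_repair_reply` and the `Entry` type are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Entry:
    index: int
    term: int
    payload: bytes = b""

def leader_repair_reply(leader_log: List[Entry],
                        follower_tail: List[Tuple[int, int]]) -> List[Entry]:
    """One-shot repair: the recovering follower ships the (index, term) pairs of
    its uncertain log tail in a single request; the leader finds the last entry
    they agree on and returns only the entries the follower is missing."""
    by_index = {e.index: e.term for e in leader_log}
    match = 0  # highest index where follower and leader agree
    for idx, term in follower_tail:
        if by_index.get(idx) == term:
            match = max(match, idx)
    return [e for e in leader_log if e.index > match]

# Usage: follower's entries 4-5 were written under a stale term and must be replaced.
leader = [Entry(i, 1) for i in range(1, 4)] + [Entry(4, 2), Entry(5, 2)]
reply = leader_repair_reply(leader, [(3, 1), (4, 1), (5, 1)])
print([e.index for e in reply])  # [4, 5] -- only the divergent suffix is resent
```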
A Fast Filling Algorithm for Image Restoration Based on Contour Parity (Cited by 1)
3
Authors: Yan Liu, Wenxin Hu, Longzhe Han, Maksymyuk Taras, Zhiyun Chen. 《Computers, Materials & Continua》, SCIE EI, 2020, Issue 4, pp. 509-519 (11 pages)
Filling techniques are often used in the restoration of images. Yet existing filling approaches either have high computational costs or present problems such as filling holes redundantly. This paper proposes a novel algorithm for filling holes and regions of images. The proposed algorithm combines the advantages of both the parity-check filling approach and the region-growing inpainting technique. Pairing points of the region's boundary are used to search and fill the region, and the scanning range of the filling method is confined to the target regions. The proposed method does not require additional working memory or assistant colors, and it can correctly fill any complex contour. Experimental results show that, compared to other approaches, the proposed algorithm fills regions faster and with lower computational cost.
Keywords: region filling, image restoration, parity check, region growing
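For readers unfamiliar with parity-based filling, the following sketch shows the classic even-odd (parity) rule that the paper builds on: a pixel is interior if a horizontal ray from it crosses the contour an odd number of times. This is the textbook rule, not the paper's accelerated pairing-point algorithm; `parity_fill` is an illustrative name.

```python
def parity_fill(mask_height, mask_width, contour):
    """Fill the interior of a closed contour (list of (x, y) vertices) using the
    even-odd parity rule: a pixel is inside if a horizontal ray from it crosses
    the contour an odd number of times."""
    filled = [[0] * mask_width for _ in range(mask_height)]
    n = len(contour)
    for y in range(mask_height):
        for x in range(mask_width):
            inside = False
            for i in range(n):
                x1, y1 = contour[i]
                x2, y2 = contour[(i + 1) % n]
                # Does this edge cross the horizontal line at y, to the right of x?
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > x:
                        inside = not inside
            filled[y][x] = int(inside)
    return filled

# Usage: fill a small rectangle-shaped contour.
rect = [(2, 2), (7, 2), (7, 6), (2, 6)]
img = parity_fill(10, 10, rect)
print(sum(map(sum, img)))  # number of interior pixels
```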
Efficient and stable quorum-based log replication and replay for modern cluster-databases
4
Authors: Donghui WANG, Peng CAI, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2022, Issue 5, pp. 143-158 (16 pages)
Modern in-memory databases (IMDBs) can support highly concurrent on-line transaction processing (OLTP) workloads and generate massive transactional logs per second. Quorum-based replication protocols such as Paxos and Raft have been widely used in distributed databases to offer higher availability and fault tolerance. However, replicating an IMDB is non-trivial because the high transaction rate brings new challenges. First, the leader node in quorum replication should be adaptive to various transaction arrival rates and the processing capability of follower nodes. Second, followers are required to replay logs to catch up with the state of the leader in a highly concurrent setting to reduce the visibility gap. Third, modern databases are often built on a cluster of commodity machines connected by low-end networks, in which network anomalies often happen. In this case, performance is significantly affected because the follower node falls into a long exception-handling process (e.g., fetching lost logs from the leader). To this end, we build QuorumX, an efficient and stable quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipeline batching to provide an adaptive log propagation scheme that obtains stable and high performance under various settings. Further, we propose a safe and coordination-free log replay scheme to minimize the visibility gap between the leader and follower IMDBs. We also carefully design the process for the follower node to alleviate the influence of an unreliable network on replication performance. Our evaluation results with YCSB, TPC-C and a realistic micro-benchmark demonstrate that QuorumX achieves performance close to asynchronous primary-backup replication and always provides a stable service with data consistency and a low-level visibility gap.
Keywords: log replication, log replay, consensus protocol, high performance, high availability, quorum, unreliable network, packet loss
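A toy illustration of why batching adapts to load when replication is driven by the critical path: everything that accumulates during one replication round is shipped in the next one, so batch size grows and shrinks with the arrival rate. It is not QuorumX's actual scheme; `AdaptiveBatcher` and `ship_fn` are made-up names.

```python
import time
from collections import deque

class AdaptiveBatcher:
    """Toy adaptive log-propagation batcher: it ships whatever has accumulated
    as soon as the previous replication round (the critical path) finishes, so
    the batch size naturally grows with the arrival rate and shrinks when load
    drops. An illustration of the general idea only, not QuorumX itself."""

    def __init__(self, ship_fn):
        self.pending = deque()
        self.ship_fn = ship_fn   # callable that replicates a list of log entries

    def append(self, entry):
        self.pending.append(entry)

    def run_once(self):
        if not self.pending:
            return 0
        batch = list(self.pending)
        self.pending.clear()
        self.ship_fn(batch)      # one round trip replicates the whole batch
        return len(batch)

# Usage: simulate bursts -- larger bursts are absorbed into larger batches.
sizes = []
b = AdaptiveBatcher(ship_fn=lambda batch: time.sleep(0.001))
for burst in (1, 10, 100):
    for i in range(burst):
        b.append(f"log-{i}")
    sizes.append(b.run_once())
print(sizes)  # [1, 10, 100]
```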
Scalable and adaptive log manager in distributed systems
5
Authors: Huan ZHOU, Weining QIAN, Xuan ZHOU, Qiwen DONG, Aoying ZHOU, Wenrong TAN. 《Frontiers of Computer Science》, SCIE EI CSCD, 2023, Issue 2, pp. 45-62 (18 pages)
On-line transaction processing (OLTP) systems rely on transaction logging and quorum-based consensus protocols to guarantee durability, high availability and strong consistency. This makes the log manager a key component of distributed database management systems (DDBMSs). The leader of a DDBMS commonly adopts a centralized logging method to write log entries into a stable storage device, and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and high parallelism in transaction processing, the traditional centralized design of logging limits scalability, and a constant replication trigger condition cannot always maintain optimal performance under dynamic workloads. In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by utilizing a highly concurrent data structure and speedy log hole tracking. The kernel of the adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-sourced transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well with an increasing number of working threads, improves peak throughput by 1.56× and reduces latency by more than 4× over the log replication of Raft, and maintains efficient and stable performance under dynamic workloads at all times.
Keywords: distributed database systems, transaction logging, log replication, scalable, adaptive
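The sketch below illustrates, in simplified form, the two ingredients named in the abstract: contention-free log appending via slot reservation, and log hole tracking that limits how far the flusher may persist. It is not Salmo's implementation; `ConcurrentLogBuffer` and its methods are hypothetical, and a Python lock stands in for an atomic fetch-and-add.

```python
import threading

class ConcurrentLogBuffer:
    """Toy contention-free logging: each worker atomically reserves a slot range
    in the shared buffer, writes without a global lock, and the flusher only
    persists up to the first unfinished slot (the earliest "hole")."""

    def __init__(self, capacity=1 << 16):
        self.buf = [None] * capacity
        self.filled = [False] * capacity
        self.next_slot = 0
        self.reserve_lock = threading.Lock()   # stands in for atomic fetch-and-add

    def reserve(self, n):
        with self.reserve_lock:
            start = self.next_slot
            self.next_slot += n
            return start

    def write(self, start, entries):
        for i, e in enumerate(entries):
            self.buf[start + i] = e
            self.filled[start + i] = True

    def flushable_prefix(self):
        """Largest contiguous prefix with no holes -- only this much may be synced."""
        end = 0
        while end < self.next_slot and self.filled[end]:
            end += 1
        return end

# Usage: worker 2 finishes before worker 1, leaving a hole that blocks the flush.
log = ConcurrentLogBuffer()
a = log.reserve(2)          # slots 0-1
b = log.reserve(1)          # slot 2
log.write(b, ["txn-B"])
print(log.flushable_prefix())   # 0 -- hole at slot 0
log.write(a, ["txn-A1", "txn-A2"])
print(log.flushable_prefix())   # 3 -- hole filled, whole prefix can be flushed
```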
D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systems
6
Authors: Jialun WANG, Wenhao PANG, Chuliang WENG, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2023, Issue 4, pp. 141-153 (13 pages)
In analytical queries, a number of important operators like JOIN and GROUP BY are suitable for parallelization, and the GPU is an ideal accelerator considering its power of parallel computing. However, when data size increases to hundreds of gigabytes, one GPU card becomes insufficient due to the small capacity of global memory and the slow data transfer between host and device. A straightforward solution is to equip more GPUs linked with high-bandwidth connectors, but the cost increases sharply. We utilize unified memory (UM) provided by NVIDIA CUDA (Compute Unified Device Architecture) to make it possible to accelerate large-scale queries on just one GPU, but we notice that the transfer performance between host and UM, which happens before kernel execution, is often significantly slower than the theoretical bandwidth. An important reason is that, in a single-GPU environment, data processing systems usually invoke only one or a static number of threads for data copy, leading to an inefficient transfer which slows down the overall performance heavily. In this paper, we present D-Cubicle, a runtime module to accelerate data transfer between host-managed memory and unified memory. D-Cubicle boosts the actual transfer speed dynamically through a self-adaptive approach. In our experiments, taking data transfer into account, D-Cubicle processes 200 GB of data on a single GPU with 32 GB of global memory, achieving on average 1.43x and at most 2.09x the performance of the baseline system.
Keywords: data analytics, GPU, unified memory
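A rough illustration (host-to-host only, since Python cannot exercise CUDA unified memory directly) of the underlying idea: splitting one large copy across several threads can raise effective transfer bandwidth, and the thread count is the knob a runtime like D-Cubicle would tune. `parallel_copy` is an assumed helper, not part of D-Cubicle.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(dst, src, num_threads):
    """Split one large memcpy into `num_threads` chunked copies so that several
    threads drive the transfer concurrently. Illustrative only; a real runtime
    would pick num_threads from the observed bandwidth and copy into UM."""
    n = len(src)
    bounds = [(i * n // num_threads, (i + 1) * n // num_threads)
              for i in range(num_threads)]

    def copy_chunk(lo_hi):
        lo, hi = lo_hi
        dst[lo:hi] = src[lo:hi]   # bulk copy runs in C for contiguous arrays

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(copy_chunk, bounds))

src = np.ones(50_000_000, dtype=np.uint8)
dst = np.empty_like(src)
parallel_copy(dst, src, num_threads=4)
print(bool((dst == src).all()))   # True
```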
High-availability in-memory key-value store using RDMA and Optane DCPMM
7
Authors: Xuecheng QI, Huiqi HU, Jinwei GUO, Chenchen HUANG, Xuan ZHOU, Ning XU, Yu FU, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2023, Issue 1, pp. 221-223 (3 pages)
1 Introduction and main contributions. Emerging hardware like Remote Direct Memory Access (RDMA)-capable networks and persistent memory (PM) is promising for building fast, highly available in-memory key-value stores. The recent advent of Intel Optane DC Persistent Memory Modules (Optane DCPMM) brings that future closer. However, existing studies combining the two devices cannot deliver the desired performance due to their two-phase protocols for log shipping, and most of them were based on emulation and perform sub-optimally on real PM hardware.
Keywords: hardware, RDMA, key-value
SMEC: Scene Mining for E-Commerce
8
Authors: 王罡, 李翔, 郭子义, 殷大伟, 马帅. 《Journal of Computer Science & Technology》, SCIE EI CSCD, 2024, Issue 1, pp. 192-210 (19 pages)
Scene-based recommendation has proven its usefulness in E-commerce by recommending commodities based on a given scene. However, scenes are typically unknown in advance, which necessitates scene discovery for E-commerce. In this article, we study scene discovery for E-commerce systems. We first formalize a scene as a set of commodity categories that occur simultaneously and frequently in real-world situations, and model an E-commerce platform as a heterogeneous information network (HIN), whose nodes and links represent different types of objects and different types of relationships between objects, respectively. We then formulate the scene mining problem for E-commerce as an unsupervised learning problem that finds the overlapping clusters of commodity categories in the HIN. To solve the problem, we propose a non-negative matrix factorization based method, SMEC (Scene Mining for E-Commerce), and theoretically prove its convergence. Using six real-world E-commerce datasets, we finally conduct an extensive experimental study to evaluate SMEC against 13 other methods, and show that SMEC consistently outperforms its competitors with regard to various evaluation measures.
Keywords: graph clustering, E-commerce, heterogeneous information network (HIN), scene mining
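As a baseline intuition for the factorization step, here is standard non-negative matrix factorization with Lee-Seung multiplicative updates applied to a category co-occurrence matrix; rows of the factor give soft, overlapping "scene" memberships. This is generic NMF, not SMEC's objective or its convergence proof; `nmf_overlapping_clusters` is an illustrative name.

```python
import numpy as np

def nmf_overlapping_clusters(C, k, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative category co-occurrence matrix C (n x n) as W @ H
    with W, H >= 0, then read each row of W as a category's soft membership
    over k overlapping 'scenes'."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    W = rng.random((n, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ C) / (W.T @ W @ H + eps)     # Lee-Seung updates for the
        W *= (C @ H.T) / (W @ H @ H.T + eps)     # Frobenius-norm objective
    return W

# Usage: two blocks of categories that co-occur internally form two scenes.
C = np.zeros((6, 6))
C[:3, :3] = 5
C[3:, 3:] = 5
memberships = nmf_overlapping_clusters(C, k=2)
print(memberships.argmax(axis=1))   # e.g. [0 0 0 1 1 1] (cluster labels may swap)
```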
Fault-tolerant precise data access on distributed log-structured merge-tree (Cited by 2)
9
Authors: Tao ZHU, Huiqi HU, Weining QIAN, Huan ZHOU, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2019, Issue 4, pp. 760-777 (18 pages)
The log-structured merge tree has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in-writing part and several read-only ones. Records are first written into a memory-optimized structure and then compacted into on-disk structures periodically. This achieves high write throughput, but it brings the side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in a distributed fashion, so a server in the query layer has to issue multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintenance overhead designed to test whether a record exists in the in-writing part of the LSM-tree; and a lease-based synchronization strategy proposed to maintain consistent copies of the structure on remote query servers. We further prove that the technique is capable of working robustly when the LSM-tree is re-organizing multiple structures in the background. It is also fault-tolerant, being able to recover the structures used in data access after node failures happen. Experiments using the YCSB benchmark show that the solution achieves a 6x throughput improvement over existing methods.
Keywords: distributed data storage, log-structured merge tree, linearizability, fault tolerance
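A minimal sketch, under stated assumptions, of the two ideas in the abstract: a compact membership filter for the in-writing part, cached on a query server and guarded by a lease so stale copies are never trusted. A plain Bloom filter plus an expiry time stands in for the paper's structure; `LeasedMembershipFilter` is a hypothetical name.

```python
import hashlib
import time

class LeasedMembershipFilter:
    """Toy membership test for the in-memory (in-writing) part of an LSM-tree,
    cached on a remote query server under a lease: while the lease is valid the
    query server can skip contacting the storage node for keys the filter rules
    out."""

    def __init__(self, m_bits=1 << 16, k_hashes=4, lease_seconds=1.0):
        self.bits = bytearray(m_bits // 8)
        self.m, self.k = m_bits, k_hashes
        self.expires_at = time.time() + lease_seconds

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "little") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, key):
        if time.time() > self.expires_at:
            raise TimeoutError("lease expired: refresh the filter from the leader")
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Usage: keys not in the in-writing part can be read from local SSTables directly.
f = LeasedMembershipFilter()
f.add("user:42")
print(f.maybe_contains("user:42"), f.maybe_contains("user:99"))  # True False
```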
EnAli: entity alignment across multiple heterogeneous data sources (Cited by 2)
10
Authors: Chao KONG, Ming GAO, Chen XU, Yunbin FU, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2019, Issue 1, pp. 157-169 (13 pages)
Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entities in the others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval and machine learning. The aligning process is not only overwhelmingly expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model to incorporate heterogeneous entity attributes via the exponential family, handles missing values, and utilizes a locality-sensitive hashing scheme to reduce the candidate tuples and speed up the aligning process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as on aligning entities across three real data sources. Our experimental results show that the proposed approach outperforms comparable baselines.
Keywords: entity alignment, exponential family, locality-sensitive hashing, EM algorithm
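To show how locality-sensitive hashing reduces candidate tuples before the expensive probabilistic matching, here is a generic MinHash/banded-LSH blocking sketch. It is not EnAli's generative model; `minhash_signature` and `lsh_candidate_pairs` are illustrative helpers.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def minhash_signature(tokens, num_hashes=20):
    """MinHash signature of a token set: one minimum hash value per hash seed."""
    return tuple(
        min(int(hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).hexdigest(), 16)
            for t in tokens)
        for seed in range(num_hashes)
    )

def lsh_candidate_pairs(records, bands=10, rows=2):
    """Banded LSH blocking: records whose signatures agree on any band become
    candidates, so only a small fraction of all pairs is scored by the expensive
    alignment model."""
    buckets = defaultdict(list)
    for rid, tokens in records.items():
        sig = minhash_signature(tokens, num_hashes=bands * rows)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets[key].append(rid)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Usage: near-duplicate records collide; the unrelated one is filtered out early.
recs = {"a": {"iphone", "12", "case", "black"},
        "b": {"iphone", "12", "case"},
        "c": {"alice", "tokyo"}}
print(lsh_candidate_pairs(recs))   # {('a', 'b')} with high probability
```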
A parallel data generator for efficiently generating “realistic” social streams
11
Authors: Chengcheng YU, Fan XIA, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2019, Issue 5, pp. 1072-1101 (30 pages)
A social stream refers to a data stream that records a series of social entities and the dynamic interactions between two entities. It can be employed to model the changes of entity states in numerous applications. Social streams, the combination of graph and streaming data, pose a great challenge to efficient analytical query processing, and are key to better understanding users' behavior. Considering privacy and other related issues, a synthetic social stream generator is of great significance. A framework for a synthetic social stream generator (SSG) is proposed in this paper. The social streams generated by SSG can be tuned to capture several kinds of fundamental social stream properties, including patterns of users' behavior and graph patterns. Extensive empirical studies with several real-life social stream data sets show that SSG can produce data that better fits the real data. It is also confirmed that SSG can generate social stream data continuously with stable throughput and memory consumption. Furthermore, we propose a parallel implementation of SSG with the help of an asynchronous parallel processing model and a delayed update strategy. Our experiments verify that the throughput of the parallel implementation increases linearly with the number of nodes.
Keywords: social stream, data generator, SSG, parallel generation
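A toy generator in the spirit of the abstract, producing skewed user activity and repeated interactions. It is far simpler than SSG and not its model; `generate_social_stream` and its parameters are assumptions made for illustration.

```python
import random

def generate_social_stream(num_events, num_users=1000, seed=7):
    """Tiny synthetic social-stream generator: each event is (timestamp, src, dst).
    Sources are drawn with preferential attachment (active users stay active) and
    destinations are biased toward previous contacts, which yields the skewed
    activity and repeated-interaction patterns typical of real streams."""
    rng = random.Random(seed)
    activity = [1] * num_users              # per-user weights, grow as users act
    contacts = {u: [] for u in range(num_users)}
    stream = []
    for t in range(num_events):
        src = rng.choices(range(num_users), weights=activity, k=1)[0]
        if contacts[src] and rng.random() < 0.7:
            dst = rng.choice(contacts[src])   # repeat an old interaction
        else:
            dst = rng.randrange(num_users)    # meet someone new
        contacts[src].append(dst)
        activity[src] += 1                    # rich-get-richer user activity
        stream.append((t, src, dst))
    return stream

events = generate_social_stream(10_000)
print(events[:3])
```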
Image copy-move forgery passive detection based on improved PCNN and self-selected sub-images
12
Authors: Guoshuai Zhou, Xiuxia Tian, Aoying Zhou. 《Frontiers of Computer Science》, SCIE EI CSCD, 2022, Issue 4, pp. 131-146 (16 pages)
Image forgery detection remains a challenging problem. For the most common copy-move forgery detection, the robustness and accuracy of existing methods can still be further improved. To the best of our knowledge, we are the first to propose an image copy-move forgery passive detection method that combines an improved pulse coupled neural network (PCNN) and self-selected sub-images. Our method has the following steps: first, contour detection is performed on the input color image, and bounding boxes are drawn to frame the contours to form suspected forgery sub-images; second, by improving the PCNN to perform feature extraction on the sub-images, feature invariance to rotation, scaling, noise addition, and so on can be achieved; finally, dual feature matching is used to match the features and locate the forgery regions. Moreover, the self-selected sub-images quickly yield suspected forgery sub-images and lessen the workload of feature extraction, and the improved PCNN can extract image features with high robustness. Experiments on the standard image forgery datasets CoMoFoD and CASIA verify that the robustness score and accuracy of the proposed method are much higher than those of the current best method, making it a more efficient image copy-move forgery passive detection method.
Keywords: image copy-move forgery, passive detection, self-selected sub-images, pulse coupled neural network (PCNN), dual feature matching
Online clustering of streaming trajectories
13
Authors: Jiali MAO, Qiuge SONG, Cheqing JIN, Zhigang ZHANG, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2018, Issue 2, pp. 245-263 (19 pages)
With the increasing availability of modern mobile devices and location acquisition technologies, massive trajectory data of moving objects are collected continuously in a streaming manner. Clustering streaming trajectories facilitates finding the representative paths or common moving trends shared by different objects in real time. Although data stream clustering has been studied extensively in the past decade, little effort has been devoted to dealing with streaming trajectories. The main challenge lies in the strict space and time complexity of processing the continuously arriving trajectory data, combined with the difficulty of concept drift. To address this issue, we present two novel synopsis structures to extract the clustering characteristics of trajectories, and develop an incremental algorithm for the online clustering of streaming trajectories (called OCluST). It contains a micro-clustering component to cluster and summarize the most recent sets of trajectory line segments at each time instant, and a macro-clustering component to build large macro-clusters based on micro-clusters over a specified time horizon. Finally, we conduct extensive experiments on four real data sets to evaluate the effectiveness and efficiency of OCluST, and compare it with other congeneric algorithms. Experimental results show that OCluST achieves superior performance in clustering streaming trajectories.
Keywords: streaming trajectory, synopsis data structure, concept drift, sliding window
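A sketch of the kind of additive synopsis a micro-clustering component can maintain over trajectory line segments: constant-size, updatable in O(1) per segment, and mergeable by field-wise addition. It is a generic cluster-feature structure, not OCluST's exact synopsis; `MicroCluster` is an illustrative name.

```python
from dataclasses import dataclass

@dataclass
class MicroCluster:
    """Additive summary (a cluster feature) of trajectory line segments: counts
    and linear sums of segment midpoints and directions, so segments can be
    absorbed one by one and two micro-clusters merged by adding fields."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_dx: float = 0.0
    sum_dy: float = 0.0
    last_update: int = 0

    def absorb(self, segment, t):
        (x1, y1), (x2, y2) = segment
        self.n += 1
        self.sum_x += (x1 + x2) / 2
        self.sum_y += (y1 + y2) / 2
        self.sum_dx += x2 - x1
        self.sum_dy += y2 - y1
        self.last_update = t

    def center(self):
        return (self.sum_x / self.n, self.sum_y / self.n)

# Usage: two nearby segments summarized without storing them.
mc = MicroCluster()
mc.absorb(((0, 0), (1, 0)), t=1)
mc.absorb(((0, 1), (1, 1)), t=2)
print(mc.center())   # (0.5, 0.5)
```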
Query Authentication Using Intel SGX for Blockchain Light Clients
14
Authors: 邵奇峰, 张召, 金澈清, 周傲英. 《Journal of Computer Science & Technology》, SCIE EI CSCD, 2023, Issue 3, pp. 714-734 (21 pages)
Due to limited computing and storage resources, light clients and full nodes coexist in a typical blockchain system. Any query from light clients must be forwarded to full nodes for execution, and light clients verify the integrity of the query results returned. Since existing verifiable queries based on an authenticated data structure (ADS) suffer from significant network, storage and computing overheads by virtue of verification objects (VOs), an alternative approach turns to the trusted execution environment (TEE), with which light clients do not need to receive or verify any VO. However, state-of-the-art TEEs cannot deal with large-scale applications conveniently due to the limited secure memory space (e.g., the size of the enclave in Intel SGX (Software Guard Extensions), a typical TEE product, is only 128 MB). Hence, we organize data hierarchically in trusted (enclave) and untrusted memory, with hot data buffered in the enclave to reduce the page-swapping overhead between the two kinds of memory. The cost analysis and empirical study validate the effectiveness of our proposed scheme. The VO size of our scheme is reduced by one to two orders of magnitude compared with that of the traditional scheme.
Keywords: blockchain, query authentication, Merkle B-tree (MB-tree), Intel Software Guard Extensions (SGX)
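A loose illustration of the hierarchical trusted/untrusted layout described in the abstract: a small LRU buffer standing in for enclave memory in front of a large untrusted store, so hot items avoid costly enclave paging. This is not the paper's MB-tree design and involves no real SGX calls; `EnclaveBuffer` is a made-up name.

```python
from collections import OrderedDict

class EnclaveBuffer:
    """Two-tier layout in the spirit of the paper's design: a small 'trusted'
    LRU buffer (standing in for enclave memory) in front of a large 'untrusted'
    store. Hot keys are served from the buffer, avoiding the enclave page swaps
    that hurt SGX when the working set exceeds secure memory."""

    def __init__(self, untrusted_store, capacity=4):
        self.store = untrusted_store      # dict-like, lives outside the enclave
        self.buffer = OrderedDict()       # hot data kept inside the enclave
        self.capacity = capacity

    def get(self, key):
        if key in self.buffer:
            self.buffer.move_to_end(key)  # refresh recency
            return self.buffer[key]
        value = self.store[key]           # costly: crosses the enclave boundary
        self.buffer[key] = value
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)
        return value

db = {f"block-{i}": f"header-{i}" for i in range(100)}
cache = EnclaveBuffer(db, capacity=2)
cache.get("block-1"); cache.get("block-2"); cache.get("block-1")
print(list(cache.buffer))   # ['block-2', 'block-1']
```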
Dynamic depth-width optimization for capsule graph convolutional network
15
Authors: Shangwei WU, Yingtong XIONG, Chuliang WENG. 《Frontiers of Computer Science》, SCIE EI CSCD, 2023, Issue 6, pp. 159-161 (3 pages)
1 Introduction. Encouraged by the success of Convolutional Neural Networks (CNNs), many studies [1], known as Graph Convolutional Networks (GCNs), borrowed the idea of convolution and redefined it for graph data. In graph-level classification tasks, classic GCN methods [2,3] generate graph embeddings based on learned node embeddings, treating each node's representation as multiple independent scalar features. However, they neglect the detailed mutual relations among different node features such as position, direction, and connection. Inspired by CapsNet [4], which encodes each feature of an image as a vector (a capsule), CapsGNN [5] extracts multi-scale node features from different convolutional layers in the form of capsules. However, CapsGNN uses a static model structure for training, which inherently restricts its representation ability on different datasets.
Keywords: convolution, representation, mutual relations
Popular route planning with travel cost estimation from trajectories (Cited by 1)
16
Authors: Huiping LIU, Cheqing JIN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2020, Issue 1, pp. 191-207 (17 pages)
With the increasing number of GPS-equipped vehicles, more and more trajectories are generated continuously, based on which some urban applications become feasible, such as route planning. In general, a popular route that has been travelled frequently is a good choice, especially for people who are not familiar with the road network. Moreover, accurate estimation of the travel cost (such as travel time, travel fee and fuel consumption) benefits a well-scheduled trip plan. In this paper, we address this issue by finding the popular route with travel cost estimation. To this end, we design a system consisting of three main components. First, we propose a novel structure, called the popular traverse graph, where each node is a popular location and each edge is a popular route between locations, to summarize historical trajectories without road network information. Second, we propose a self-adaptive method to model the travel cost on each popular route at different time intervals, so that each time interval has a stable travel cost. Finally, based on the graph, given a query consisting of a source, a destination and a leaving time, we devise an efficient route planning algorithm which considers optimal route concatenation to search for the popular route from source to destination at the leaving time with accurate travel cost estimation. Moreover, we conduct comprehensive experiments and implement our system as a mobile App; the results show that our method is both effective and efficient.
Keywords: location-based services, route planning, travel cost estimation, minimum description length, optimal road concatenation
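A simplified sketch of the pipeline in the abstract: build a "popular" graph by keeping only frequently travelled transitions with their average observed travel time, then search it for the cheapest route. The support threshold and helper names (`build_popular_graph`, `popular_route`) are assumptions, not the paper's exact definitions.

```python
import heapq
from collections import defaultdict

def build_popular_graph(trajectories, min_support=2):
    """Keep only edges (transitions between locations) travelled at least
    `min_support` times, and use the average observed travel time as the cost."""
    counts = defaultdict(int)
    total_time = defaultdict(float)
    for traj in trajectories:                      # traj: [(location, time), ...]
        for (a, ta), (b, tb) in zip(traj, traj[1:]):
            counts[(a, b)] += 1
            total_time[(a, b)] += tb - ta
    graph = defaultdict(list)
    for (a, b), c in counts.items():
        if c >= min_support:
            graph[a].append((b, total_time[(a, b)] / c))
    return graph

def popular_route(graph, src, dst):
    """Dijkstra over the popular graph: cheapest estimated travel time src -> dst."""
    heap, best = [(0.0, src, [src])], {}
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in best and best[node] <= cost:
            continue
        best[node] = cost
        for nxt, w in graph[node]:
            heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

trajs = [[("A", 0), ("B", 5), ("C", 9)],
         [("A", 0), ("B", 6), ("C", 10)],
         [("A", 0), ("D", 2), ("C", 20)]]
print(popular_route(build_popular_graph(trajs), "A", "C"))  # (9.5, ['A', 'B', 'C'])
```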
Incremental join view maintenance on distributed log-structured storage (Cited by 1)
17
Authors: Huichao DUAN, Huiqi HU, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2021, Issue 4, pp. 105-120 (16 pages)
Modern database systems are desperate for the ability to support highly scalable transactions and efficient queries simultaneously for real-time applications. One solution is to apply query optimization techniques to on-line transaction processing (OLTP) systems. The materialized view is considered a panacea for decreasing query latency, but it also involves significant maintenance cost, which trades away transaction performance. In this paper, we examine the design space and conclude several design features for implementing a view on a distributed log-structured merge-tree (LSM-tree), a well-known structure for improving data write performance. As a result, we develop two incremental view maintenance (IVM) approaches on the LSM-tree. One avoids join computation in view maintenance transactions; another, with two optimizations, decouples view maintenance from transaction processing. Under asynchronous updates, we also provide consistent queries over views. Experiments on the TPC-H benchmark show that our methods achieve better performance than straightforward methods on different workloads.
Keywords: materialized views, asynchronous maintenance, hybrid transactional and analytical processing, LSM-tree
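For readers new to incremental view maintenance, this is the textbook delta rule for a two-table equi-join view: an insert into one base table probes only the matching rows of the other. It illustrates the general IVM idea, not the paper's LSM-tree-specific designs; `IncrementalJoinView` is an illustrative name.

```python
from collections import defaultdict

class IncrementalJoinView:
    """Delta maintenance of a two-table equi-join view V = R join S on a key:
    when a row is inserted into R, only the matching S rows are probed (and vice
    versa), so the view is updated without recomputing the whole join."""

    def __init__(self):
        self.r_index = defaultdict(list)   # join_key -> R rows
        self.s_index = defaultdict(list)   # join_key -> S rows
        self.view = []                     # materialized (r_row, s_row) pairs

    def insert_r(self, key, r_row):
        self.r_index[key].append(r_row)
        self.view.extend((r_row, s_row) for s_row in self.s_index[key])

    def insert_s(self, key, s_row):
        self.s_index[key].append(s_row)
        self.view.extend((r_row, s_row) for r_row in self.r_index[key])

# Usage: the view grows incrementally as base-table writes arrive.
v = IncrementalJoinView()
v.insert_r(key=1, r_row="order-10")
v.insert_s(key=1, s_row="customer-A")
v.insert_r(key=1, r_row="order-11")
print(v.view)   # [('order-10', 'customer-A'), ('order-11', 'customer-A')]
```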
A framework for cloned vehicle detection (Cited by 1)
18
Authors: Minxi Li, Jiali Mao, Xiaodong Qi, Cheqing Jin. 《Frontiers of Computer Science》, SCIE EI CSCD, 2020, Issue 5, pp. 181-198 (18 pages)
Rampant cloned vehicle offenses have caused great damage to transportation management as well as to public safety and even the world economy. This necessitates an efficient detection mechanism to identify vehicles with fake license plates accurately, and further to explore the motives by discerning the behaviors of cloned vehicles. The ubiquitous inspection spots deployed in the city have been collecting the movement information of passing vehicles, which opens up a new opportunity for cloned vehicle detection. Existing detection methods cannot detect cloned vehicles effectively because they use a fixed speed threshold. In this paper, we propose a two-phase framework, called CVDF, to detect cloned vehicles and discriminate the behavior patterns of vehicles that use the same plate number. In the detection phase, cloned vehicles are identified based on speed thresholds extracted from historical trajectories and behavior abnormality analysis within the local neighborhood. In the behavior analysis phase, considering that the traces of vehicles using the same license plate are mixed together, we aim to differentiate the trajectories through matching-degree-based clustering and then extract frequent temporal behavior patterns. The experimental results on real-world data show that the CVDF framework has high detection precision and reveals cloned vehicles' behavior effectively. Our proposal provides a scientific basis for traffic management authorities to tackle cloned vehicle crime.
Keywords: cloned vehicle detection, object identification, behavior pattern mining
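To make the detection idea concrete, here is the naive fixed-threshold baseline that the abstract says CVDF improves on: flag a plate number if consecutive sightings at inspection spots imply a physically impossible speed. The helpers and the 150 km/h threshold are illustrative assumptions.

```python
import math

def implied_speeds(records):
    """records: time-ordered (timestamp_s, x_km, y_km) observations of one plate
    number from inspection spots. Returns the speed (km/h) implied by each pair
    of consecutive observations."""
    speeds = []
    for (t1, x1, y1), (t2, x2, y2) in zip(records, records[1:]):
        dist = math.hypot(x2 - x1, y2 - y1)
        hours = max(t2 - t1, 1) / 3600.0
        speeds.append(dist / hours)
    return speeds

def looks_cloned(records, speed_threshold_kmh=150.0):
    """Flag a plate if consecutive sightings imply an implausible speed. This is
    only the fixed-threshold baseline; CVDF instead learns thresholds from
    historical trajectories and analyzes behavior abnormality."""
    return any(s > speed_threshold_kmh for s in implied_speeds(records))

# Usage: the same plate appears 40 km apart within 5 minutes -> ~480 km/h.
obs = [(0, 0.0, 0.0), (300, 40.0, 0.0)]
print(looks_cloned(obs))   # True
```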
Distributed top-k similarity query on big trajectory streams
19
Authors: Zhigang ZHANG, Xiaodong QI, Yilin WANG, Cheqing JIN, Jiali MAO, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2019, Issue 3, pp. 647-664 (18 pages)
Recently, big trajectory data streams have been generated in distributed environments with the popularity of smartphones and other mobile devices. The distributed top-k similarity query, which finds the k trajectories that are most similar to a given query trajectory across all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost due to the limited network bandwidth. This query can be solved by sending the query trajectory to all remote sites, where the pairwise similarities are computed precisely; however, the overall cost, O(n·m), is huge when n or m is large, where n is the size of the query trajectory and m is the number of remote sites. Fortunately, there are cheap ways to estimate pairwise similarity, which filter out some trajectories in advance without precise computation. To overcome this challenge, we devise two general frameworks into which concrete distance measures can be plugged. The former uses two bounds (the upper and lower bound), while the latter uses only the lower bound. Moreover, we introduce detailed implementations for two representative distance measures, the Euclidean and DTW distances, after deriving the lower and upper bounds for the former framework and the lower bound for the latter. Theoretical analysis and extensive experiments on real-world datasets demonstrate the efficiency of the proposed methods.
Keywords: top-k similarity query, trajectory stream, communication cost
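A minimal sketch of the bound-based filtering framework (lower-bound-only variant) for the Euclidean measure: a cheap, provable lower bound prunes candidates whose exact distance cannot enter the current top-k. The sub-sampling bound and helper names are illustrative, not the bounds derived in the paper.

```python
import heapq
import math

def euclidean(a, b):
    """Exact (and expensive when trajectories are long) sum of point distances."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def lower_bound(a, b, stride=10):
    """Cheap lower bound: the distance over a sub-sampled set of aligned points
    can never exceed the full sum, so it safely under-estimates the true value."""
    return sum(math.dist(a[i], b[i]) for i in range(0, min(len(a), len(b)), stride))

def topk_with_pruning(query, candidates, k):
    """Keep the k best exact distances seen so far; a candidate whose lower bound
    already exceeds the current k-th best needs no exact computation."""
    heap = []   # max-heap of (-distance, id) for the current top-k
    for cid, traj in candidates.items():
        if len(heap) == k and lower_bound(query, traj) >= -heap[0][0]:
            continue                              # pruned without exact distance
        d = euclidean(query, traj)
        heapq.heappush(heap, (-d, cid))
        if len(heap) > k:
            heapq.heappop(heap)
    return sorted((-d, cid) for d, cid in heap)

q = [(i, 0.0) for i in range(100)]
cands = {"near": [(i, 0.1) for i in range(100)],
         "far": [(i, 50.0) for i in range(100)]}
print(topk_with_pruning(q, cands, k=1))   # [(10.0, 'near')] -- 'far' is pruned
```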
Partition pruning for range query on distributed log-structured merge-tree
20
Authors: Chenchen HUANG, Huiqi HU, Xing WEI, Weining QIAN, Aoying ZHOU. 《Frontiers of Computer Science》, SCIE EI CSCD, 2020, Issue 3, pp. 159-174 (16 pages)
The log-structured merge tree (LSM-tree) is adopted by many distributed storage systems. It contains a Memtable and a number of SSTables. The Memtable is an in-memory structure and the SSTable is a disk-based structure. Data records are horizontally partitioned over the primary key and stored in different SSTables. Data writes on records are first served by the Memtable and then compacted to SSTables periodically. Although this design optimizes data writes by avoiding random disk writes, it is unfriendly to read requests, since the results have to be retrieved and merged from both the Memtable and the SSTables. In particular, when the Memtable and SSTables are distributed on different nodes, serving range queries incurs expensive costs. A range query on non-primary-key columns has to scan all partitions, which generates heavy network and I/O expense. In this paper, we propose a partition pruning strategy to save cost for range queries. A statistics cache is designed to determine whether a partition contains the desired data or not, which enables read requests to avoid scanning useless partitions. As records can be updated freely in the Memtable, to prevent incorrect filtering, a version-based cache synchronization strategy is proposed to ensure that queries obtain the latest data state. We implement the proposed method in an open-source distributed database and conduct comprehensive experiments. Experimental results reveal that the performance of range queries increases by 30%-40% with our partition pruning technique.
Keywords: LSM-tree, table partitioning, statistics cache, consistency
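A zone-map style sketch of the statistics cache plus version check described in the abstract: partitions whose cached [min, max] cannot overlap the query range are skipped, and partitions with stale statistics are scanned anyway to stay correct. `PartitionStats` and its fields are illustrative, not the paper's implementation.

```python
class PartitionStats:
    """Per-partition min/max statistics for a non-primary-key column, tagged with
    the data version they were computed at. A range scan consults the cache and
    skips partitions whose [min, max] cannot intersect the predicate; if the
    cached version is stale, the partition is scanned anyway for correctness."""

    def __init__(self):
        self.stats = {}    # partition_id -> (min_val, max_val, version)

    def update(self, pid, min_val, max_val, version):
        self.stats[pid] = (min_val, max_val, version)

    def partitions_to_scan(self, lo, hi, current_versions):
        keep = []
        for pid, latest in current_versions.items():
            cached = self.stats.get(pid)
            if cached is None or cached[2] != latest:
                keep.append(pid)                 # stale or missing stats: must scan
            elif not (cached[1] < lo or cached[0] > hi):
                keep.append(pid)                 # ranges overlap: may contain hits
        return keep

# Usage: only partition p2 can contain values in [25, 30]; p3's stats are stale.
cache = PartitionStats()
cache.update("p1", 0, 10, version=7)
cache.update("p2", 20, 40, version=3)
cache.update("p3", 90, 99, version=1)
print(cache.partitions_to_scan(25, 30, {"p1": 7, "p2": 3, "p3": 2}))  # ['p2', 'p3']
```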