Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. ...Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve the efficiency. Furthermore, we propose a pruning approach -- Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD) to reduce the detection cost. It is an estimated outlier probability method which can effectively reduce the amount of calculations. The cost of PCUOD incremental algorithm can satisfy the demand of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams which can handle parameter variable queries simultaneously. Our methods are verified using both real data and synthetic data. The results show that they are able to reduce the required storage and running time.展开更多
Conventional classification algorithms are not well suited for the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accur...Conventional classification algorithms are not well suited for the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online and one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM. The weights of the base classifiers are updated dynamically according to classification results. WE-DELM improves both the efficiency in learning the model and the accuracy in performing classification. Experimental results show that WE-DELM achieves better performance on different evaluation criteria, including efficiency, accuracy, and speedup.展开更多
Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database ma...Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes from not only the fact that the possible world space will grow exponentially when new tuples arrive, but also the requirement for low space- and time- complexity to adapt to the streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Analysis in theory and detailed experimental reports evaluate the proposed methods.展开更多
基金supported by the National Natural Science Foundation of China under Grant Nos.61025007,61328202,61173029,61100024,61332006,and 61073063the National High Technology Research and Development 863 Program of China under Grant No.2012AA011004the National Basic Research 973 Program of China under Grant No.2011CB302200-G
文摘Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve the efficiency. Furthermore, we propose a pruning approach -- Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD) to reduce the detection cost. It is an estimated outlier probability method which can effectively reduce the amount of calculations. The cost of PCUOD incremental algorithm can satisfy the demand of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams which can handle parameter variable queries simultaneously. Our methods are verified using both real data and synthetic data. The results show that they are able to reduce the required storage and running time.
基金This work was supported by the National Natural Science Foundation of China under Grant Nos. 61173029 and 61272182. Acknowledgement The authors would like to thank anonymous reviewers and editors for their valuable comments.
文摘Conventional classification algorithms are not well suited for the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online and one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM. The weights of the base classifiers are updated dynamically according to classification results. WE-DELM improves both the efficiency in learning the model and the accuracy in performing classification. Experimental results show that WE-DELM achieves better performance on different evaluation criteria, including efficiency, accuracy, and speedup.
文摘Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes from not only the fact that the possible world space will grow exponentially when new tuples arrive, but also the requirement for low space- and time- complexity to adapt to the streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Analysis in theory and detailed experimental reports evaluate the proposed methods.