Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61025007, 61328202, 61173029, 61100024, 61332006, and 61073063; the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011004; and the National Basic Research 973 Program of China under Grant No. 2011CB302200-G.
Abstract: Outlier detection on data streams is an important task in data mining, and it becomes more challenging when the data are uncertain. This paper studies outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which uses pruning to quickly determine the nature of uncertain elements and thereby improves efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. PCUOD estimates outlier probabilities, which effectively reduces the amount of computation, and the cost of its incremental algorithm can meet the demands of uncertain data streams. Finally, we propose a new method for handling parameter-variable queries in CUOD, which enables different queries to be executed concurrently. To the best of our knowledge, this is the first work on outlier detection over uncertain data streams that can handle parameter-variable queries simultaneously. Our methods are verified on both real and synthetic data. The results show that they reduce the required storage and running time.
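The abstract does not define its outlier model, so the following is only a minimal sketch of the general idea of estimating an outlier probability and pruning before the exact computation. It assumes a distance-based model that is not stated in the source: each element carries an existence probability, elements exist independently, and an element is an outlier when the probability that fewer than `k` neighbors within radius `r` exist is at least `tau`. The pruning rule shown (a Markov-inequality bound) is a stand-in chosen for illustration, not the PCUOD rule, and all names (`prob_fewer_than_k`, `flag_outliers`, `r`, `k`, `tau`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# (value, existence probability); one-dimensional values keep the sketch short.
Element = Tuple[float, float]


def prob_fewer_than_k(neighbor_probs: Sequence[float], k: int) -> float:
    """Probability that fewer than k of the independent neighbors exist (k >= 1)."""
    # dist[j] = probability that exactly j neighbors exist, tracked for j < k only.
    dist = [1.0] + [0.0] * (k - 1)
    for p in neighbor_probs:
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return sum(dist)


def flag_outliers(window: Sequence[Element], r: float, k: int, tau: float) -> List[bool]:
    """Flag each element of the current window as outlier or not (assumed model)."""
    flags = []
    for i, (value, _) in enumerate(window):
        probs = [p for j, (v, p) in enumerate(window)
                 if j != i and abs(v - value) <= r]
        if len(probs) < k:
            # Fewer than k candidate neighbors: the outlier probability is 1.
            flags.append(True)
            continue
        # Markov bound: P(N >= k) <= E[N] / k, so P(N < k) >= 1 - E[N] / k.
        # If the lower bound already reaches tau, the exact DP can be skipped.
        if 1.0 - sum(probs) / k >= tau:
            flags.append(True)
            continue
        flags.append(prob_fewer_than_k(probs, k) >= tau)
    return flags
```

In a streaming setting the same check would be rerun only for elements whose neighborhood changes when the window slides; that incremental bookkeeping is left out here.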
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61173029 and 61272182. Acknowledgement: The authors would like to thank the anonymous reviewers and editors for their valuable comments.
Abstract: Conventional classification algorithms are not well suited to the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM, and their weights are updated dynamically according to the classification results. WE-DELM improves both the efficiency of learning the model and the accuracy of classification. Experimental results show that WE-DELM achieves better performance on several evaluation criteria, including efficiency, accuracy, and speedup.
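The dynamic weighting of base classifiers can be illustrated with a small sketch. The abstract does not give the WE-DELM update rule, so the multiplicative penalty below (in the style of the classic Weighted Majority scheme) is an assumption, as are the names `WeightedEnsemble` and `beta`. Base classifiers are plain callables here rather than DELM models.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence

Classifier = Callable[[Sequence[float]], Hashable]


class WeightedEnsemble:
    """Weighted vote over base classifiers with a multiplicative weight update."""

    def __init__(self, classifiers: List[Classifier], beta: float = 0.5):
        self.classifiers = classifiers
        self.weights = [1.0] * len(classifiers)
        self.beta = beta  # hypothetical penalty factor, assumed to lie in (0, 1)

    def predict(self, x: Sequence[float]) -> Hashable:
        # Sum the weights of the classifiers voting for each label.
        votes = defaultdict(float)
        for clf, w in zip(self.classifiers, self.weights):
            votes[clf(x)] += w
        return max(votes, key=votes.get)

    def update(self, x: Sequence[float], y_true: Hashable) -> None:
        # Shrink the weight of every classifier that misclassified x,
        # then renormalize so the weights sum to 1.
        for i, clf in enumerate(self.classifiers):
            if clf(x) != y_true:
                self.weights[i] *= self.beta
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

Calling `update` after each labeled instance lets classifiers trained on outdated concepts lose influence gradually, which is one simple way to react to concept drift.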
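Abstract: Real-world uncertain data streams have non-convex distributions and contain a large amount of noise. To cluster recent data and track cluster evolution efficiently in real time under these conditions, this paper proposes Clu_Ustream (clustering on uncertain stream), a clustering algorithm for uncertain data streams. First, the online component uses a sub-window sampling mechanism to collect the uncertain stream data in the sliding window and stores the statistics of the probability density grids in a linked list with a two-layer summary statistics structure. Then, during offline clustering, a decaying window mechanism weakens the influence of old data, and expired sub-windows are periodically cleaned out of the window. In addition, a dynamic outlier-grid deletion mechanism effectively filters out outliers, which reduces the time and space complexity of the algorithm. Simulation results on synthetic datasets and a real network intrusion dataset show that Clu_Ustream achieves higher clustering quality and efficiency than comparable algorithms.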
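The interplay of probability density grids, decay of old data, and deletion of sparse grids can be sketched as follows. This is an illustrative simplification under stated assumptions, not the Clu_Ustream data structure: it uses a plain dictionary instead of the two-layer linked list, exponential decay per time unit, and a fixed density threshold. The names `DecayingDensityGrid`, `cell_width`, `decay`, and `min_density` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


class DecayingDensityGrid:
    """Grid cells accumulate existence probabilities that decay over time."""

    def __init__(self, cell_width: float, decay: float, min_density: float):
        self.cell_width = cell_width
        self.decay = decay              # assumed per-time-unit factor in (0, 1)
        self.min_density = min_density  # cells below this are treated as outlier grids
        self.density: Dict[Cell, float] = {}
        self.last_update: Dict[Cell, int] = {}

    def _cell(self, point: Sequence[float]) -> Cell:
        return tuple(math.floor(v / self.cell_width) for v in point)

    def _decayed(self, cell: Cell, now: int) -> float:
        # Density as of time `now`, decayed since the cell was last touched.
        return self.density[cell] * self.decay ** (now - self.last_update[cell])

    def insert(self, point: Sequence[float], prob: float, now: int) -> None:
        # An uncertain point contributes its existence probability to its cell.
        cell = self._cell(point)
        old = self._decayed(cell, now) if cell in self.density else 0.0
        self.density[cell] = old + prob
        self.last_update[cell] = now

    def prune(self, now: int) -> List[Cell]:
        # Remove cells whose decayed density fell below the threshold.
        sparse = [c for c in self.density if self._decayed(c, now) < self.min_density]
        for c in sparse:
            del self.density[c]
            del self.last_update[c]
        return sparse
```

Running `prune` periodically keeps memory bounded by the number of dense cells, and the surviving cells would then be passed to an offline step that connects neighboring dense cells into clusters.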