Mining Frequent Itemsets in Correlated Uncertain Databases (Cited by: 1)

Abstract: Recently, with the growing popularity of the Internet of Things (IoT) and pervasive computing, a large amount of uncertain data, e.g., RFID data, sensor data, and real-time video data, has been collected. As one of the most fundamental issues of uncertain data mining, uncertain frequent pattern mining has attracted much attention in the database and data mining communities. Although there have been some solutions for uncertain frequent pattern mining, most of them assume that the data is independent, which is not true in most real-world scenarios. Therefore, current methods that are based on the independence assumption may generate inaccurate results for correlated uncertain data. In this paper, we focus on the problem of mining frequent itemsets over correlated uncertain data, where correlation can exist in any pair of uncertain data objects (transactions). We propose a novel probabilistic model, called the Correlated Frequent Probability model (CFP model), to represent the probability distribution of support in a given correlated uncertain dataset. Based on the distribution of support derived from the CFP model, we observe that some probabilistic frequent itemsets are only frequent in several transactions with high positive correlation. In particular, the itemsets that are global probabilistic frequent have more significance in eliminating the influence of the existing noise and correlation in data. In order to reduce redundant frequent itemsets, we further propose a new type of pattern, called global probabilistic frequent itemsets, to identify itemsets that are always frequent in each group of transactions if the whole correlated uncertain database is divided into disjoint groups based on their correlation. To speed up the mining process, we also design a dynamic programming solution, as well as two pruning and bounding techniques. Extensive experiments on both real and synthetic datasets verify the effectiveness and efficiency of the proposed model and algorithms.
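The abstract does not spell out the CFP model or its dynamic program, so the following is only a minimal sketch of the underlying issue, not the authors' algorithm. It assumes each transaction contains a given itemset with some known probability, computes the distribution of support with the usual dynamic program under the independence assumption, and compares the result with a small hand-written joint distribution in which two transactions are perfectly positively correlated. The function names and the toy probabilities are hypothetical, chosen for illustration.

```python
def support_distribution_independent(probs):
    """dist[k] = P(support == k), ASSUMING transactions are independent.

    probs[i] is the probability that transaction i contains the itemset.
    Standard O(n^2) dynamic program: add one transaction at a time.
    """
    dist = [1.0]  # with zero transactions, support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += mass * p       # transaction contains the itemset
        dist = nxt
    return dist


def support_distribution_joint(joint):
    """dist[k] = P(support == k) from an explicit joint distribution.

    joint maps a tuple of 0/1 indicators (one per transaction) to the
    probability of that possible world. Only feasible for tiny examples.
    """
    n = len(next(iter(joint)))
    dist = [0.0] * (n + 1)
    for world, mass in joint.items():
        dist[sum(world)] += mass
    return dist


def frequent_probability(dist, minsup):
    """P(support >= minsup) for a support distribution."""
    return sum(dist[minsup:])


# Toy data (hypothetical): two transactions, each containing the itemset
# with marginal probability 0.5.
independent = support_distribution_independent([0.5, 0.5])

# Same marginals, but the transactions always agree (perfect positive correlation).
correlated = support_distribution_joint({(0, 0): 0.5, (1, 1): 0.5})

print(frequent_probability(independent, 2))  # 0.25
print(frequent_probability(correlated, 2))   # 0.5
```

With identical marginal probabilities, the probability that the itemset reaches a support of 2 doubles once the correlation is taken into account, which illustrates why results computed under the independence assumption can be inaccurate for correlated data.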
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2015, Issue 4, pp. 696-712 (17 pages)
Funding: This work is partially supported by the Hong Kong RGC Project under Grant No. N_HKUST637/13, the National Basic Research 973 Program of China under Grant No. 2014CB340303, the National Natural Science Foundation of China under Grant Nos. 61328202 and 61300031, Microsoft Research Asia Gift Grant, Google Faculty Award 2013, and Microsoft Research Asia Fellowship 2012.
Keywords: correlation, uncertain data, probabilistic frequent itemset