Abstract
Frequent itemset mining over uncertain data is a basic step in many data mining tasks and has attracted considerable research attention. When the uncertain dataset is large, however, mining produces a huge number of frequent itemsets, which complicates subsequent mining work. To address this problem, this paper introduces the concept of representative frequent itemsets in uncertain datasets and proposes an approximate algorithm for mining them based on random sampling. The algorithm uses the VC-dimension to determine the sample space, which limits the sample size while providing performance guarantees on the quality of the approximation. While preserving mining quality, it reduces the number of frequent itemsets mined and improves the efficiency of the mining task.
Source
《计算机与数字工程》
2017, No. 2, pp. 266-271 (6 pages)
Computer & Digital Engineering
Keywords
uncertain data
representative frequent itemset
approximation algorithm
VC-dimension