摘要
提出了一种新的CMNL-SW(Closed map and num list-sliding window)挖掘算法。具体使用数据结构Closedmap存储挖掘到的闭合项集和Num list存储所有不同项的序号,通过对添加新事务和删除旧事务包含的项序号进行简单的并集和该事务与之相关已经挖掘到的闭合项集进行交集运算来更新当前滑动窗口,使之能够根据用户任意指定的支持度阈值在线输出数据流上闭合频繁项集信息。通过理论分析和对真实数据集Mushroom,Retail-chain和人工合成数据集T40I10D100K的挖掘结果表明,提出的算法在时空效率上明显优于同类经典算法Moment和CFI-Stream,并且随着数据流上处理事务数的递增和快速改变表现出良好的稳定性。
A new online mining algorithm called the closed map and num list-sliding window (CMNL-SW) is proposed. It uses two data structures, i.e. closed map stores, the closed itemsets, those are mined and the num list stores the number of all different items. Via the simple union operation on item number contained within a new arriving or an old deleting transaction and the intersection operation on certain previous closed itemsets once, it incrementally updates the current sliding window and makes the closed frequent itemsets be output in real time based on the specified thresholds of any user. Theoretical analysis and experimental results of the real datasets, such as mushroom, retail-chain and artificially synthesized datasets T40110D100K show that the proposed method is superior to the classic algorithms Moment and CFI-Stream in terms of time and space efficiencies, and it has good stability as the number of transactions processed increases and adapts rapidly to the change in data streams.
出处
《数据采集与处理》
CSCD
北大核心
2012年第4期508-513,共6页
Journal of Data Acquisition and Processing
关键词
挖掘算法
闭合频繁项集
滑动窗口
数据流
mining algorithm
closed frequent itemsets
sliding window
data stream