摘要
现有的基于滑动窗口挖掘高效用项集的研究方法存在:候选项集通常数量巨大,需要大量的存储空间及计算候选项集的真实效用是非常耗时的问题。本文提出一种不生成候选项集的挖掘算法HUISW(high utility itemset mining over a siding window),HUISW采用一种新的树结构HUIL-Tree(high utility itemset tee which arranges items according to lexicographic order)存储滑动窗口中的项集信息,采用效用数据库存储项集在窗口事务中的效用信息,在挖掘过程中HUISW采用模式增长的方法对由HUIL-Tree生成的项集通过其与效用数据库的对应关系,直接计算其在滑动窗口中的效用,整个过程避免了候选项集的生成。在实验中通过由稀疏和稠密数据集模拟的数据流对HUISW进行性能评估,并与同类算法SHU-Growth(siding window based high utility growth)进行比较,实验结果表明HUISW显著优于SHU-Growth,运行时间最快可提升两个数量级。
Existing algorithms for HUIM over a sliding window have two problems:the number of candidates is usually very large and extensive memory is required,and candidate verification is time-consuming.Thus,in this paper an efficient HUISW algorithm(high utility itemset mining over a siding window)for mining high utility itemsets from a data stream without candidates is proposed.HUISW adopts a novel tree structure HUIL-Tree(a high utility itemset tree that arranges items according to lexicographic order)to store the information on the itemsets in a sliding window,and a utility database to store the utility information on the itemsets in the transactions of a window.During the mining process,the pattern-growth method was used to generate itemsets from HUIL-Tree.For each itemset generated,its utility in the window was calculated directly using the corresponding relationship between the itemset and the utility database.The whole process did not generate candidates.Extensive experiments on both sparse and dense stream datasets were performed to compare HUISW with the state-of-the-art algorithm SHU-Growth(siding window based high utility growth).The experimental results show that HUISW significantly outperforms SHU-Growth as the runtime of HUISW was two orders of magnitude faster.
作者
郭世明
高宏
GUO Shiming;GAO Hong(School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China)
出处
《哈尔滨工程大学学报》
EI
CAS
CSCD
北大核心
2018年第4期721-729,共9页
Journal of Harbin Engineering University
基金
国家自然科学基金项目(61190115)
关键词
高效用项集
模式增长
数据流
效用挖掘
滑动窗口
数据挖掘
high utility itemsets
pattern growth
data streams
utility mining
sliding window
data mining