Abstract
This paper proposes DSMMFI-DS (Dictionary Sequence Mining Maximal Frequent Itemsets over Data Streams), an improved algorithm based on DSM-MFI. First, it stores transaction data in the DSFI-list according to a fixed total order (alphabetical order). Second, it stores the sorted data in a tree similar to a summary data structure. Third, it removes non-frequent items from the tree and the DSFI-list, and also deletes the transaction items with the largest window attenuation support counts. Finally, it mines the maximal frequent itemsets of the data stream with a two-way search strategy that combines top-down and bottom-up search. Case analysis and experiments show that DSMMFI-DS runs more efficiently than DSM-MFI.
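As a rough illustration of the mining target only, the sketch below enumerates maximal frequent itemsets by brute force over a small static list of transactions. It is not DSMMFI-DS or DSM-MFI: it uses no DSFI-list, no summary tree, no landmark window, and no attenuation. The function name `maximal_frequent_itemsets`, the parameter `min_count`, and the sample data are all assumptions made for this example.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_count):
    """Naive enumeration (hypothetical helper, not the paper's algorithm).

    Returns the frequent itemsets that have no frequent proper superset.
    """
    # Sort items so candidates are generated in a fixed (alphabetical) order.
    items = sorted({item for t in transactions for item in t})
    tx_sets = [frozenset(t) for t in transactions]

    frequent = []
    for size in range(1, len(items) + 1):
        # Keep every candidate of this size whose support count is high enough.
        level = [
            frozenset(c)
            for c in combinations(items, size)
            if sum(1 for t in tx_sets if t.issuperset(c)) >= min_count
        ]
        if not level:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
        frequent.extend(level)

    # An itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Made-up sample data for illustration.
sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = maximal_frequent_itemsets(sample, min_count=3)
```

With this sample and `min_count=3`, every pair of items is frequent but the full triple is not, so the maximal frequent itemsets are the three pairs. The enumeration is exponential in the number of items, which is the cost that stream-oriented methods such as the one summarized above aim to avoid.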
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
PKU Core Journal (北大核心)
2014, Issue 5, pp. 963-970 (8 pages)
Keywords
data mining
data stream
landmark window
maximal frequent itemsets
window attenuation support count