Abstract
The set of frequent closed patterns uniquely determines the complete set of frequent patterns, yet it can be orders of magnitude smaller. Based on the characteristics of distributed data streams, this paper proposes an algorithm for mining frequent closed itemsets. The algorithm uses a K-ary tree: each leaf node receives one data stream and builds a DSFCI_tree structure that stores the closed patterns of each segment of that stream. The stored patterns are then merged and updated level by level toward the root, so that the root node holds the frequent closed patterns of the entire distributed data stream. Experiments and analysis show that the algorithm performs well.
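The bottom-up merging described in the abstract can be illustrated with a minimal sketch. It is not the paper's DSFCI_tree: the abstract does not specify that structure, so a plain dictionary from closed itemset to count stands in for it, and all function names (`support`, `merge`, `summarize`, `collect`, `frequent_closed`) and the sample data are hypothetical. The sketch also assumes that every node keeps all closed itemsets of its transactions and that the minimum-support threshold is applied only at the root, which keeps the merge exact but ignores the pruning a real stream algorithm would need.

```python
from itertools import product


def support(table, itemset):
    """Support of `itemset` implied by a table {closed itemset: count}.

    It equals the largest count among closed supersets, or 0 if none exists.
    """
    return max((n for c, n in table.items() if itemset <= c), default=0)


def merge(a, b):
    """Merge closed-itemset tables built over disjoint groups of transactions.

    The closed itemsets of the union are those of either side plus all
    pairwise intersections; supports add up across the two sides.
    """
    candidates = set(a) | set(b) | {x & y for x, y in product(a, b)}
    candidates.discard(frozenset())  # ignore the empty itemset
    return {c: support(a, c) + support(b, c) for c in candidates}


def summarize(segment):
    """Leaf step: fold one segment of a stream into a closed-itemset table."""
    table = {}
    for transaction in segment:
        table = merge(table, {frozenset(transaction): 1})
    return table


def collect(node):
    """A list is a leaf holding one segment; a tuple is an internal node."""
    if isinstance(node, list):
        return summarize(node)
    table = {}
    for child in node:
        table = merge(table, collect(child))
    return table


def frequent_closed(table, min_count):
    """Root step: keep only closed itemsets that reach the threshold."""
    return {c: n for c, n in table.items() if n >= min_count}


# Hypothetical data: four streams under a tree with K = 2.
tree = (
    ([{"a", "b", "c"}, {"a", "b"}], [{"a", "b"}, {"b", "c"}]),
    ([{"a", "b", "c"}], [{"b"}]),
)
root = collect(tree)
result = frequent_closed(root, min_count=3)
```

On this sample data the root table also answers support queries for itemsets that are not closed: `support(root, frozenset({"a"}))` returns 4, the count of the closed superset {a, b}. This is the sense in which the closed itemsets determine the complete set of frequent itemsets.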
Source
《微电子学与计算机》
CSCD
PKU Core (Peking University core journal index)
2007, No. 9, pp. 120-122, 125 (4 pages)
Microelectronics & Computer
Funding
Natural Science Research Project of Anhui Provincial Higher Education Institutions (KJ2007B236)
Keywords
data mining
distributed data streams
association rules
frequent closed itemsets