Abstract
Traditional algorithms for mining frequent closed patterns from high-dimensional data iteratively generate conditional sub-tables, which leads to long running times and high memory overhead. To overcome these problems, this paper proposes EMHCP (efficient mining of frequent closed patterns from high-dimensional data). EMHCP uses a novel structure, the bitmap table, to store the data in compressed form: after a single scan of the database it builds a bitmap transformation table, and from that table it constructs a compound tree. It then mines all frequent closed patterns by depth-first search combined with effective pruning strategies, which shrinks the search space and speeds up mining. Experiments on real bioinformatics datasets show that EMHCP is more efficient than earlier algorithms such as CARPENTER and TD-close.
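The ideas named in the abstract (a bitmap table built in one scan, row enumeration, depth-first search) can be illustrated with a minimal Python sketch. This is not the EMHCP implementation: the abstract does not describe the compound tree or its pruning rules, so both are omitted. The function names `build_bitmap_table`, `row_mask`, and `mine_closed` are hypothetical, and the only pruning shown is a simple skip of rows that already contain the current pattern.

```python
from typing import Dict, FrozenSet, Iterable, List


def build_bitmap_table(rows: List[Iterable[str]]) -> Dict[str, int]:
    """Single pass over the data: map each item to a bitmask of its rows."""
    table: Dict[str, int] = {}
    for rid, row in enumerate(rows):
        for item in row:
            table[item] = table.get(item, 0) | (1 << rid)
    return table


def row_mask(items: FrozenSet[str], table: Dict[str, int], n_rows: int) -> int:
    """Bitmask of all rows that contain every item in `items`."""
    mask = (1 << n_rows) - 1
    for item in items:
        mask &= table[item]
    return mask


def mine_closed(rows: List[Iterable[str]], min_sup: int) -> Dict[FrozenSet[str], int]:
    """Depth-first row enumeration (illustrative; not the paper's algorithm).

    Each node holds the items common to a set of rows. Such an
    intersection is always a closed pattern; its support is the number
    of set bits in its row mask.
    """
    data = [frozenset(r) for r in rows]
    n = len(data)
    table = build_bitmap_table(data)
    closed: Dict[FrozenSet[str], int] = {}

    def dfs(common: FrozenSet[str], start: int) -> None:
        mask = row_mask(common, table, n)
        support = bin(mask).count("1")
        if support >= min_sup:
            closed[common] = support
        for rid in range(start, n):
            if (mask >> rid) & 1:
                # Row already contains `common`: intersecting changes
                # nothing, and later rows are covered by this same loop.
                continue
            new = common & data[rid]
            if new:
                dfs(new, rid + 1)

    for rid in range(n):
        if data[rid]:
            dfs(data[rid], rid + 1)
    return closed


if __name__ == "__main__":
    sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}]
    for pattern, support in mine_closed(sample, min_sup=2).items():
        print(sorted(pattern), support)
```

On the four-row sample with a minimum support of 2, the sketch finds the closed patterns {a, b, c} (support 2), {a, b} (support 3), {a, c} (support 3), and {a} (support 4). The pattern {a, b, c, d} occurs in only one row and is filtered out. Enumerating rows rather than items suits data with few rows and very many columns, which is the high-dimensional setting the abstract targets.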
Source
Journal of Southeast University (Natural Science Edition)
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2007, No. 4, pp. 569-573 (5 pages)
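Funding
National Natural Science Foundation of China (70472033, 60473012)
National Science and Technology Infrastructure Platform Construction Program (2004DKA20310)
Natural Science Foundation of Jiangsu Province (BK2005047, BK2005046)
"Qing Lan Project" of Jiangsu Province universities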
Keywords
data mining
frequent closed patterns
row enumeration
compound tree