Efficient algorithm for frequent closed patterns mining from high dimensional data

Cited by: 1

Abstract: Traditional algorithms for mining frequent closed patterns from high-dimensional data iteratively generate conditional sub-tables, which leads to long run times and high storage overhead. To overcome these problems, a new algorithm, EMHCP (efficient mining of frequent closed patterns from high dimensional data), is proposed. EMHCP uses a novel structure, the bitmap table, to store the data in compressed form; after scanning the database only once, it builds a bitmap conversion table. A compound tree is then constructed from this table, and all frequent closed patterns are mined by depth-first search with effective pruning strategies. This reduces the search space and speeds up processing. Experiments on real bioinformatics datasets show that EMHCP is more efficient than existing algorithms such as CARPENTER and TD-close.
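The abstract names two ingredients, a bitmap table and row enumeration, without giving details. The sketch below is only an illustration of those two ideas and is not the EMHCP algorithm: it omits the compound tree and the pruning strategies, and it enumerates row sets by brute force. The function names `build_bitmaps` and `closed_patterns` and the toy data are assumptions made for this example.

```python
from itertools import combinations


def build_bitmaps(rows):
    """One pass over the rows: map each item to an int bitmap.

    Bit i of bitmaps[item] is set when row i contains the item.
    """
    bitmaps = {}
    for i, row in enumerate(rows):
        for item in row:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def closed_patterns(rows, min_sup):
    """Brute-force row enumeration (illustrative, exponential in row count).

    For every set of at least min_sup rows, the items shared by all of
    those rows form a closed pattern. Returns {frozenset(items): support}.
    """
    bitmaps = build_bitmaps(rows)
    n = len(rows)
    result = {}
    for k in range(min_sup, n + 1):
        for row_ids in combinations(range(n), k):
            mask = sum(1 << i for i in row_ids)
            # An item is common to the chosen rows when its bitmap covers the mask.
            items = frozenset(
                item for item, bm in bitmaps.items() if bm & mask == mask
            )
            if not items:
                continue
            # Support is the number of rows that contain every item in the pattern.
            support_mask = (1 << n) - 1
            for item in items:
                support_mask &= bitmaps[item]
            result[items] = bin(support_mask).count("1")
    return result


if __name__ == "__main__":
    # Hypothetical toy table: 4 rows (samples) over items a-d.
    rows = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c"}]
    for pattern, support in closed_patterns(rows, min_sup=2).items():
        print(sorted(pattern), support)
```

Enumerating rows rather than items suits data with few rows and very many columns, such as gene expression tables. A practical miner would replace the exhaustive loop over row combinations with a pruned depth-first search.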

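Authors: Hu Kongfa (胡孔法), Tang Xiaoli (唐小丽), Da Qingli (达庆利), Chen Ling (陈崚)

Affiliations: School of Economics and Management, Southeast University; Department of Computer Science and Engineering, Yangzhou University

Source: Journal of Southeast University: Natural Science Edition (《东南大学学报（自然科学版）》), 2007, No. 4, pp. 569-573 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.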

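Funding: National Natural Science Foundation of China (70472033, 60473012); National Science and Technology Infrastructure Platform Construction Project (2004DKA20310); Natural Science Foundation of Jiangsu Province (BK2005047, BK2005046); "Qing Lan Project" of Jiangsu Higher Education Institutions

Keywords: data mining; frequent closed patterns; row enumeration; compound tree

Classification codes: N945 [General Natural Science: Systems Science]; TP311 [Automation and Computer Technology: Computer Software and Theory]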


References (9)

[1] Pasquier N, Bastide Y. Discovering frequent closed itemsets for association rules[C]//Proceedings of the 7th Int'l Conf on Database Theory. Jerusalem: Springer-Verlag, 1999: 398-416.
[2] Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets[C]//Proc 2000 ACM-SIGMOD Int Workshop Data Mining and Knowledge Discovery. New York: ACM Press, 2000: 11-20.
[3] Wang J, Han J, Pei J. Closet+: searching for the best strategies for mining frequent closed itemsets[C]//Proc 2003 ACM SIGKDD. New York: ACM Press, 2003: 236-245.
[4] Zaki M, Hsiao C. ChARM: an efficient algorithm for closed association rule mining[C]//Proc of 2002 SIAM Data Mining Conf. Arlington, VA, 2002: 457-473.
[5] Pan F, Cong G, Zaki M. CARPENTER: finding closed patterns in long biological datasets[C]//Proc ACM SIGKDD 2003. New York: ACM Press, 2003: 637-642.
[6] Cong G, Tung A, Xu X. FARMER: finding interesting rule groups in microarray datasets[C]//Proc 23rd ACM Int Conf Management of Data. New York: ACM Press, 2004: 143-154.
[7] Liu H, Han J, Xin D, et al. Mining interesting patterns from very high dimensional data: a top-down row enumeration approach[C/OL]//Proc of the 6th SIAM International Conference on Data Mining. Bethesda, MD, 2006. http://www.siam.org/meetings/sdmob/proceedings/026liuh.pdf.
[8] Liu H, Han J, Xin D, et al. Top-down mining of interesting patterns from very high dimensional data[C]//Proc 22nd International Conference on Data Engineering. Los Alamitos: IEEE Computer Society Press, 2006: 114-116.
[9] Creighton C, Hanash S. Mining gene expression databases for association rules[J]. Bioinformatics, 2003, 19(1): 79-86.

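Citing documents (1)

[1] Peng Jiayang, Yang Luming, Wang Jianxin, Liu Zhen, Li Min. An efficient algorithm for mining closed frequent subgraphs in biological networks (一种高效挖掘生物网络闭合频繁子图的算法)[J]. 高技术通讯, 2009, 19(2): 188-193. Cited by: 1

Second-level citing documents (1)

[1] Lu Huilin, Huang Bo. A subgraph query algorithm based on a double index (基于双索引的子图查询算法)[J]. 计算机工程, 2015, 41(1): 44-48. Cited by: 2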
