A new algorithm for mining frequent closed patterns in gene expression datasets

Cited by: 1

Abstract: Gene expression datasets differ from traditional transaction datasets in that the number of items far exceeds the number of transactions. Many algorithms exist for mining frequent closed patterns, but those based on item enumeration have running times that grow exponentially with the average transaction length, which makes most of them impractical for such data. This paper proposes TPclose, a new algorithm for mining frequent closed patterns in gene expression datasets. TPclose stores the tidset (transaction-ID set) of each item in a TP-tree (tidset-prefix tree). It converts the problem of mining frequent closed patterns into that of mining frequent closed tidsets, explores the transaction enumeration search space with a top-down, divide-and-conquer strategy, and combines this with efficient pruning and optimization techniques. Experiments on real-life gene expression datasets show that TPclose is generally faster than RERⅡ, an existing algorithm based on bottom-up transaction search, in the best case by two orders of magnitude or more.
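The reformulation in the abstract, from closed itemsets to closed tidsets, can be illustrated with a small brute-force sketch. This is not TPclose: it uses no TP-tree, no pruning, and no divide-and-conquer, and the function name `closed_patterns_by_rows` and the gene labels are hypothetical. It only shows why enumerating transaction sets is attractive when transactions are few: the items shared by any set of transactions always form a closed pattern, so every frequent closed pattern can be reached by enumerating tidsets of size at least the minimum support.

```python
from itertools import combinations


def closed_patterns_by_rows(transactions, min_support):
    """Brute-force transaction (row) enumeration; illustrative only, not TPclose."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)
    closed = {}
    # Visit tidsets from largest to smallest, a crude stand-in for top-down search.
    for size in range(n, max(min_support, 1) - 1, -1):
        for tids in combinations(range(n), size):
            # Items shared by every transaction in this tidset.
            items = frozenset.intersection(*(rows[t] for t in tids))
            if not items:
                continue
            # Closed tidset: every transaction that contains all of those items.
            closed[items] = frozenset(t for t in range(n) if items <= rows[t])
    return closed


# Hypothetical toy data: 3 samples (transactions) over gene labels (items).
samples = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g2", "g3"}]
for items, tids in closed_patterns_by_rows(samples, min_support=2).items():
    print(sorted(items), sorted(tids))
```

With a minimum support of 2, the toy data yields three closed patterns: {g2} in all three samples, {g1, g2} in samples 0 and 1, and {g2, g3} in samples 0 and 2. The cost of this sketch is exponential in the number of transactions rather than in the number of items, which is the trade-off that suits datasets with many genes and few samples.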

Authors: Miao Yuqing (缪裕青), Chen Guoliang (陈国良), Xu Yun (徐云)

Affiliation: Department of Computer Science and Technology, University of Science and Technology of China

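Source: Journal of University of Science and Technology of China (JUSTC), 2007, No. 9, pp. 1080-1087 (8 pages). Indexed in: CAS, CSCD, Peking University Core Journals.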

Funding: Supported by the Key Program of the National Natural Science Foundation of China (60533020)

Keywords: data mining; association rules; frequent closed pattern; gene expression data; top-down

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

References (11)

1. Madeira S C, Oliveira A L. Biclustering algorithms for biological data analysis: a survey [J]. ACM Transactions on Computational Biology and Bioinformatics, 2004, 1(1): 24-45.
2. Creighton C, Hanash S. Mining gene expression databases for association rules [J]. Bioinformatics, 2003, 19(1): 79-86.
3. Han J W, Pei J, Yin Y W. Mining frequent patterns without candidate generation [C]. Proc. of the 19th ACM SIGMOD Int'l Conf. on Management of Data. Dallas: ACM Press, 2000: 1-12.
4. Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent closed itemsets for association rules [C]. Proc. of the 7th Int'l Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999: 398-416.
5. Zaki M J, Hsiao C J. CHARM: An efficient algorithm for closed itemset mining [C]. Proc. of the 2nd SIAM Int'l Conf. on Data Mining. Arlington, 2002: 12-28.
6. Liu Junqiang, Sun Xiaoying, Zhuang Yueting, Pan Yunhe. A high-performance algorithm for mining closed patterns [J]. Journal of Software, 2004, 15(1): 94-102. (In Chinese.)
7. Pan F, Cong G, Tung A, et al. CARPENTER: finding closed patterns in long biological datasets [C]. SIGKDD'03. Washington: ACM Press, 2003: 637-642.
8. Cong G, Tan K L, Tung A, et al. Mining frequent closed patterns in microarray data [C]. Proc. of the 4th IEEE Int'l Conf. on Data Mining. 2004, 4: 363-366.
9. Valtchev P, Missaoui R, Godin R. Formal concept analysis for knowledge discovery and data mining: the new challenges [C]. Proc. of ICFCA'04. 2004: 352-371.
10. Supplemental data for classification, subtype discovery, and prediction of outcome in pediatric lymphoblastic leukemia by gene expression profiling [EB/OL]. http://www.stjuderesearch.org/data/ALL1/all_datafiles.html.

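Citing documents (1)

1. Kou Chenyan. A sorting-based algorithm for mining frequent closed patterns in gene expression data [J]. Computer and Information Technology, 2014, 22(3): 7-10. (In Chinese.)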
