期刊文献+

基因表达数据的频繁闭合模式挖掘新算法 被引量:1

A new algorithm for mining frequent closed patterns in gene expression datasets
下载PDF
导出
摘要 基因表达数据集与传统事务数据集相比呈现出新的特征,由于其项目数远远大于事务数,使得大量现有的基于项目枚举的频繁闭合模式挖掘算法不再适用.为此提出一种频繁闭合模式挖掘新算法TPclose,使用TP-树(tidset-prefix tree)保存项目的事务集信息.该算法将频繁闭合模式挖掘问题转换成频繁闭合事务集挖掘问题,采取自顶向下分而治之的事务搜索策略,并组合了高效的修剪技术和有效的优化技术.实验表明,TPclose算法普遍快于自底向上事务搜索算法RERⅡ,最高达2个数量级以上. Unlike the traditional datasets, gene expression datasets typically contain a huge number of items and a few transactions. While there are large numbers of algorithms developed for frequent closed patterns mining, their running time increased exponentially with increasing average length of the transactions, thus such gene expression datasets render most current algorithms impractical. TPclose, a new efficient algorithm for mining frequent closed patterns from gene expression datasets was proposed. It stored the tidset of each item using a TP tree (tidset-prefix tree). TPclose converted the problem of mining frequent closed patterns into one of mining frequent closed tidsets, adopting the top-down and divide-and-conquer search strategy to explore transaction enumeration search space and combining efficient pruning and effective optimizing. Several experiments on real-life gene expression datasets show that TPclose outperforms RER Ⅱ , an existing algorithm based on bottom-up search strategy, by up to two orders of magnitude.
出处 《中国科学技术大学学报》 CAS CSCD 北大核心 2007年第9期1080-1087,共8页 JUSTC
基金 国家自然科学基金重点项目(60533020)资助
关键词 数据挖掘 关联规则 频繁闭合模式 基因表达数据 自顶向下 data mining association rules frequent closed pattern gene expression data top-down
  • 相关文献

参考文献11

  • 1Madeira S C, Oliveira A L. Biclustering algorithm for biological data analysis:a survey [J]. ACM Transactions on Computational Biology and Bioinformatics, 2004, 1 (1):24-45.
  • 2Creighton C, Hanash S. Mining gene expression databases for association rules [J]. Bioinformatics, 2003, 19(1):79-86.
  • 3Han J W, Pei J, Yin Y W. Mining frequent patterns without candidate generation[C].Proc, of 19th ACM SIGMOD Int'l Conf. on Management of Data. Dallas: ACM Press, 2000:1-12.
  • 4Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent closed itemsets for association rules [C].Proc. of the 7th Int'l Conf. on Database Theory. Jerusalem. Springer-Verlag, 1999:398-416.
  • 5Zaki M J, Hsiao C J. CHARM: An efficient algorithm for closed iternset mining[C].Proe, of the 2nd SIAM Int'l Conf. on Data Mining. Arlington, 2002:12-28.
  • 6刘君强,孙晓莹,庄越挺,潘云鹤.挖掘闭合模式的高性能算法[J].软件学报,2004,15(1):94-102. 被引量:19
  • 7Pan F, Cong G, Tung A, et al. CARPENTER:finding closed patterns in long biological datasets[C].SIGKDD'03. Washington: ACM Press, 2003:637-642.
  • 8Cong G, Tan K L, Tung A, et al. Mining frequent closed patterns in microarray data[C].Proc, of the 4th IEEE Int'l Conf. on Data Mining. 2004, 4: 363-366.
  • 9Valtchev P, Missaoui R, Godin R. Formal concept analysis for knowledge discovery and data mining: the new challenges [C].Proc. of ICFCA'04. 2004: 352-371.
  • 10Supplemental data for discovery, and prediction lyraphoblastic leukemia by [ EB/OL ]. http://www. ALL1/all datafiles, html. classification, subtype of outcome in pediatric gene expression profiling stjuderesearch, org/data/.

二级参考文献8

  • 1[1]Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Beeri C, et al, eds. Proc. of the 7th Int'l. Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999. 398~416.
  • 2[2]Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Beeri C, et al, eds. Proc. of the 20th Int'l. Conf. on Very Large Databases. Santiago: Morgan Kaufmann Publishers, 1994. 487~499.
  • 3[3]Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Gunopulos D, et al, eds. Proc. of the 2000 ACM SIGMOD Int'l. Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000. 21~30.
  • 4[4]Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Georgakopoulos D, et al, eds. Proc. of the 17th Int'l. Conf. on Data Engineering. Heidelberg: IEEE Press, 2001. 443~452.
  • 5[5]Zaki MJ, Hsiao CJ. CHARM: An efficient algorithm for closed itemset mining. In: Grossman R, et al, eds. Proc. of the 2nd SIAM Int'l. Conf. on Data Mining. Arlington: SIAM, 2002. 12~28.
  • 6[6]Liu JQ, Pan YH, Wang K, Han J. Mining frequent item sets by opportunistic projection. In: Hand D, et al, eds. Proc. of the 8th ACM SIGKDD Int'l. Conf. on Knowledge Discovery and Data Mining. Alberta: ACM Press, 2002. 229~238.
  • 7[7]Srikant R. Quest synthetic data generation code. San Jose: IBM Almaden Research Center, 1994. http://www.almaden.ibm.com/ software/quest/Resources/index.shtml
  • 8[8]Blake C, Merz C. UCI Repository of machine learning. Irvine: University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html

共引文献18

同被引文献4

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部