Research on a Classification Algorithm for High-Dimensional Biological Data Based on Closed Patterns

Research of Classification with Closed Patterns Mining in Long Biological Datasets

Abstract: To address the characteristics of gene expression profile data, this paper proposes FEALL, a classification algorithm based on closed patterns. The data are first preprocessed to eliminate irrelevant genes from the expression profiles before association rules are mined; this lowers the time complexity of FEALL and reduces the number of redundant association rules generated. FEALL then performs row enumeration: it builds a row FP-tree over the row set and a path enumeration tree for each row, and mines the upper bounds of the Interesting Rule Groups. A classifier built on these upper bounds predicts the class of each sample, and samples that the classifier cannot recognize are classified by a weight-based decision algorithm. Experiments show that FEALL achieves high efficiency and prediction accuracy.
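The abstract only outlines FEALL. As a rough illustration of the ideas it names, the following is a minimal Python sketch, assuming expression values have already been discretized into items such as `g1_high`; that discretization, the toy data, and all function names are hypothetical. It does not reproduce the paper's row FP-tree, path enumeration trees, or pruning. Instead it finds closed patterns by brute-force row enumeration, relying on the fact that the intersection of any set of rows is a closed itemset; it treats each closed itemset as the upper bound (most specific antecedent) of its rule group; it classifies a sample with the strongest covering rule; and, for samples no rule covers, it substitutes a simple per-class item-weight score for the paper's weight-based algorithm, whose details the abstract does not give.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_closed_patterns(rows, min_sup):
    """Brute-force row enumeration over frozenset rows.

    The intersection of any set of rows is a closed itemset, so enumerating
    row subsets yields every closed itemset. Exponential in len(rows).
    Returns {closed itemset: frozenset of supporting row indices}.
    """
    n = len(rows)
    closed = {}
    for k in range(1, n + 1):
        for row_ids in combinations(range(n), k):
            items = frozenset.intersection(*(rows[i] for i in row_ids))
            if not items or items in closed:
                continue
            support = frozenset(i for i in range(n) if items <= rows[i])
            if len(support) >= min_sup:
                closed[items] = support
    return closed


def build_upper_bound_rules(rows, labels, min_sup, min_conf):
    """Emit one 'closed itemset -> class' rule per rule group and class.

    Each closed itemset is the most specific antecedent (upper bound) among
    all antecedents covering the same rows.
    """
    rules = []
    for items, support in mine_closed_patterns(rows, min_sup).items():
        for cls, count in Counter(labels[i] for i in support).items():
            conf = count / len(support)
            if count >= min_sup and conf >= min_conf:
                rules.append((items, cls, conf, count))
    # Strongest first: higher confidence, then higher support, then more items.
    rules.sort(key=lambda r: (r[2], r[3], len(r[0])), reverse=True)
    return rules


def item_weights(rows, labels):
    """Hypothetical fallback weights: fraction of a class's rows with the item."""
    class_sizes = Counter(labels)
    weights = defaultdict(float)
    for row, cls in zip(rows, labels):
        for item in row:
            weights[(item, cls)] += 1 / class_sizes[cls]
    return weights


def classify(sample, rules, weights, classes):
    for items, cls, _conf, _count in rules:
        if items <= sample:
            return cls  # the strongest covering rule decides
    # No rule covers the sample: pick the class with the largest summed weight.
    return max(classes, key=lambda c: sum(weights[(item, c)] for item in sample))


# Hypothetical toy data: 5 samples, 3 discretized genes, 2 classes.
rows = [frozenset(r) for r in (
    {"g1_high", "g2_low", "g3_high"},
    {"g1_high", "g2_low"},
    {"g1_high", "g3_high"},
    {"g1_low", "g2_high", "g3_low"},
    {"g1_low", "g2_high"},
)]
labels = ["ALL", "ALL", "ALL", "AML", "AML"]
classes = sorted(set(labels))
rules = build_upper_bound_rules(rows, labels, min_sup=2, min_conf=0.9)
weights = item_weights(rows, labels)

print(classify(frozenset({"g1_high", "g2_low"}), rules, weights, classes))  # ALL
print(classify(frozenset({"g1_low", "g3_low"}), rules, weights, classes))   # AML (via fallback)
```

Brute-force enumeration is exponential in the number of rows, so this sketch is only useful for checking results on tiny inputs. Row enumeration is attractive for gene expression data because such datasets have few rows (samples) and very many columns (genes).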

Authors: Li Hong, Chen Songqiao, Yi Lijun, Zhou Ming, Li Xiang

Affiliation: School of Information Science and Engineering, Central South University

Source: Journal of Chinese Computer Systems (indexed in CSCD; Peking University core journal), 2007, No. 8, pp. 1423-1426 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 60433020)

Keywords: association rules; rule group; closed pattern; upper bound

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

References

[1] Burdick D, Calimlim M, Gehrke J. MAFIA: a maximal frequent itemset algorithm for transactional databases[C]. In: Intl. Conf. on Data Engineering, April 2001.
[2] Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent closed itemsets for association rules[C]. In: Beeri C, et al., eds. Proc. of the 7th Int'l Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999, 398-416.
[3] Golub T R, Slonim D K, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286(5439): 531-537.
[4] Ramaswamy S, Golub T R. DNA microarrays in clinical oncology[J]. Journal of Clinical Oncology, 2002, 20(7): 1932-1941.
[5] Khan J, Wei J S, Ringner M, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks[J]. Nat Med, 2001, 7(6): 673-679.
[6] Cong G, Yang J, Zaki M J. CARPENTER: finding closed patterns in long biological datasets[C]. SIGKDD '03, August 24-27, 2003, Washington, DC, USA.
[7] Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets[Z]. Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000, 21-30.
[8] Zaki M J, Hsiao C J. CHARM: an efficient algorithm for closed itemset mining[C]. Proc. of the 2nd SIAM Int'l Conf. on Data Mining. Arlington: SIAM, 2002, 12-28.
[9] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation[C]. In: Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), Dallas, TX, May 2000, 1-12.
[10] Li Yingxin, Ruan Xiaogang. Feature gene selection for tumor classification based on support vector machines[J]. Journal of Computer Research and Development, 2005, 42(10): 1796-1801.

