Multi-class classification of high-dimension gene expression profile based on closed patterns (基于闭合模式的高维基因表达谱多类分类)

Abstract: To address the characteristics of multi-class, high-dimensional gene expression profiles, a multi-class classification algorithm based on closed patterns, CBCP, is proposed. First, closed patterns are mined from a vertically formatted dataset by path enumeration, which greatly reduces the generation of redundant patterns. All closed patterns are then sorted, and a classifier is built by covering the training set. Samples that the classifier cannot recognize are judged by a weight algorithm, which overcomes the imprecise predictions caused by falling back on a Default class. The results show that CBCP achieves higher prediction accuracy than classical classification algorithms such as CBA and C4.5, and that it remains stable when the number of genes increases substantially while the number of samples stays fixed. This indicates that CBCP scales well and is suitable for multi-class classification of high-dimensional datasets.
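The pipeline summarized in the abstract (mine closed patterns from a vertical layout, sort them, cover the training set, fall back to a weight for unmatched samples) can be illustrated with a minimal Python sketch. This is not the CBCP implementation: the abstract does not specify the path enumeration procedure, the sort order, or the weight formula, so the sketch substitutes a plain depth-first intersection of row-id sets, a confidence-then-support ordering, and a partial-match score. The function names, the toy items such as `g1_hi`, and the scoring rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def to_vertical(rows):
    """Vertical format: map each item to the set of row ids containing it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(rows):
        for item in items:
            tidsets[item].add(tid)
    return tidsets

def mine_closed(rows, min_support):
    """Return {closed pattern: row ids} by depth-first row-id-set intersection."""
    tidsets = to_vertical(rows)
    items = sorted(tidsets)
    closed = {}

    def extend(tids, start):
        # Closure: every item shared by all rows in `tids`.
        pattern = frozenset(i for i in items if tids <= tidsets[i])
        if pattern:
            closed[pattern] = tids
        for k in range(start, len(items)):
            new_tids = tids & tidsets[items[k]]
            # Skip infrequent extensions and items already in the closure.
            if len(new_tids) >= min_support and new_tids != tids:
                extend(new_tids, k + 1)

    if len(rows) >= min_support:
        extend(frozenset(range(len(rows))), 0)
    return closed

def build_classifier(rows, labels, min_support):
    """Sort patterns (assumed order: confidence, then support) and cover the rows."""
    rules = []
    for pattern, tids in mine_closed(rows, min_support).items():
        label, hits = Counter(labels[t] for t in tids).most_common(1)[0]
        rules.append((hits / len(tids), len(tids), sorted(pattern), label))
    rules.sort(key=lambda r: (-r[0], -r[1], r[2]))
    uncovered, kept = set(range(len(rows))), []
    for conf, _support, pattern, label in rules:
        pattern = frozenset(pattern)
        covered = {t for t in uncovered
                   if pattern <= rows[t] and labels[t] == label}
        if covered:  # keep a rule only if it classifies a still-uncovered row
            kept.append((conf, pattern, label))
            uncovered -= covered
    return kept

def predict(rules, sample):
    """First fully matching rule wins; otherwise use a hypothetical weight."""
    for _conf, pattern, label in rules:
        if pattern <= sample:
            return label
    scores = Counter()
    for conf, pattern, label in rules:
        # Assumed weight: rule confidence times the fraction of matched items.
        scores[label] += conf * len(pattern & sample) / len(pattern)
    best = max(sorted(scores), key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None

# Toy discretized expression data: six samples, three classes (made up).
rows = [{"g1_hi", "g2_lo"}, {"g1_hi", "g2_lo", "g3_hi"},
        {"g2_hi", "g3_hi"}, {"g2_hi", "g3_hi", "g1_lo"},
        {"g1_lo", "g3_lo"}, {"g1_lo", "g3_lo", "g2_lo"}]
labels = ["A", "A", "B", "B", "C", "C"]
closed = mine_closed(rows, min_support=2)
rules = build_classifier(rows, labels, min_support=2)
print(len(closed), "closed patterns;", len(rules), "rules kept")
print(predict(rules, {"g1_hi", "g2_lo", "g3_lo"}), predict(rules, {"g2_hi"}))
```

On the toy data, nine itemsets meet the support threshold but only six of them are closed, which is the kind of redundancy reduction the abstract refers to. The second `predict` call exercises the fallback path: no kept rule matches `{"g2_hi"}` in full, so the class is chosen by the assumed partial-match weight rather than by a default class.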

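Authors: LI Hong (李宏), LI Xiang (李翔), WU Min (吴敏), CHEN Songqiao (陈松乔), YI Lijun (易丽君)

Affiliation: School of Information Science and Engineering, Central South University (中南大学信息科学与工程学院)

Source: Journal of Central South University: Science and Technology (《中南大学学报（自然科学版）》), 2008, No. 5, pp. 1035-1041 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).

Funding: National Science Fund for Distinguished Young Scholars (60425310); Postdoctoral Fund of Central South University (2008)

Keywords: association rules; closed pattern; multi-class; weight algorithm

Classification number: TP274 [Automation and Computer Technology: Detection Technology and Automatic Devices]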





