Multi-class classification of high-dimensional gene expression profiles based on closed patterns
Times cited: 1
Abstract: To address the characteristics of multi-class, high-dimensional gene expression profiles, a multi-class classification algorithm based on closed patterns, CBCP, was designed. First, closed patterns are mined from the vertically formatted dataset with a path-enumeration method, which greatly reduces the number of redundant patterns generated. Next, all closed patterns are sorted and used to cover the training set, which builds the classifier. Samples that the classifier cannot recognize are then classified by a weight algorithm, which avoids the inaccurate predictions caused by falling back on a default class. The results show that CBCP achieves higher prediction accuracy than classical classification algorithms such as CBA and C4.5, and that it retains high accuracy and stability when the number of genes increases substantially while the number of samples stays fixed. This indicates that CBCP scales well and is suited to multi-class prediction on high-dimensional datasets.
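The abstract names three stages: mine closed patterns from the vertical layout, rank them and cover the training set, and classify unmatched samples by weight instead of a default class. The sketch below illustrates those stages under stated assumptions and is not the authors' CBCP implementation. Closed patterns are enumerated over the vertical (item to sample-id set) format with an LCM-style prefix-preserving closure test, which stands in for the paper's path-enumeration method. Rules are ranked by CBA-style precedence (confidence, then support, then shorter antecedent) and pruned by database coverage. The fallback for unmatched samples is a hypothetical weighting (rule confidence times the fraction of the rule's items present in the sample), because the paper's weight formula is not given here. All function names, thresholds (`min_sup=2`, `min_conf=0.6`) and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical helper names: a sketch of the three stages, not the authors' CBCP code.

def vertical_format(samples):
    """Map each item (a discretized gene state) to its tidset: ids of samples containing it."""
    tidsets = defaultdict(set)
    for sid, items in enumerate(samples):
        for item in items:
            tidsets[item].add(sid)
    return {item: frozenset(tids) for item, tids in tidsets.items()}


def mine_closed(tidsets, n_samples, min_sup):
    """Return {closed itemset: tidset} for closed itemsets with support >= min_sup.
    LCM-style prefix-preserving closure extension (NOT the paper's path enumeration)."""
    items = sorted(tidsets)

    def closure(tids):
        # Largest itemset shared by every sample in tids.
        return frozenset(i for i in items if tids <= tidsets[i])

    found = {}

    def extend(pattern, tids, start):
        for k in range(start, len(items)):
            item = items[k]
            if item in pattern:
                continue
            new_tids = tids & tidsets[item]  # support via tidset intersection
            if len(new_tids) < min_sup:
                continue
            closed = closure(new_tids)
            # If the closure adds an earlier item, this pattern is produced on another branch.
            if any(items[j] in closed and items[j] not in pattern for j in range(k)):
                continue
            found[closed] = new_tids
            extend(closed, new_tids, k + 1)

    all_tids = frozenset(range(n_samples))
    root = closure(all_tids)
    if root and n_samples >= min_sup:
        found[root] = all_tids
    extend(root, all_tids, 0)
    return found


def build_rules(closed, labels, min_conf):
    """Turn each closed pattern into 'pattern -> majority class' and rank the rules."""
    rules = []
    for pattern, tids in closed.items():
        cls, hits = Counter(labels[t] for t in tids).most_common(1)[0]
        conf = hits / len(tids)
        if conf >= min_conf:
            rules.append((pattern, cls, conf, hits))
    # CBA-style precedence: higher confidence, higher support, shorter antecedent.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0]), sorted(r[0])))
    return rules


def cover(rules, samples, labels):
    """Database coverage: keep a rule if it correctly classifies a still-uncovered sample."""
    remaining = set(range(len(samples)))
    kept = []
    for rule in rules:
        covered = {s for s in remaining if rule[0] <= samples[s]}
        if any(labels[s] == rule[1] for s in covered):
            kept.append(rule)
            remaining -= covered
        if not remaining:
            break
    return kept


def classify(kept, sample):
    """First fully matching rule wins; otherwise use a hypothetical weighted vote."""
    for pattern, cls, _, _ in kept:
        if pattern <= sample:
            return cls
    weights = Counter()
    for pattern, cls, conf, _ in kept:
        weights[cls] += conf * len(pattern & sample) / len(pattern)
    return weights.most_common(1)[0][0] if weights else None


# Toy data (assumed): each sample is a set of discretized gene states.
samples = [
    {"g1+", "g2+", "g3-"}, {"g1+", "g2+", "g3+"},  # class A
    {"g1-", "g2+", "g3+"}, {"g1-", "g2-", "g3+"},  # class B
    {"g1-", "g2-", "g3-"}, {"g1+", "g2-", "g3-"},  # class C
]
labels = ["A", "A", "B", "B", "C", "C"]
closed = mine_closed(vertical_format(samples), len(samples), min_sup=2)
kept = cover(build_rules(closed, labels, min_conf=0.6), samples, labels)
print([(sorted(p), c) for p, c, _, _ in kept])
print(classify(kept, {"g1+", "g2+", "g3+"}))  # full rule match -> A
print(classify(kept, {"g1+"}))                 # no full match, weighted fallback -> A
```

On this toy data the coverage step keeps three two-item rules, one per class, and the weighted fallback only decides samples that none of them fully matches. Ranking and coverage follow the CBA convention; the fallback is a placeholder for the paper's weight algorithm, whose definition is not stated in the abstract.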
Source: Journal of Central South University (Science and Technology), 2008, No. 5, pp. 1035-1041 (7 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: National Science Fund for Distinguished Young Scholars (60425310); Central South University Postdoctoral Fund (2008).
Keywords: association rules; closed pattern; multi-class; weight algorithm

References (15)

  • [1] HAN Jia-wei, Kamber M. Data mining: Concepts and techniques[M]. Beijing: Higher Education Press, 2001: 10-20.
  • [2] Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases[C]//Proceedings of the 17th International Conference on Data Engineering. Heidelberg, Germany, 2001: 443-452.
  • [3] Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules[C]//Proceedings of the 7th International Conference on Database Theory. Jerusalem: Springer-Verlag, 1999: 398-416.
  • [4] Liu B, Hsu W, Ma Y. Integrating classification and association rule mining[C]//Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1998: 80-86.
  • [5] LI Wen-min, HAN Jia-wei, PEI Jian. CMAR: Accurate and efficient classification based on multiple class association rules[C]//Proceedings of the IEEE International Conference on Data Mining. San Jose, CA, 2001: 369-376.
  • [6] LI Hong, DU Jian-feng, CHEN Song-qiao. Mining constrained association rules in distributed databases[J]. Journal of Central South University (Science and Technology), 2004, 35(6): 998-1003. (in Chinese)
  • [7] ZOU Xiao-feng, LU Jian-jiang, SONG Zi-lin. A classification system based on fuzzy classification association rules[J]. Journal of Computer Research and Development, 2003, 40(5): 651-656. (in Chinese)
  • [8] Thabtah F, Cowling P, Peng Y. MMAC: A new multi-class, multi-label associative classification approach[C]//Proceedings of the IEEE International Conference on Data Mining. Brighton, UK, 2004: 217-224.
  • [9] Lim T S, Loh W Y, Shih Y S. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms[J]. Machine Learning, 2000, 40: 203-228.
  • [10] Quinlan J R. C4.5: Programs for machine learning[M]. San Francisco: Morgan Kaufmann, 1993: 56-89.
