摘要
针对多类高维基因表达谱的特点,提出一种基于闭合模式的多类分类算法CBCP,即根据垂直格式的数据集采用路径枚举的方法挖掘闭合模式,极大地减少了冗余模式的产生。然后,对所有闭合模式进行排序,通过覆盖训练集建立分类器。针对分类器无法识别的样本提出权重算法进行判断,克服了使用Default类预测不精确的问题。研究结果表明,CBCP与经典分类算法如CBA和C4.5相比具有更高的预测准确率,并且在基因数大幅增加而样本数不变的情况下仍具有较强的稳定性,证明CBCP的可扩展性强,适用于高维数据集的多类分类预测。
According to the characteristics of multi-class high-dimension gene expression profile, a new multi-class classification algorithm(CBCP) based on closed pattern was designed. Firstly an approach called path enumeration was proposed to mine closed patterns based on the vertical formatted data-table, which can reduce most redundant patterns. Then closed patterns were sorted and used to cover train dataset for building the classifier. The unrecognized samples were classified by weight algorithm, which can overcome the inaccuracy caused by using Default Class. The results show that the algorithm is proved to be more accurate than classical classification algorithms such as CBA and C4.5. CBCP keeps high accuracy when the number of genes increases substantially with the increase of number of samples fixed, which proves it is suitable for multi-class classification of high dimension datasets, and it is easy to extend.
出处
《中南大学学报(自然科学版)》
EI
CAS
CSCD
北大核心
2008年第5期1035-1041,共7页
Journal of Central South University:Science and Technology
基金
国家杰出青年科学基金资助项目(60425310)
中南大学博士后基金资助项目(2008)
关键词
关联规则
闭合模式
多类别
权重算法
association rules
closed pattern
multi-class
weight algorithm