Abstract
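Cancer gene expression datasets have few samples and very high dimensionality, so general-purpose machine learning methods have difficulty classifying them effectively. Dimensionality reduction with some feature-extraction criterion is therefore usually required. However, commonly used feature-extraction criteria can themselves lead to poor classification performance. Based on the Differential Capability Control Machine (DCCM), this paper proposes a new feature-extraction criterion, NFEC, and then, based on NFEC and DCCM, a feature-extraction algorithm, DCCFE, suited to cancer gene expression datasets. Experiments show that the new criterion NFEC and the new algorithm DCCFE are more effective than existing methods when classifying cancer gene expression datasets. The contributions of this work are: (1) a new and more meaningful feature-extraction criterion is proposed; (2) DCCM can use first-order differentiable functions, which are more general than kernel functions, so the proposed feature-extraction algorithm is more broadly applicable.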
Classification accuracy on cancer gene expression datasets often degrades badly under current feature-extraction criteria, owing to the datasets' high dimensionality and small sample sizes. In this paper, based on DCCM (Differential Capability Control Machine), a new feature-extraction criterion, NFEC, is developed, and a new feature-extraction algorithm, DCCFE, is accordingly proposed. Our experimental results demonstrate that NFEC performs better than current criteria and that DCCFE outperforms current approaches on cancer gene expression datasets. Furthermore, since DCCM admits more general differentiable functions rather than only the kernel functions used in SVM, our approach suggests broader potential application in bioinformatics.
Source
《生物信息学》
2004, No. 2, pp. 13-20 (8 pages)
Chinese Journal of Bioinformatics
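Funding
National Natural Science Foundation of China (60225015)
Natural Science Foundation of Jiangsu Province (BK2003017)
Computer Science Open Fund of the Institute of Software, Chinese Academy of Sciences (SYSKF)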
Keywords
bioinformatics
differential capability control
feature extraction
cancer gene expression datasets
classification