Abstract
Gene sets selected from DNA microarray data by common ranking methods tend to contain many highly correlated genes, and scoring genes one at a time does not truly reflect the classification ability of the resulting feature set. A further challenge for disease diagnosis based on gene expression microarray data is that the number of genes far exceeds the number of samples. To address these problems, a feature extraction method for DNA microarray data is proposed for tissue classification. The method clusters the genes with K-means to obtain the center of each gene subclass, uses a ranking method to remove the subclasses that are irrelevant to classification, extracts features from the remaining subclasses with ICA, and then builds an SVM classifier to classify the tissues. Experiments on real biological data show that, by extracting compound genes, the method evaluates the classification ability of genes jointly, reduces the number of features, and improves the classification accuracy of the classifier.
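The steps named in the abstract (K-means over genes, ranking of cluster centers, ICA, SVM) can be sketched roughly as follows. This is only an illustrative sketch built on scikit-learn, not the authors' implementation: the function names, the signal-to-noise ranking score, the binary 0/1 labels, and all parameter values (`n_clusters`, `n_keep`, `n_components`) are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA
from sklearn.svm import SVC


def cluster_means(X, gene_labels, n_clusters):
    """Average the genes of each cluster: one 'compound gene' per cluster."""
    return np.column_stack(
        [X[:, gene_labels == c].mean(axis=1) for c in range(n_clusters)]
    )


def signal_to_noise(Z, y):
    """Assumed ranking score for binary labels: |mean0 - mean1| / (std0 + std1)."""
    a, b = Z[y == 0], Z[y == 1]
    spread = a.std(axis=0) + b.std(axis=0) + 1e-12  # avoid division by zero
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / spread


def fit_pipeline(X_train, y_train, n_clusters=20, n_keep=10,
                 n_components=5, seed=0):
    """X_train has shape (samples, genes); y_train holds 0/1 tissue labels."""
    # 1. Cluster the genes (rows of X_train.T) by their expression profiles.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    gene_labels = km.fit(X_train.T).labels_
    Z = cluster_means(X_train, gene_labels, n_clusters)
    # 2. Rank the cluster centers and keep only the most discriminative ones.
    keep = np.argsort(signal_to_noise(Z, y_train))[::-1][:n_keep]
    # 3. Extract independent components from the retained centers.
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    S = ica.fit_transform(Z[:, keep])
    # 4. Train an SVM on the ICA features (linear kernel is an assumption).
    clf = SVC(kernel="linear").fit(S, y_train)
    return {"gene_labels": gene_labels, "n_clusters": n_clusters,
            "keep": keep, "ica": ica, "clf": clf}


def predict(model, X):
    """Apply the fitted clustering, selection, ICA and SVM to new samples."""
    Z = cluster_means(X, model["gene_labels"], model["n_clusters"])
    return model["clf"].predict(model["ica"].transform(Z[:, model["keep"]]))
```

In this sketch the clustering, ranking, and ICA are fitted on training samples only, and new samples are mapped through the same gene-to-cluster assignment, so the labels of held-out tissues never influence feature selection. A call would look like `model = fit_pipeline(X_train, y_train)` followed by `predict(model, X_test)`.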
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2010, No. 28, pp. 40-42 (3 pages)
Computer Engineering and Applications
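Funding
National Social Science Fund of China, No. 08CTQ003
Natural Science Foundation of Guangdong Province, No. 2008276
President's Fund of South China Agricultural University, No. 4900-K06166
Key Science and Technology Project of the Chongqing Science and Technology Commission, No. 2008AC0043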