Abstract
Accurately identifying tumor types from gene expression profiles is of great importance to current bioinformatics research. Gene expression profiles have few samples, high dimensionality, many redundant genes, and considerable noise, so methods for extracting the informative (feature) genes of a cancer are of significant research value. Using colon tumor gene expression profile data as the object of study, this paper proposes an effective method that combines gene selection with data extraction. Irrelevant genes are first removed to obtain a candidate feature set. PCA (Principal Component Analysis) is then applied to obtain pattern features in a low-dimensional projection space. The genes are ranked by their contribution rates, and those with the largest contributions are selected as informative genes. Finally, a support vector machine (SVM) is used for classification testing.
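The abstract names the stages of the method but not their parameters. The Python sketch below is one possible reading of the pipeline under stated assumptions, not the authors' implementation: the irrelevant-gene filter is assumed to be a Welch t-score ranking; the "contribution rate" of a gene is assumed to be its eigenvalue-weighted squared loading over the retained principal components (so the rates sum to 1); the number of components m=2 and the cut-offs k=10 and r=5 are arbitrary; and a linear SVM trained by stochastic gradient descent on the hinge loss stands in for whatever kernel and solver the paper used. All function names (`filter_genes`, `top_eigenpairs`, `contribution_rates`, `run_pipeline`, `make_toy_data`, etc.) are hypothetical, and the data are synthetic rather than the colon tumor profiles. Only the standard library is used, so PCA is computed by power iteration.

```python
import math
import random
from statistics import mean, variance

def t_score(values, labels):
    """Absolute Welch two-sample t statistic between class +1 and class -1."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == -1]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b)) + 1e-12
    return abs(mean(a) - mean(b)) / se

def filter_genes(X, y, k):
    """Step 1 (assumed filter): keep the k genes with the largest t-scores."""
    scores = [t_score([row[g] for row in X], y) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]

def covariance(X):
    """Sample covariance matrix of the columns (genes) of X."""
    n, d = len(X), len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j] / (n - 1)
    return C

def top_eigenpairs(C, m, iters=500, seed=0):
    """Step 2 (PCA): top-m eigenpairs of symmetric C by power iteration + deflation."""
    rng = random.Random(seed)
    d = len(C)
    A = [row[:] for row in C]
    pairs = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflate: remove the found component so the next pass finds the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs

def contribution_rates(pairs):
    """Step 3 (assumed definition): each gene's share of the retained PC variance."""
    total = sum(lam for lam, _ in pairs)
    d = len(pairs[0][1])
    return [sum(lam * v[j] ** 2 for lam, v in pairs) / total for j in range(d)]

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Step 4 (assumed linear kernel): SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj - lr * lam * wj for wj in w]  # gradient of the L2 penalty
            if margin < 1:  # hinge loss active: move toward classifying x_i
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def make_toy_data(n_per_class=20, n_genes=50, n_informative=5, seed=1):
    """Hypothetical synthetic expression matrix (NOT the colon tumor data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 2.0 * label  # only these genes carry class signal
            X.append(row)
            y.append(label)
    return X, y

def run_pipeline(X, y, k=10, m=2, r=5):
    """Filter -> PCA contribution ranking -> SVM; returns (genes, candidate rates, accuracy)."""
    candidates = filter_genes(X, y, k)
    Xc = [[row[g] for g in candidates] for row in X]
    rates = contribution_rates(top_eigenpairs(covariance(Xc), m))
    ranked = sorted(range(len(candidates)), key=lambda j: -rates[j])
    genes = [candidates[j] for j in ranked[:r]]
    Xs = [[row[g] for g in genes] for row in X]
    w, b = train_linear_svm(Xs, y)
    correct = sum(
        1 for row, label in zip(Xs, y)
        if label * (sum(wj * xj for wj, xj in zip(w, row)) + b) > 0
    )
    return genes, rates, correct / len(y)

if __name__ == "__main__":
    genes, rates, acc = run_pipeline(*make_toy_data())
    print("informative genes:", sorted(genes), "training accuracy:", acc)
```

On the synthetic data only the first five genes carry class signal, so the pipeline should recover them. Accuracy measured on the same samples used for fitting is optimistic; with real expression data a held-out or leave-one-out estimate is the more meaningful check, and library routines (for example scikit-learn's `PCA` and `SVC`) would normally replace the hand-written ones.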
Source
Journal of Xi’an University (Natural Science Edition) (《西安文理学院学报(自然科学版)》)
2014, No. 2, pp. 15-18 (4 pages)
Keywords
gene expression profile
informative gene
principal component
tumor classification