Abstract
Feature selection methods for high-dimensional, small-sample data are widely used in the processing and analysis of protein mass spectrometry data. Addressing the feature selection problem for protein mass spectra and building on the sparse representation framework, this paper proposes a sparse representation based feature selection algorithm (SRFS). SRFS considers a feature important or informative if the feature subsets containing it perform well in a sparse representation classifier (SRC). The method uses the SRC result as a measure of the relative importance of the features in a given feature subspace, then aggregates the results computed over a large number of randomly sampled subspaces to rank every feature in the feature space. Further analysis extracts a small number of spectral peaks closely related to cancer. The method was evaluated on the public ovarian cancer dataset OC-WCX2a and on the breast cancer dataset BC-WCX2a supplied by Zhejiang Cancer Hospital. The experimental results show that SRFS can select highly predictive, representative feature sets from the SELDI-TOF protein mass spectrometry data analysed in this paper.
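The procedure described in the abstract (score random feature subspaces with SRC, then aggregate the scores per feature) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ISTA solver for the l1 problem, the random hold-out split, the mean-accuracy aggregation, and every name and parameter (`ista_l1`, `src_predict`, `srfs_rank`, `lam`, `subset_size`, `n_subsets`, `test_frac`) are hypothetical choices that the abstract does not specify.

```python
import numpy as np


def ista_l1(A, y, lam=0.05, n_iter=200):
    """Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))  # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x


def src_predict(X_train, y_train, x_test, lam=0.05):
    """Classify one sample by class-wise reconstruction residual (rows = samples)."""
    A = X_train.T  # dictionary: one column per training sample
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    coef = ista_l1(A, x_test, lam)
    classes = np.unique(y_train)
    # Reconstruct the test sample using only the coefficients of each class.
    residuals = [
        np.linalg.norm(x_test - A[:, y_train == c] @ coef[y_train == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]


def src_accuracy(X_train, y_train, X_test, y_test, lam=0.05):
    """Fraction of held-out samples that SRC labels correctly."""
    preds = np.array([src_predict(X_train, y_train, x, lam) for x in X_test])
    return float(np.mean(preds == y_test))


def srfs_rank(X, y, subset_size=5, n_subsets=200, test_frac=0.3, seed=0):
    """Rank features by the mean SRC accuracy of the random subsets containing them.

    Assumption: the score of a subset is hold-out accuracy, and a feature's
    score is the average over all sampled subsets that include it.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    score_sum = np.zeros(n_features)
    count = np.zeros(n_features)
    n_test = max(1, int(round(test_frac * n_samples)))
    for _ in range(n_subsets):
        feats = rng.choice(n_features, size=subset_size, replace=False)
        perm = rng.permutation(n_samples)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        acc = src_accuracy(
            X[np.ix_(train_idx, feats)], y[train_idx],
            X[np.ix_(test_idx, feats)], y[test_idx],
        )
        score_sum[feats] += acc  # credit every feature in this subset
        count[feats] += 1
    mean_score = np.divide(
        score_sum, count, out=np.zeros(n_features), where=count > 0
    )
    ranking = np.argsort(-mean_score)  # best-scoring feature first
    return ranking, mean_score
```

In this sketch `X` holds one spectrum per row and one peak intensity per column, and `ranking[:k]` gives the indices of the `k` highest-scoring features. The subset size and the number of sampled subsets would need tuning on real data, since a feature that is rarely sampled receives a noisy score.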
Source
Acta Biophysica Sinica (《生物物理学报》)
CAS
CSCD
PKU Core (北大核心)
2012, Issue 8, pp. 683-691 (9 pages)
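Funding
National Natural Science Foundation of China (60801054, 60801055)
National Science Fund for Distinguished Young Scholars (60788101)
Zhejiang Provincial Public Welfare Technology Application Research Project (2010C33017)
Zhejiang Provincial Medical and Health Science Research Fund Project (2010KYA041)
Zhejiang Provincial Science and Technology Project (Y2080586)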
Keywords
Protein mass spectrum
Sparse representation
Feature selection