Feature Selection and Sample Classification for SELDI-TOF Mass Spectrometry Data Based on Affinity Propagation Clustering (cited by: 5)
Abstract: To analyze high-throughput SELDI-TOF mass spectrometry data effectively and to screen for tumor-related protein sites, a feature selection method based on affinity propagation clustering is proposed. The SELDI data are first pre-screened with a t-test. Affinity propagation clustering and null-space LDA (NS-LDA) are then used to reduce dimensionality and remove correlation among features. Finally, SVM-RFE selects the protein sites relevant to tumor discrimination. Four classifiers (SVM, KNN, NB, and J4.8) are used to estimate classification performance. The method was evaluated on the public ovarian cancer datasets OC-WCX2a and OC-WCX2b and on the breast cancer dataset BC-WCX2a from Zhejiang Cancer Hospital. On these three datasets, the classification rate reached 96.43%, 99.66%, and 90.88%; sensitivity reached 97.00%, 100%, and 96.17%; and specificity reached 95.85%, 99.08%, and 81.92%, respectively. Ten m/z features (protein sites) relevant to tumor discrimination were selected for each dataset. The proposed algorithm achieves a good classification rate and effectively extracts discriminative protein mass spectrum sites, which may help in the auxiliary diagnosis of cancer.
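Source: Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2013, No. 1, pp. 14-20 (7 pages)
Funding: National Natural Science Foundation of China (60801054, 60801055); National Science Fund for Distinguished Young Scholars (60788101)
Keywords: protein mass spectrometry; affinity propagation clustering; feature selection; biomarker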
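
The staged pipeline described in the abstract (t-test pre-screening, affinity propagation clustering, SVM-RFE, then classification) can be illustrated with a rough sketch. This is not the authors' implementation: the function name `select_features`, the parameters `n_ttest` and `n_final`, the use of absolute Pearson correlation as the similarity between features, and the synthetic data are all assumptions made for illustration, and the null-space LDA step is omitted because its details are not given in this record.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import AffinityPropagation
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC


def select_features(X, y, n_ttest=50, n_final=10, random_state=0):
    """Hypothetical helper: t-test filter -> AP exemplars -> SVM-RFE.

    Returns the column indices of X that survive all three stages.
    """
    # Stage 1: keep the n_ttest features with the smallest p-values
    # from a two-sample (Welch) t-test between the two classes.
    _, p = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    kept = np.argsort(p)[:n_ttest]

    # Stage 2: cluster the kept features (not the samples) and keep one
    # exemplar per cluster to reduce redundancy. The similarity measure
    # (absolute Pearson correlation) is an assumption of this sketch.
    sim = np.abs(np.corrcoef(X[:, kept].T))
    ap = AffinityPropagation(affinity="precomputed", damping=0.9,
                             max_iter=1000, random_state=random_state).fit(sim)
    centers = ap.cluster_centers_indices_
    # Fall back to the t-test subset if AP did not converge or returned
    # fewer exemplars than the number of features requested.
    if centers is not None and len(centers) >= n_final:
        kept = kept[centers]

    # Stage 3: SVM-RFE, i.e. recursive feature elimination driven by the
    # weights of a linear SVM, down to n_final features.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_final, step=1)
    rfe.fit(X[:, kept], y)
    return kept[rfe.support_]


# Synthetic stand-in for a spectra matrix: 60 samples x 200 m/z bins,
# with the first 5 bins shifted in the positive class.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :5] += 2.0

selected = select_features(X, y)

# Accuracy, sensitivity, and specificity from cross-validated predictions.
# Selecting features on all samples before cross-validation gives an
# optimistic estimate; it is done here only to keep the sketch short.
pred = cross_val_predict(SVC(kernel="linear"), X[:, selected], y, cv=5)
tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / len(y)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(sorted(selected.tolist()), accuracy, sensitivity, specificity)
```

For an unbiased estimate, the selection stages would need to be repeated inside each cross-validation fold, so that held-out samples never influence which features are chosen.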