
Classification of Proteomic Mass Spectrometry Data Based on Affinity Propagation Clustering and Semi-Supervised Learning
Abstract: Objective: To propose a classification method based on affinity propagation clustering and semi-supervised learning for high-dimensional, redundant SELDI proteomic mass spectrometry data. Methods: First, a t-test was applied to the mass spectrometry data for preliminary dimensionality reduction. The retained features were then reduced further with affinity propagation clustering. Finally, a semi-supervised learning algorithm propagated labels across the samples, drawing on information from both labeled and unlabeled samples to classify them. Results: The method achieved classification accuracies of 99.15% on the public ovarian cancer dataset OC-WCX2b and 96.75% on the public prostate cancer dataset PC-H4. On the clinical breast cancer dataset BC-WCX2a from Zhejiang Cancer Hospital, it achieved a classification accuracy of 95.18% and a sensitivity of 100%. Conclusion: Semi-supervised learning based on cluster analysis makes effective use of the information in unlabeled mass spectrometry samples; compared with classical supervised learning algorithms, it gave better classification performance and is more practical.
Source: Space Medicine & Medical Engineering (《航天医学与医学工程》), 2014, Issue 5, pp. 367-372 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (60801054, 61205200); Natural Science Foundation of Zhejiang Province (LY12F01005).
Keywords: proteomic mass spectrometry; cluster analysis; semi-supervised learning; feature extraction
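
The first and last steps of the pipeline described in the abstract (t-test feature screening, then label propagation from labeled to unlabeled samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which t-test variant or which semi-supervised algorithm was used, so Welch's t-statistic, a Gaussian-kernel graph with clamped labels, the `sigma` value, the function names, and the toy data are all assumptions made for illustration. The intermediate affinity propagation step is omitted for brevity.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed variant, not from the paper)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_features_by_t(X, y, k):
    """Return the indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(welch_t(a, b)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def propagate_labels(X, labels, sigma=1.0, n_iter=200):
    """Iterative label propagation on a fully connected Gaussian-kernel graph.

    labels[i] is 0 or 1 for labeled samples and None for unlabeled ones.
    Labeled samples are clamped to their known label at every iteration.
    """
    n = len(X)
    # Pairwise Gaussian similarities, with no self-loops.
    W = [[0.0 if i == j else
          math.exp(-sum((p - q) ** 2 for p, q in zip(X[i], X[j])) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Row-normalise so each row is a transition distribution.
    P = []
    for row in W:
        total = sum(row) or 1.0
        P.append([w / total for w in row])
    # F[i] = [score for class 0, score for class 1].
    F = [[0.5, 0.5] if lab is None else [1.0 - lab, float(lab)] for lab in labels]
    for _ in range(n_iter):
        new = [[sum(P[i][j] * F[j][c] for j in range(n)) for c in (0, 1)]
               for i in range(n)]
        F = [new[i] if labels[i] is None else F[i] for i in range(n)]
    return [0 if f[0] >= f[1] else 1 for f in F]


# Toy data (hypothetical): features 0 and 2 separate the classes, 1 and 3 are noise.
X_labeled = [[1.0, 5.0, 0.9, 3.0], [1.2, 3.0, 1.1, 3.1],
             [3.0, 4.1, 3.1, 3.0], [3.2, 5.1, 2.9, 3.1]]
y_labeled = [0, 0, 1, 1]
X_unlabeled = [[1.1, 4.5, 1.0, 3.0], [3.1, 3.5, 3.0, 3.0]]

# Step 1: screen features using the labeled samples only.
selected = top_features_by_t(X_labeled, y_labeled, k=2)

# Step 3: propagate labels over all samples in the reduced feature space.
X_all = [[row[j] for j in selected] for row in X_labeled + X_unlabeled]
labels = y_labeled + [None] * len(X_unlabeled)
predicted = propagate_labels(X_all, labels)

print("selected features:", selected)
print("labels for unlabeled samples:", predicted[len(y_labeled):])
```

On real spectra the number of retained features `k` and the kernel width `sigma` would need tuning, and a fully connected graph scales quadratically with the number of samples.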

