Classification and feature selection of proteomic data by uncorrelated linear discriminant analysis (article in English)
Abstract A method combining uncorrelated linear discriminant analysis (ULDA) with the chi-squared test (CHI2) is proposed for the classification and feature selection of proteomic mass spectrometry (MS) data. The chi-squared test is first used as a filter to remove variables that show no difference between classes; ULDA is then applied for sample classification and feature selection. On two datasets, the finally selected variables gave specificities of 98.2% and 95.74%, respectively, and a sensitivity of 100% in both cases. The results indicate that the proposed method handles proteomic data with very large numbers of variables well and can differentiate control from cancer samples. The selected variables may also serve as potential biomarkers that provide clues for the early diagnosis of the related diseases.
Source Computers and Applied Chemistry (《计算机与应用化学》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2009, No. 12, pp. 1563-1566 (4 pages)
Funding Supported by the National Natural Science Foundation of China (20975039)
Keywords uncorrelated linear discriminant analysis, Chi-squared test, proteomics, MS data, biomarker
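
The filter stage and the reported metrics from the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several choices are assumptions made here for illustration: each intensity variable is binarized at its median before the chi-squared statistic is computed, the cutoff 3.841 is the critical value for one degree of freedom at the 0.05 level, and the function names and toy data are hypothetical. The ULDA classification stage is not implemented; any classifier's predictions could be passed to the evaluation function.

```python
from statistics import median
from typing import List, Sequence, Tuple

# Chi-squared critical value for 1 degree of freedom at alpha = 0.05 (assumed cutoff).
CHI2_CRITICAL_DF1_P05 = 3.841


def chi2_binary(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-squared statistic for a 0/1 feature against a 0/1 class label."""
    n = len(labels)
    obs = [[0, 0], [0, 0]]  # obs[feature value][class label]
    for f, c in zip(feature, labels):
        obs[f][c] += 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (obs[i][j] - expected) ** 2 / expected
    return stat


def chi2_filter(X: Sequence[Sequence[float]], y: Sequence[int],
                threshold: float = CHI2_CRITICAL_DF1_P05) -> List[int]:
    """Return indices of variables whose chi-squared statistic exceeds the threshold.

    Assumption: each variable is binarized at its median before testing.
    """
    kept = []
    for j in range(len(X[0])):
        column = [sample[j] for sample in X]
        cut = median(column)
        binary = [1 if v > cut else 0 for v in column]
        if chi2_binary(binary, y) > threshold:
            kept.append(j)
    return kept


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes (1 = disease, 0 = control) occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 8 samples, 2 variables; only variable 0 differs between classes.
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
toy_X = [[1, 5], [2, 1], [1, 5], [2, 1], [8, 5], [9, 1], [8, 5], [9, 1]]
selected = chi2_filter(toy_X, toy_y)
```

In this sketch the filter only reduces the variable set; the surviving variables would then be passed to the discriminant analysis step, and `sensitivity_specificity` would be applied to its predictions on held-out samples.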