Abstract
Objective: Data-independent acquisition (DIA) based on liquid chromatography-tandem mass spectrometry is one of the main ways of acquiring proteomic data. Each acquired MS/MS spectrum is a mixture produced by the simultaneous fragmentation of multiple peptides, which complicates peptide identification and quantification. Current mainstream methods based on extracted ion chromatograms require preprocessing, chromatographic peak construction, and extraction of chromatographic peak features. These pipelines are complex and error-prone, and the accuracy of identification and quantification varies with chromatogram complexity and chromatographic time. To address these shortcomings, we propose a deep learning method that identifies and quantifies peptides directly.
Methods: Unlike methods based on extracted ion chromatograms, our method does not use chromatographic-dimension information and is therefore not affected by factors such as chromatogram complexity and chromatographic time. The preprocessed mass spectrometry data are fed into two models based on convolutional neural networks (CNNs), which treat identification as a binary classification problem and quantification as a regression problem.
Results: In experiments on a public dataset, compared with FIGS, a method with relatively high accuracy, our method improved the reproducibility of identification results and increased the number of peptides quantified at different abundance levels while maintaining quantification accuracy.
Conclusions: The proposed deep learning model, which does not use chromatographic-dimension information, can effectively identify and quantify peptides.
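The two-model setup described under Methods can be illustrated with a minimal sketch. The abstract does not give layer sizes, input encoding, or training details, so everything below is an assumption made for illustration: the binned-intensity input, the single convolution layer with global max pooling, the untrained random parameters, and the names `make_model`, `identify`, and `quantify` are all hypothetical and are not the authors' implementation. The sketch only shows how one small 1D CNN can end in a sigmoid output for binary classification (identification) and a second one in a linear output for regression (quantification).

```python
import math
import random


def conv1d_relu(signal, kernel):
    """Valid 1D cross-correlation followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]


def make_model(rng, n_kernels=4, kernel_size=5):
    """Hypothetical, untrained parameters: conv kernels plus one linear layer."""
    return {
        "kernels": [
            [rng.uniform(-1, 1) for _ in range(kernel_size)]
            for _ in range(n_kernels)
        ],
        "weights": [rng.uniform(-1, 1) for _ in range(n_kernels)],
        "bias": 0.0,
    }


def forward(model, spectrum):
    """Conv -> ReLU -> global max pooling -> linear layer; returns a raw score."""
    pooled = [max(conv1d_relu(spectrum, k)) for k in model["kernels"]]
    return sum(w * f for w, f in zip(model["weights"], pooled)) + model["bias"]


def identify(model, spectrum):
    """Binary classification: sigmoid maps the score to a probability."""
    return 1.0 / (1.0 + math.exp(-forward(model, spectrum)))


def quantify(model, spectrum):
    """Regression: the raw score is read as a predicted relative quantity."""
    return forward(model, spectrum)


# Two separate models, mirroring the two-model design in the abstract.
rng = random.Random(0)
identifier = make_model(rng)
quantifier = make_model(rng)
spectrum = [rng.random() for _ in range(32)]  # toy binned intensities (assumed encoding)
print(identify(identifier, spectrum), quantify(quantifier, spectrum))
```

In practice the parameters would be learned from labeled spectra, for example with a cross-entropy loss for the classifier and a squared-error loss for the regressor; those training choices are likewise assumptions, not details taken from the abstract.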
Authors
LIU Koulong (刘扣龙)
ZHENG Haoran (郑浩然)
Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027
Source
Beijing Biomedical Engineering (《北京生物医学工程》)
2022, No. 6, pp. 569-575 (7 pages)
Funding
National Key Basic Research Program of China (2017YFA0505502)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38000000)
Keywords
proteomics
deep learning
data-independent acquisition
relative quantification
mass spectrometry