Abstract
In recent years, data-independent acquisition (DIA) mass spectrometry has attracted wide attention in proteomics. However, DIA data are high-dimensional, carry substantial background noise, and contain mixed signals from multiple sources, which makes their analysis a major challenge. This paper proposes Ultra-DIA, a deep-learning-based algorithm that processes DIA mass spectrometry data directly. Ultra-DIA combines a deep variational autoencoder with a variety of machine learning algorithms to extract features of ion signals, so that fragment ions generated by different peptides can be distinguished. It then generates pseudo-spectra, which are used to identify and quantify peptides and proteins. On the test data, Ultra-DIA identified 61.4% more peptides and 64.5% more proteins than the mainstream algorithm DIA-Umpire. In addition, Ultra-DIA found more low-concentration proteins than DIA-Umpire.
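The abstract does not describe the network architecture or training objective of Ultra-DIA. As a rough illustration only, the sketch below shows two generic ingredients of any variational autoencoder: the reparameterization step that samples a latent feature vector, and the KL divergence term that regularizes it toward a standard normal prior. The function names and the numeric values of `mu` and `log_var` are hypothetical and do not come from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one ion signal (illustrative values only).
mu = [0.3, -1.2]
log_var = [-0.5, 0.1]

rng = random.Random(0)          # fixed seed so the sample is reproducible
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full model, `mu` and `log_var` would be produced by an encoder network from the ion signal, and `z` would serve as the feature vector that later steps compare across fragment ions. How Ultra-DIA actually does this is not stated in the abstract.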
Authors
HE Qingzu
ZHONG Chuanqi
LI Xiang
SHUAI Jianwei
HAN Jiahuai
Affiliations: College of Physical Science and Technology, Xiamen University, Xiamen 361005, China; School of Life Sciences, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, China
Source
Journal of Xiamen University: Natural Science
CAS
CSCD
Peking University Core Journals
2021, Issue 1, pp. 97-103 (7 pages)
Funding
National Natural Science Foundation of China (11874310, 11675134).
Keywords
deep learning
variational autoencoders
data-independent acquisition
mass spectrometry data