Abstract
Mass spectrometry is widely used to identify the components of unknown compounds. A common approach is to compute the similarity between the measured mass spectrum of the analyte and the entries in an existing reference spectral library. However, existing libraries have limited coverage: a compound that is not in the library cannot be retrieved correctly. One way to address this is to use a neural network to learn, from known molecular structures and their corresponding mass spectra, the latent mapping between molecular structure features and spectral peaks, and thereby predict mass spectra. To address the loss of molecular structure features in current mass spectrum prediction methods, a prediction method based on molecular embedding is proposed, in which molecular embedding converts molecular structure features into high-dimensional feature vectors. The results show that, compared with the conventional approach of representing molecular structure with molecular fingerprints, the average similarity of the spectra predicted with molecular embedding increases by 5.4%, and the predicted spectra also outperform those of the fingerprint-based prediction method in compound retrieval tasks. A difference analysis of the datasets used in the experiments is also presented, indicating that the method generalizes well.
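The library search described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity over unit-mass bins is an assumption here, and the names `bin_spectrum`, `cosine_similarity`, and `search_library` are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks):
    """Sum peak intensities into unit-mass bins keyed by the rounded m/z."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(round(mz))] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (assumed metric)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query_peaks, library):
    """Rank library entries by similarity to the query spectrum, best first."""
    query = bin_spectrum(query_peaks)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this sketch, a predicted spectrum for a compound missing from the library could be added to `library` like any measured entry, which is how spectrum prediction would extend library coverage.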
Authors
张宝杰
夏卿
陈鹏
夏懿
章军
ZHANG Baojie; XIA Qing; CHEN Peng; XIA Yi; ZHANG Jun (Department of Electrical Engineering and Automation, Anhui University, Hefei 230000, China)
Source
《真空科学与技术学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 3, pp. 165-169 (5 pages)
Chinese Journal of Vacuum Science and Technology
Funding
National Natural Science Foundation of China (61872004, 61672035, 62072002)
Natural Science Foundation of Anhui Province (2108085MF232)
Keywords
Mass spectrometry
Compound retrieval
Feature extraction
Spectral peak