
Research of mass spectrum similarity approaches based on segment weighted dot-product  Cited by: 2
Abstract: When spectra are of poor quality because of a low signal-to-noise ratio or noise interference, the traditional weighted dot-product method tends to produce low similarity scores, false positives, or false negatives, because it exploits neither the distribution of spectral peaks nor the differing roles that peaks in different mass ranges play in spectrum identification. To address this, a dot-product similarity method based on segment weighting is proposed. The method first uses the peak distribution to determine the range of each mass segment, then assigns each segment its own mass-to-charge (m/z) weight and abundance weight according to that segment's role in identification, and finally computes the spectral similarity with the assigned weights. The spectrum of dimethyl methylphosphonate (DMMP) is used as an example to study how the weights of each segment can be set reasonably. In an accuracy experiment that searched more than 1,000 spectra against the NIST08 standard reference library (191,000 spectra), identification accuracy improved by 16.2% on average compared with the traditional weighted dot-product method. Searching measured sample spectra acquired on a mass spectrometer against the reference library showed that the method improves both the similarity and the accuracy of spectrum matching; for octafluoronaphthalene spectra at different concentrations, similarity increased by 2.3% on average compared with the traditional weighted dot-product method. When applied to isomeric compounds such as o-xylene, the method improved selectivity among similar compounds.
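The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the conventional cosine form of the weighted dot product, in which each peak is transformed to abundance^a · (m/z)^b before comparison, and lets the exponents a and b vary by mass segment. The segment boundaries, the exponent values, and all function names (`SEGMENTS`, `segment_weights`, `weighted_peaks`, `segment_weighted_similarity`) are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical segments: (upper m/z bound, abundance exponent, mass exponent).
# Boundaries and exponents are illustrative assumptions, not values from the paper.
SEGMENTS = [
    (50.0, 0.5, 1.0),
    (150.0, 0.5, 2.0),
    (math.inf, 0.5, 3.0),
]


def segment_weights(mz, segments=SEGMENTS):
    """Return the (abundance exponent, mass exponent) of the segment containing mz."""
    for upper, a, b in segments:
        if mz <= upper:
            return a, b
    raise ValueError(f"m/z {mz} is outside all segments")


def weighted_peaks(spectrum, segments=SEGMENTS):
    """Map each m/z to its weighted intensity: abundance**a * mz**b."""
    weighted = {}
    for mz, abundance in spectrum.items():
        a, b = segment_weights(mz, segments)
        weighted[mz] = (abundance ** a) * (mz ** b)
    return weighted


def segment_weighted_similarity(query, reference, segments=SEGMENTS):
    """Cosine similarity of two weighted spectra given as {m/z: abundance} dicts."""
    q = weighted_peaks(query, segments)
    r = weighted_peaks(reference, segments)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(q[mz] * r[mz] for mz in q.keys() & r.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)
```

Calling `segment_weighted_similarity(spectrum, spectrum)` on any non-empty spectrum with positive abundances returns 1 up to floating-point rounding, and two spectra with no shared m/z values score 0. In practice the segment table would be tuned against reference spectra, which is the step the abstract describes using the DMMP spectrum.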
Source: Computers and Applied Chemistry (indexed in CAS, CSCD, and the Peking University Core Journals list), 2014, No. 1, pp. 24-28 (5 pages)
Funding: Supported by the National Key Basic Research Development Program of China (973 Program), project 2011 CB706900
Keywords: spectrum searching; similarity calculation; weighted dot-product; segment weight










