Abstract
Determining protein function supports the resolution of many biological problems, and a variety of machine learning methods have been proposed to predict it. Most of these methods rely on features such as protein sequence, structural domains, and protein-protein interactions. For newly discovered proteins, however, features other than the sequence are difficult to obtain, so methods that predict function from sequence alone are especially valuable. To this end, this paper proposes a sequence-based model for protein function prediction. The model further mines the information in protein sequences within a contrastive learning framework and also makes effective use of the co-occurrence relationships among protein function labels. Experimental results show that the proposed model achieves good predictive performance for protein function.
Authors
SUN Xu (孙旭); LIN Jie (林劼)
School of Mathematics and Statistics, Fujian Normal University, Fuzhou 350117, China; College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
Source
Journal of Fujian Normal University: Natural Science Edition (《福建师范大学学报(自然科学版)》)
CAS
2023, No. 6, pp. 32-39 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (61472082).
Keywords
contrastive learning
gene ontology
label propagation