
Prediction of cereal protein function based on multilayer functional structures (cited by: 1)
Abstract To help researchers select functional proteins more conveniently and accurately, and to make the development of cereal-based functional foods more efficient, this study proposes a method for predicting cereal protein function based on a multilayer functional structure. First, a large-scale interaction network is built from the combined data of several cereals, and the functions of unknown proteins are explored through the interaction between the functional features of clusters and the unknown proteins. Second, new protein weights, semantic similarity, and functional hierarchy weights are defined to determine the functions a protein is likely to have. Finally, a scoring mechanism assists in deciding the predicted functions. The experiments show that the predicted functions are hierarchical and that proteins with a specified function can be obtained. The average accuracy for the first two layers of FunCat (Functional Catalogue) exceeds 85%, and functions at the fifth and sixth layers can also be predicted. The retraceability of the hierarchy lets poorly predicted functions fall back to their parent function, which lowers the false positive rate and raises the overall prediction accuracy. The results can serve as a reference for the development of functional foods and pharmaceuticals.

Cereals are a valuable source of healthy and sustainable protein, and food innovation around cereal protein is moving toward more sustainable food systems for healthy diets. This calls for a more precise understanding of the functions that cereal proteins have, which already contribute substantially to genomics and food science. In this study, a function prediction method for cereal proteins was proposed using a multilayer functional structure, so that functional proteins can be selected more conveniently and accurately. A large-scale interaction network was constructed from indica, japonica, wheat, maize, and soybean data. First, the relevant functions of unknown proteins were explored via the interaction between the functional features of clusters and the unknown proteins. Second, new protein weights, semantic similarity, and functional hierarchy weights were defined to determine the possible functions of proteins. Finally, a scoring mechanism was used to decide the predicted cereal protein functions. The method reached a precision of about 77% for exact protein function prediction and up to 92% for fuzzy protein function prediction using retraceability, which helps to determine the functional range of unknown proteins efficiently. Precision varied considerably across levels, with an average of 92% at level 1, 85% at level 2, and 69% at level 4, and the average precision was close to 80% over all six levels of FunCat. The method was also evaluated on sets of unknown proteins of different sizes: precision was 76% for 50 unknown proteins, 72% for 100, and 66% for 200. Precision did not drop sharply as the prediction set grew, which suggests that the method still performs well on large sets of unknown proteins. A comparison was made with recent algorithms, namely FUNPRED_SEQSIN, DAC (Diffusion Alignment Coefficient), and PILL (Predict protein function using Incomplete hierarchical LabeLs). In terms of precision, recall, and F-measure, the proposed method performed significantly better than the others. The experimental results show that 1) the predicted functions are hierarchical, so the method can return proteins with a specified function or the functions of a protein at specified levels; 2) the average precision for the first four layers of FunCat (Functional Catalogue) exceeds 80%, and functions at the fifth and sixth layers can also be predicted; 3) the retraceability of the hierarchy allows poorly predicted functions to fall back to higher-level functions, which reduces the probability of false positives and improves the overall prediction accuracy. The findings can serve as a reference for protein function prediction in the food industry.
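The retraceability idea in the abstract (a low-scoring specific function falls back to its parent in the hierarchy) can be illustrated with a small sketch. This is not the authors' algorithm: their protein weights, semantic similarity, and hierarchy weights are not given in the abstract, so plain weighted neighbor voting is swapped in here. The function names, the `threshold` parameter, the edge weights, and the dotted category codes in the usage note are all hypothetical placeholders in FunCat's dotted style.

```python
from collections import defaultdict


def parent(code):
    """Parent of a dotted category code, e.g. '01.03.16' -> '01.03'; None at level 1."""
    return code.rsplit(".", 1)[0] if "." in code else None


def ancestors_and_self(code):
    """The code itself followed by all of its ancestors, most specific first."""
    chain = []
    while code is not None:
        chain.append(code)
        code = parent(code)
    return chain


def score_functions(neighbors):
    """Weighted vote over annotated neighbors (an assumed stand-in scoring rule).

    neighbors: list of (edge_weight, [category codes]) for annotated interaction partners.
    Each neighbor votes once for every code it carries and for every ancestor of it,
    so a parent's score is never lower than the score of any of its children.
    """
    total = sum(weight for weight, _ in neighbors)
    scores = defaultdict(float)
    for weight, codes in neighbors:
        voted = set()
        for code in codes:
            voted.update(ancestors_and_self(code))
        for code in voted:
            scores[code] += weight / total
    return dict(scores)


def predict_with_backoff(neighbors, threshold=0.5):
    """Return the deepest categories whose score reaches the threshold.

    A specific function that scores too low is dropped, and the prediction
    falls back to the nearest ancestor that still passes the threshold.
    """
    scores = score_functions(neighbors)
    kept = {code for code, score in scores.items() if score >= threshold}
    redundant = set()
    for code in kept:
        redundant.update(ancestors_and_self(code)[1:])  # proper ancestors only
    return sorted(kept - redundant)
```

With `neighbors = [(0.5, ["01.03.16"]), (0.25, ["01.03.04"]), (0.25, ["20.01"])]`, a threshold of 0.5 keeps the level-3 code `01.03.16`, while a threshold of 0.6 falls back to its level-2 parent `01.03`. The prediction gets coarser, which is how a back-off of this kind trades specificity for fewer false positives.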
Authors SHEN Tingting; LIU Jing; GUAN Xiao (College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China)
Source Transactions of the Chinese Society of Agricultural Engineering (农业工程学报); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2023, Issue 1, pp. 261-268 (8 pages)
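Funding National Natural Science Foundation of China (32172247); Inner Mongolia Autonomous Region Major Science and Technology Project "Breeding of new oat varieties, green cultivation techniques, and research and demonstration of nutritional functional products" (2021ZD0002).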
Keywords protein; function; prediction; cereals; protein semantics; hierarchical functional proteins; protein-protein interaction network
