RNA secondary structure prediction: a method based on semi-supervised learning and a stochastic grammar model (cited by: 1)
Abstract: Traditional stochastic grammar models for RNA secondary structure prediction require a sufficiently large set of related sequence samples, which limits their practical use. To exploit the large number of unlabeled RNA sequences available, semi-supervised learning is incorporated into the stochastic grammar model: a small number of labeled RNA samples and a large number of unlabeled samples together form the training set. A semi-supervised prediction model based on the EM algorithm is designed. It uses a generative stochastic context-free grammar (SCFG) model as the classifier, labels the unlabeled RNA sequences through training, and gradually merges the newly labeled sequences into the labeled sample set. The proportion of labeled to unlabeled samples can be adjusted, and the model finally outputs the structure label sequence. Experiments on several sequence sets that mix labeled and unlabeled RNA sequences show that the method makes effective use of unlabeled sequence data, greatly reduces the number of labeled sequence samples required, and improves prediction accuracy. The effect of the number of unlabeled sequences mixed in on prediction performance was also measured.
Source: Computers and Applied Chemistry (《计算机与应用化学》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2013, No. 9, pp. 1038-1042 (5 pages)
Funding: Natural Science Foundation of Hunan Province (12JJ4058); Hengyang Normal University Research Fund (09A36)
Keywords: semi-supervised learning; RNA secondary structure; stochastic grammar model
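
The training procedure outlined in the abstract (a generative SCFG classifier, unlabeled sequences labeled and merged step by step, an adjustable labeled/unlabeled proportion) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy grammar `S -> x S | x S y S | ε`, the hard-EM (Viterbi relabeling) variant, the weight `lam`, the batch size, add-one smoothing, the minimum hairpin loop of 3, and all function names. Sequences are assumed to use only the letters A, C, G, U.

```python
import math
from collections import Counter

BASES = "ACGU"
MIN_LOOP = 3  # assumed minimum number of bases enclosed by a pair

def count_events(seq, struct, weight, rules, unpaired, paired):
    """Add weighted rule and emission counts for one (sequence, dot-bracket) pair."""
    stack = []
    for i, ch in enumerate(struct):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()  # index of the matching opening bracket
            rules["pair"] += weight
            paired[seq[j] + seq[i]] += weight
        else:
            rules["unpaired"] += weight
            unpaired[seq[i]] += weight
    # In this grammar every parse uses S -> ε exactly (number of pairs + 1) times.
    rules["end"] += weight * (struct.count("(") + 1)

def estimate(labeled, pseudo, lam):
    """M-like step: smoothed log-probabilities; pseudo-labeled counts weigh lam."""
    rules = Counter({"pair": 1.0, "unpaired": 1.0, "end": 1.0})
    unpaired = Counter({b: 1.0 for b in BASES})
    paired = Counter({a + b: 1.0 for a in BASES for b in BASES})
    for seq, struct in labeled:
        count_events(seq, struct, 1.0, rules, unpaired, paired)
    for seq, struct in pseudo:
        count_events(seq, struct, lam, rules, unpaired, paired)
    def logs(counts):
        total = sum(counts.values())
        return {k: math.log(v / total) for k, v in counts.items()}
    return logs(rules), logs(unpaired), logs(paired)

def predict(seq, model):
    """Viterbi (CYK-style) parse: most probable dot-bracket structure for seq."""
    rules, unpaired, paired = model
    n = len(seq)
    # best[i][j] is the best log-probability of S deriving seq[i:j]; empty spans use S -> ε.
    best = [[rules["end"]] * (n + 1) for _ in range(n + 1)]
    back = [[-1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score, choice = rules["unpaired"] + unpaired[seq[i]] + best[i + 1][j], -1
            for k in range(i + 1 + MIN_LOOP, j):  # seq[i] pairs with seq[k]
                s = (rules["pair"] + paired[seq[i] + seq[k]]
                     + best[i + 1][k] + best[k + 1][j])
                if s > score:
                    score, choice = s, k
            best[i][j], back[i][j] = score, choice
    struct, todo = ["."] * n, [(0, n)]
    while todo:  # iterative traceback over the chosen rules
        i, j = todo.pop()
        if i >= j:
            continue
        k = back[i][j]
        if k == -1:
            todo.append((i + 1, j))
        else:
            struct[i], struct[k] = "(", ")"
            todo += [(i + 1, k), (k + 1, j)]
    return "".join(struct)

def self_train(labeled, unlabeled, lam=0.5, batch=2, rounds=10):
    """Hard-EM-style loop: merge unlabeled sequences in batches and relabel them."""
    pool, merged, pseudo = list(unlabeled), [], []
    model = estimate(labeled, pseudo, lam)
    for _ in range(rounds):
        if not pool:
            break
        merged, pool = merged + pool[:batch], pool[batch:]
        pseudo = [(s, predict(s, model)) for s in merged]  # E-like step
        model = estimate(labeled, pseudo, lam)             # M-like step
    return model, pseudo
```

In this sketch `lam` plays the role of the adjustable labeled/unlabeled proportion: `lam=0` ignores the pseudo-labeled sequences entirely, while `lam=1` treats them as equal to the hand-labeled ones. A full EM treatment would typically replace the Viterbi relabeling with expected counts from the inside-outside algorithm.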