
Identification of Protein-protein Interaction Based on Feature Weighted (基于特征加权的蛋白质交互识别)
Cited by: 3
Abstract: In a model that uses words as features, a word whose usage differs markedly across classes has a strong influence on classification. This paper therefore uses a large-scale corpus to study how different feature-weighting methods affect protein-protein interaction (PPI) identification. First, a signature for each protein pair is built by searching a biomedical literature database; words serve as the features that describe the relationship between the pair, and a vector space model (VSM) is constructed. Next, different weighting methods are applied to describe the importance of each word. Finally, K-nearest-neighbor and SVM classifiers are built to decide whether a protein pair interacts. The experimental results show that weighting the words of the feature vector by their importance clearly improves the precision, recall, and accuracy of PPI identification.
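The pipeline described in the abstract (signature, weighted vector space model, classifier) can be sketched in a few lines. The abstract does not say which weighting methods the authors compared, so this sketch assumes a smoothed TF-IDF weight as a stand-in. The signatures, labels, and function names are hypothetical toy values, and only the K-nearest-neighbor step is shown; the SVM step is omitted.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; a real system would use a biomedical tokenizer.
    return text.lower().split()

def idf_weights(docs):
    # Smoothed inverse document frequency (an assumed weighting scheme).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def weighted_vector(doc, idf):
    # Sparse vector: term frequency times IDF; unseen words are dropped.
    tf = Counter(tokenize(doc))
    return {w: tf[w] * idf[w] for w in tf if w in idf}

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query, train_vecs, labels, k=3):
    # Majority vote among the k training vectors most similar to the query.
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical signatures for protein pairs (A, B); 1 = interacting, 0 = not.
train_signatures = [
    "a binds b and forms a complex",
    "a interacts with b and phosphorylates it",
    "a binds b directly in vitro",
    "a and b are expressed in liver tissue",
    "a and b levels were measured in patients",
    "a and b are expressed in patients",
]
labels = [1, 1, 1, 0, 0, 0]

idf = idf_weights(train_signatures)
train_vecs = [weighted_vector(s, idf) for s in train_signatures]

def classify(signature, k=3):
    # Weight the new signature with the training IDF, then apply kNN.
    return knn_predict(weighted_vector(signature, idf), train_vecs, labels, k)

print(classify("a binds b in a complex"))
print(classify("a and b are expressed in tissue"))
```

Words that occur in every signature, such as the protein placeholders, receive the lowest weight, so relation words like "binds" dominate the similarity. That is the intuition behind weighting features by how differently they are used across classes.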
Authors: Wu Hongmei (吴红梅), Niu Yun (牛耘)
Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 2, pp. 114-117, 123 (5 pages)
Funding: National Natural Science Foundation of China (61202132, 61170043)
Keywords: protein-protein interaction; large-scale corpus; feature weighting; K nearest neighbor; support vector machine (SVM)


