Protein-Protein Interaction Identification Based on POS Weighting and Word Similarity
Abstract: Most existing methods for automatic protein-protein interaction (PPI) identification rely on individual sentences. This paper instead proposes an automatic PPI identification method built on a large-scale corpus that incorporates two factors, syntax and word similarity. First, a feature-based method is used to classify protein-pair signatures. Next, a segmentation tool assigns part-of-speech (POS) tags to the protein-pair signatures, the feature words are grouped by POS, and each POS is given its own weight. Finally, word similarity is computed with a large-scale-corpus-based method, and the word similarity matrix is adjusted according to the difference between each word's frequency in the positive and negative classes. Experimental results show that, once POS weighting and word similarity are introduced, the final classification accuracy improves markedly over the baseline model.
Authors: Wu Hongmei, Niu Yun
Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 12, pp. 6-9 (4 pages)
Funding: National Natural Science Foundation of China (61202132, 61170043)
Keywords: large-scale corpus; protein-protein interaction; POS weighting; word similarity
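The abstract describes the method only at a high level, so the following is a minimal Python sketch, not the authors' implementation, of how its two factors could fit together. It assumes that protein-pair signatures arrive already POS-tagged as `(word, tag)` pairs (no real tagger is called), that the weights in `POS_WEIGHTS` are illustrative values, and that the class-frequency adjustment uses a hypothetical per-word score |p - n| / (p + n) to scale off-diagonal similarities. It also swaps in the soft cosine measure to compare two signatures through the adjusted matrix. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical POS weights: the abstract says each POS class is weighted
# but gives no values or tag set, so these numbers are illustrative only.
POS_WEIGHTS = {"VERB": 2.0, "NOUN": 1.0, "ADJ": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed weight for every other tag


def pos_weighted_vector(tagged_tokens, vocab):
    """Turn a pre-tagged signature [(word, tag), ...] into a term vector over
    `vocab` in which each occurrence counts with the weight of its POS."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for word, tag in tagged_tokens:
        if word in index:  # out-of-vocabulary words are ignored
            vec[index[word]] += POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return vec


def adjust_similarity(sim, vocab, pos_docs, neg_docs):
    """Scale off-diagonal word similarities by a per-word score of how
    differently the word occurs in positive vs. negative signatures.
    The score |p - n| / (p + n) is an assumed formula, not the paper's."""
    pos_counts = Counter(w for doc in pos_docs for w, _ in doc)
    neg_counts = Counter(w for doc in neg_docs for w, _ in doc)
    pos_total = sum(pos_counts.values()) or 1
    neg_total = sum(neg_counts.values()) or 1
    disc = []
    for w in vocab:
        p = pos_counts[w] / pos_total  # relative frequency in positive class
        n = neg_counts[w] / neg_total  # relative frequency in negative class
        # 0 if equally frequent in both classes, 1 if seen in only one class
        disc.append(abs(p - n) / (p + n) if p + n > 0 else 0.0)
    size = len(vocab)
    adjusted = [[1.0] * size for _ in range(size)]  # diagonal stays 1.0
    for i in range(size):
        for j in range(size):
            if i != j:
                adjusted[i][j] = sim[i][j] * math.sqrt(disc[i] * disc[j])
    return adjusted


def soft_cosine(a, b, sim):
    """Soft cosine a^T S b / sqrt(a^T S a * b^T S b); assumes non-negative
    vectors and similarities, so both quadratic forms are non-negative."""
    def bilinear(x, y):
        return sum(x[i] * sim[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


if __name__ == "__main__":
    vocab = ["binds", "interacts", "protein", "cell"]
    # Toy corpus-derived similarity matrix (made-up values)
    sim = [[1.0, 0.8, 0.1, 0.0],
           [0.8, 1.0, 0.1, 0.0],
           [0.1, 0.1, 1.0, 0.2],
           [0.0, 0.0, 0.2, 1.0]]
    pos_docs = [[("binds", "VERB"), ("protein", "NOUN")],
                [("interacts", "VERB"), ("protein", "NOUN")]]
    neg_docs = [[("protein", "NOUN"), ("cell", "NOUN")]]
    adjusted = adjust_similarity(sim, vocab, pos_docs, neg_docs)
    a = pos_weighted_vector([("binds", "VERB")], vocab)
    b = pos_weighted_vector([("interacts", "VERB")], vocab)
    print(round(soft_cosine(a, b, adjusted), 3))  # prints 0.8
```

In the toy run, the signatures `binds` and `interacts` share no surface words, so a plain cosine would score them 0. Both words occur only in positive signatures, so their corpus similarity survives the adjustment. The cross-word similarities of `protein`, which is equally frequent in both classes, are zeroed out. Similarities of this kind could then feed a feature- or kernel-based classifier, which is not shown here.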

