Protein-Protein Interaction Identification Based on POS Weighting and Word Similarity

Abstract: Unlike most existing protein-protein interaction (PPI) identification methods, which rely on a single sentence, this paper proposes an automatic PPI identification method based on a large-scale corpus that introduces two factors: syntax and word similarity. First, a feature-based method is used to classify protein-pair signatures. Then, a word segmentation tool is used to part-of-speech (POS) tag the protein-pair signatures; feature words are grouped by POS, and each POS is assigned a weight. Finally, word similarity is computed from the large-scale corpus, and the word similarity matrix is adjusted according to the difference between each word's frequency in the positive class and in the negative class. Experimental results show that, once POS weighting and word similarity are introduced, the final classification accuracy improves markedly over the baseline model.
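The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the POS weights, the frequency-based adjustment rule, the `alpha` parameter, the soft-cosine comparison, and all function names below are assumptions chosen only to show how POS weighting and an adjusted word similarity matrix could enter a comparison of two signatures.

```python
from collections import Counter
from math import sqrt

# Hypothetical per-POS weights; the abstract does not report the actual values.
POS_WEIGHTS = {"VERB": 2.0, "NOUN": 1.0, "ADJ": 0.5}


def weighted_vector(tagged_words, pos_weights=POS_WEIGHTS, default=0.0):
    """Bag-of-words vector where each occurrence counts with its POS weight."""
    vec = Counter()
    for word, pos in tagged_words:
        weight = pos_weights.get(pos, default)
        if weight:
            vec[word.lower()] += weight
    return dict(vec)


def adjust_similarity(sim, pos_freq, neg_freq, alpha=1.0):
    """Shrink sim(w1, w2) when the two words lean toward different classes.

    Assumed rule (not from the paper): each word gets a polarity in [-1, 1]
    from its positive/negative class counts, and the similarity is scaled by
    1 - alpha * (half the polarity gap), with alpha expected in [0, 1].
    """
    def polarity(word):
        p, n = pos_freq.get(word, 0), neg_freq.get(word, 0)
        return (p - n) / (p + n) if p + n else 0.0

    adjusted = {}
    for (w1, w2), s in sim.items():
        gap = abs(polarity(w1) - polarity(w2)) / 2  # lies in [0, 1]
        adjusted[(w1, w2)] = s * (1 - alpha * gap)
    return adjusted


def soft_cosine(u, v, sim):
    """Cosine similarity in which different words contribute via sim."""
    def s(a, b):
        if a == b:
            return 1.0
        return sim.get((a, b), sim.get((b, a), 0.0))

    def dot(x, y):
        return sum(xa * yb * s(a, b) for a, xa in x.items() for b, yb in y.items())

    denom = sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / denom if denom else 0.0


# Toy data, invented for illustration only.
sig_a = [("binds", "VERB"), ("protein", "NOUN")]
sig_b = [("interacts", "VERB"), ("protein", "NOUN")]
raw_sim = {("binds", "interacts"): 0.8}
adj_sim = adjust_similarity(
    raw_sim,
    pos_freq={"binds": 9, "interacts": 8},
    neg_freq={"binds": 1, "interacts": 2},
)
score = soft_cosine(weighted_vector(sig_a), weighted_vector(sig_b), adj_sim)
```

In this toy setup the two signatures share only the word "protein", so a plain cosine would rate them as weakly similar; the similarity matrix lets "binds" and "interacts" count as partial matches, and the POS weights make that verb match dominate the score.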

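Authors: WU Hongmei, NIU Yun

Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics

Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 12, pp. 6-9 (4 pages)

Funding: National Natural Science Foundation of China (61202132, 61170043)

Keywords: large-scale corpus; protein-protein interaction; POS weighting; word similarity

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]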
