Abstract
Automatically extracting protein-protein interaction (PPI) relations from the biomedical literature is an important task in text mining. Because the choice of features and classifiers strongly affects protein-protein interaction extraction (PPIE), this paper proposes a PPI extraction method based on rich features and the fusion of multiple classifiers. Fifteen lexical, syntactic and semantic features are selected and three classifiers are combined. The method is evaluated on five public PPI benchmark corpora using the standard document-level ten-fold cross-validation. On the AIMed corpus it achieves an F-score of 63.7% and an AUC of 87.8%, which places it among the top-performing PPI extraction methods and indicates good extraction performance.
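The abstract names document-level ten-fold cross-validation without giving details. Below is a minimal Python sketch of that protocol, assuming each candidate protein pair carries the id of the document it was taken from. All pairs from one document stay in the same fold, so no document contributes to both the training and the test data of a split. The helper names (`document_level_folds`, `train_test_splits`, `f_score`) are hypothetical. The random document-to-fold assignment is also an assumption, since benchmark setups may instead ship predefined fold lists. The abstract does not say whether F-scores are averaged over folds or pooled.

```python
import random


def document_level_folds(doc_ids, k=10, seed=0):
    """Hypothetical helper: return k lists of instance indices such that all
    instances sharing a document id fall into the same fold."""
    unique_docs = sorted(set(doc_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_docs)
    # Deal the shuffled documents round-robin onto the k folds.
    fold_of_doc = {doc: i % k for i, doc in enumerate(unique_docs)}
    folds = [[] for _ in range(k)]
    for idx, doc in enumerate(doc_ids):
        folds[fold_of_doc[doc]].append(idx)
    return folds


def train_test_splits(doc_ids, k=10, seed=0):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = document_level_folds(doc_ids, k, seed)
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def f_score(tp, fp, fn):
    """Balanced F-score (harmonic mean of precision and recall) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: seven candidate pairs drawn from four documents, three folds.
    doc_ids = ["d1", "d1", "d2", "d3", "d3", "d3", "d4"]
    for train, test in train_test_splits(doc_ids, k=3, seed=1):
        print("train:", train, "test:", test)
```

Splitting by document rather than by pair matters because pairs from the same sentence or abstract share most of their lexical and syntactic context. A pair-level split would leak that context across training and test data and inflate the scores.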
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, No. 11, pp. 207-212 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61340020)
Keywords
Protein-protein Interaction Extraction (PPIE)
rich features
Support Vector Machine (SVM)
maximum entropy
graph kernel