Abstract
To address the shortcomings of traditional machine-learning methods for protein-protein interaction (PPI) extraction, this paper proposes an information extraction method that incorporates shallow parsing. The method first applies shallow parsing to each candidate sentence, comprising phrase chunking, appositive analysis, coordination analysis and sentence splitting. This step divides each sentence into multiple independent grammatical units. A maximum-entropy classifier is then applied to each unit to extract protein-protein interactions. On the BC-PPI corpus, the method achieves an F1 score of 62.1%. Comparative experiments show that the method effectively reduces false positives and false negatives and improves extraction performance.
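The two-stage idea in the abstract (split a sentence into grammatical units, then classify each candidate protein pair inside a unit with a maximum-entropy model) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. `split_units` is a hypothetical regex stand-in for the paper's chunking, appositive, coordination and sentence-splitting analysis. The trigger words, feature names and the weights in `WEIGHTS` are hypothetical and hand-set rather than learned. Protein names are assumed to be already recognized.

```python
import math
import re

# Hypothetical trigger words and hand-set weights, for illustration only;
# they are not the paper's feature set or learned parameters.
TRIGGERS = {"binds", "interacts", "activates", "inhibits", "phosphorylates"}
LABELS = ("interact", "none")
WEIGHTS = {
    ("bias", "interact"): -1.0,
    ("trigger_between", "interact"): 2.0,
    ("short_gap", "interact"): 0.5,
}


def tokenize(text):
    return re.findall(r"[A-Za-z0-9-]+", text)


def split_units(sentence):
    """Crude stand-in for shallow parsing: cut only at semicolons and at
    ', whereas' / ', while' clause boundaries."""
    parts = re.split(r";|,\s*(?:whereas|while)\s+", sentence)
    return [p.strip() for p in parts if p.strip()]


def features(tokens, p1, p2):
    """Binary features for a protein pair that co-occurs in one unit."""
    i, j = sorted((tokens.index(p1), tokens.index(p2)))
    between = tokens[i + 1:j]
    feats = {"bias"}
    if any(t.lower() in TRIGGERS for t in between):
        feats.add("trigger_between")
    if len(between) <= 3:
        feats.add("short_gap")
    return feats


def maxent_prob(feats):
    """Conditional maximum entropy: p(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x)."""
    scores = {y: sum(WEIGHTS.get((f, y), 0.0) for f in feats) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


def extract(sentence, proteins, split=True, threshold=0.5):
    """Classify every protein pair that co-occurs inside the same unit."""
    units = split_units(sentence) if split else [sentence]
    pairs = set()
    for unit in units:
        tokens = tokenize(unit)
        present = [p for p in proteins if p in tokens]
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                p1, p2 = present[a], present[b]
                if maxent_prob(features(tokens, p1, p2))["interact"] > threshold:
                    pairs.add(tuple(sorted((p1, p2))))
    return pairs


def f1(pred, gold):
    """Harmonic mean of precision and recall over extracted pairs."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)
```

On the toy sentence "RAD51 binds BRCA2; TP53 is unrelated to MDM2 in this assay" with proteins RAD51, BRCA2, TP53 and MDM2, treating the whole sentence as one unit (`split=False`) also pairs RAD51 with TP53 and MDM2 across the clause boundary, because the trigger "binds" falls between them. Splitting first leaves only (BRCA2, RAD51). Against that single gold pair, the toy F1 rises from 0.5 to 1.0. These numbers come from hand-made toy data and say nothing about the paper's reported results. In a real system the weights would be estimated from annotated data, for example by GIS or L-BFGS.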
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2011, No. 3, pp. 972-975 (4 pages)
Funding
National High Technology Research and Development Program of China (2006AA01Z411)
Keywords
protein-protein interaction (PPI)
information extraction
shallow parsing
maximum entropy