Abstract
To improve the efficiency of extracting the various types of knowledge elements (KEs) found in scientific and technical literature, this paper proposes a KE extraction model based on sequential patterns. First, starting from the description rules for each type of KE, dependency parsing is applied to KE sentences to generate KE sequence patterns that incorporate semantic information. Second, each type of KE is extracted by running a pattern-matching algorithm against these sequence patterns. Finally, the PrefixSpan algorithm is used to mine new KE-describing sequence patterns from the extracted KEs, so that the pattern set is continuously updated and expanded, which in turn improves extraction efficiency. The sequential-pattern-based method overcomes the semantic limitations of rule-based matching, extends across disciplines, and improves the efficiency of KE extraction.
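The pattern-mining step can be illustrated with a minimal PrefixSpan sketch. The abstract does not specify how the sequences are encoded, so the dependency-label sequences below, the function name `prefixspan`, and the `min_support` threshold are illustrative assumptions rather than the authors' implementation. The sketch handles sequences of single items only, not itemsets.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every frequent sequential pattern.

    Simplified PrefixSpan: each sequence is a list of single items.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffixes
        # that remain after matching `prefix` in each supporting sequence.
        support = defaultdict(set)
        for seq_id, start in projected:
            for item in sequences[seq_id][start:]:
                support[item].add(seq_id)  # count once per sequence

        for item in sorted(support):
            if len(support[item]) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(support[item])

            # Project on the first occurrence of `item` in each suffix.
            new_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical dependency-label sequences standing in for parsed KE sentences.
tagged = [
    ["nsubj", "cop", "attr"],
    ["nsubj", "advmod", "cop", "attr"],
    ["nsubj", "cop", "amod", "attr"],
    ["dobj", "nsubj", "attr"],
]
patterns = prefixspan(tagged, min_support=3)
```

Under these assumptions, `patterns` maps each frequent label subsequence to the number of sentences containing it; patterns that clear the threshold could then be added to the pattern set used for matching.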
Source
《情报理论与实践》
CSSCI
Peking University Core Journal
2020, No. 11, pp. 144-149 (6 pages)
Information Studies: Theory & Application
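Funding
An outcome of the Central China Normal University project "Research on Mining the Multi-Granularity Hierarchical Structure of Digital Collection Resources", supported by the Fundamental Research Funds for the Central Universities (project No. CCNU19TS043).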
Keywords
scientific and technical literature
sequential pattern
knowledge element
knowledge element extraction
dependency parsing
PrefixSpan