Abstract
Research project applications contain rich scientific knowledge and are widely used as basic data for scientific and technological intelligence analysis. Intelligent processing tasks such as duplicate detection and analytical mining can only proceed once the structural function of each part of an application is clear. This paper therefore constructs a structure function recognition model for research project applications based on multi-stage classification. First, the applications are preprocessed: the main body and the multimodal elements it contains are identified, and the text paragraphs are normalized. Then, based on the BiLSTM-Attention model, chapter titles are distinguished from body text, the primary (first-level) function of the body text is recognized from the titles, and finally the fine-grained structure functions of the application are identified. Experimental results show that the precision and recall of the proposed method reach 93.7% and 93.1%, respectively. The method supports the structured parsing of research project applications well and can serve as a reference for structure function recognition in other types of academic text.
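The multi-stage idea in the abstract can be illustrated with a minimal Python sketch. It shows only the control flow of the cascade (title vs. body, then primary function from the title, then fine-grained function of the body text). The three toy classifiers are hypothetical keyword rules standing in for the trained BiLSTM-Attention classifiers, and the label names (`background`, `methods`, `data_collection`) are assumptions for illustration, not the label set used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LabeledParagraph:
    text: str
    is_title: bool
    primary: Optional[str]  # first-level function, inherited from the nearest title
    fine: Optional[str]     # fine-grained function (body paragraphs only)


def classify_application(
    paragraphs: List[str],
    title_clf: Callable[[str], bool],
    primary_clf: Callable[[str], str],
    fine_clf: Callable[[str, Optional[str]], str],
) -> List[LabeledParagraph]:
    """Run the three classification stages in order over normalized paragraphs."""
    labeled: List[LabeledParagraph] = []
    current_primary: Optional[str] = None
    for text in paragraphs:
        if title_clf(text):
            # Stage 1 says "title"; stage 2 maps the title to a primary function.
            current_primary = primary_clf(text)
            labeled.append(LabeledParagraph(text, True, current_primary, None))
        else:
            # Stage 3: fine-grained label, conditioned on the primary function.
            fine = fine_clf(text, current_primary)
            labeled.append(LabeledParagraph(text, False, current_primary, fine))
    return labeled


# Hypothetical stand-ins for the trained models (assumptions, not the paper's method).
def toy_title_clf(text: str) -> bool:
    return len(text) < 30 and not text.endswith(".")


def toy_primary_clf(title: str) -> str:
    lowered = title.lower()
    if "background" in lowered:
        return "background"
    if "method" in lowered:
        return "methods"
    return "other"


def toy_fine_clf(text: str, primary: Optional[str]) -> str:
    if primary == "methods" and "data" in text.lower():
        return "data_collection"
    return f"{primary}_general" if primary else "unknown"


SAMPLE = [
    "1. Research background",
    "Prior work has studied proposal text mining extensively.",
    "2. Research methods",
    "We collect data from funded proposals.",
]

if __name__ == "__main__":
    for item in classify_application(SAMPLE, toy_title_clf, toy_primary_clf, toy_fine_clf):
        print(item)
```

Passing the classifiers in as callables keeps the cascade independent of the model behind each stage, so a trained sequence classifier could replace any of the toy rules without changing the loop.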
Authors
林鑫
杜莹
罗宇
LIN Xin; DU Ying; LUO Yu (School of Information Management, Central China Normal University, Wuhan 430079, P.R. China)
Source
《数字图书馆论坛》
2024, No. 3, pp. 25-33 (9 pages)
Digital Library Forum
Funding
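Supported by the National Social Science Fund of China project "Research on Semantic Annotation and Object Linking of Academic Papers for Multimodal Publishing" (No. 23BTQ083).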