Abstract
Research methods are an important part of the scientific literature: they are the methods, tools, means, or techniques used to solve problems in a discipline. Research methods are usually described at the sentence level. Collecting the research method sentences scattered across the scientific literature can help researchers quickly find suitable methods. According to who uses the method, research method sentences are further divided into method-used sentences and method-cited sentences. A method-used sentence describes a research method used in the paper itself; a method-cited sentence describes a research method used in earlier work that the paper cites. This study uses several neural network-based sentence classification models to extract research method sentences from the full text of scientific literature. In the word vector representation layer, two word vector models are used: BERT and word2vec. In the feature selection layer, three different networks are used: a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism network. In addition, two model training schemes are used: a single-level structure and a two-level structure. The experimental results show that the BERT-based BiLSTM model with the single-level structure achieves the best performance among the models compared. Research method sentences are then extracted from papers published in the Journal of the China Society for Scientific and Technical Information (《情报学报》), and their distribution is analyzed. The analysis finds that the journal has gradually placed more emphasis on the development of theory in information science and on building a theoretical system for the discipline.
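The abstract names a single-level and a two-level structure without spelling out how they differ. The sketch below shows one plausible reading, stated here as an assumption rather than the authors' design: the single-level structure makes one three-way decision (method-used, method-cited, other), while the two-level structure first separates method sentences from other sentences and then separates used from cited. The label names, function names, and keyword-based toy classifiers are all hypothetical stand-ins for the neural models described above.

```python
from typing import Callable

# Hypothetical label names; the record does not give the authors' label set.
LABELS = ("method_used", "method_cited", "other")

# A classifier maps a sentence to a label string.
Classifier = Callable[[str], str]

METHOD_CUES = ("method", "model", "algorithm")  # toy cue words, an assumption


def toy_is_method(sentence: str) -> str:
    """Stage-1 stand-in: method sentence vs. non-method sentence."""
    lowered = sentence.lower()
    return "method" if any(cue in lowered for cue in METHOD_CUES) else "non_method"


def toy_used_or_cited(sentence: str) -> str:
    """Stage-2 stand-in: treat a citation marker as a sign of a cited method."""
    has_citation = "et al." in sentence or "[" in sentence
    return "method_cited" if has_citation else "method_used"


def toy_three_way(sentence: str) -> str:
    """Single-level stand-in: one decision over all three labels."""
    lowered = sentence.lower()
    if not any(cue in lowered for cue in METHOD_CUES):
        return "other"
    if "et al." in sentence or "[" in sentence:
        return "method_cited"
    return "method_used"


def classify_single_level(sentence: str, three_way: Classifier) -> str:
    """Single-level structure (assumed): one model, three labels."""
    return three_way(sentence)


def classify_two_level(
    sentence: str, is_method: Classifier, used_or_cited: Classifier
) -> str:
    """Two-level structure (assumed): filter method sentences, then split them."""
    if is_method(sentence) != "method":
        return "other"
    return used_or_cited(sentence)


if __name__ == "__main__":
    examples = [
        "We use a BiLSTM model to classify sentences.",
        "Smith et al. [3] proposed a clustering algorithm.",
        "The journal was founded decades ago.",
    ]
    for text in examples:
        one = classify_single_level(text, toy_three_way)
        two = classify_two_level(text, toy_is_method, toy_used_or_cited)
        print(f"{one:13s} {two:13s} {text}")
```

In practice each toy function would be replaced by a trained sentence classifier; the sketch only shows how the two structures route a sentence to a label.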
Authors
张颖怡
章成志
Zhang Yingyi; Zhang Chengzhi (Department of Information Management, School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094)
Source
《情报学报》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2020, Issue 6, pp. 640-650 (11 pages)
Journal of the China Society for Scientific and Technical Information
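Funding
Major Program of the National Social Science Fund of China, "Research on the Discipline Construction of Information Science and the Future Development Path of Information Work" (17ZDA291).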
Keywords
methodological sentence extraction
information extraction
deep learning
BERT