Abstract
[Purpose/significance] This study designs and builds an intelligent classification and indexing system based on deep learning that assigns correct classification labels to document digital resources, with the aim of reducing the manual cost of document classification and indexing. [Method/process] First, six models or algorithms (the BERT-Base model, a Bayesian algorithm, Text-CNN, adversarial training, IndRNN, and LSTM) are compared on the classification of economics document digital resources; the BERT-Base model achieves the highest classification accuracy. Second, document digital resources in art, metallurgy and metalworking, and medicine and health are selected for validation; the BERT-Base model performs well in each of these categories, which meets the requirement of generality. Finally, a first-level category classification model for document digital resources is built on the BERT-Base Chinese pre-trained model; the model is pre-trained and applied to document classification, reaching an overall accuracy of 90.44% on the first-level category classification test. [Result/conclusion] The deep learning algorithm based on the BERT-Base Chinese pre-trained model significantly improves the classification of document digital resources, and its advantage is more evident with many categories and a large-scale training set.
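The abstract reports results as classification accuracy, including an overall figure for the first-level category test. As a rough illustration only, the sketch below shows one common way such figures are computed from reference and predicted labels. The function names, the category labels, and the toy data are all hypothetical and are not taken from the paper, which does not describe its evaluation code.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """For each reference class, the share of its documents labeled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data: reference labels and one model's predictions.
y_true = ["economics", "economics", "art", "medicine",
          "medicine", "art", "economics", "medicine"]
y_pred = ["economics", "art", "art", "medicine",
          "medicine", "art", "economics", "economics"]

print(overall_accuracy(y_true, y_pred))
print(per_class_accuracy(y_true, y_pred))
```

Running the same two functions on the predictions of each candidate model would give the kind of side-by-side comparison the abstract describes, under the assumption that all models are scored on the same test set.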
Authors
王静
姜鹏
沈立力
Wang Jing; Jiang Peng; Shen Lili (Shanghai Library (Shanghai Institute of Science and Technology Information), Shanghai 200031, China)
Source
《图书情报研究》 (Library and Information Studies)
2023, No. 4, pp. 43-48, 64 (7 pages in total)
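Funding
One of the research outcomes of the Shanghai Library Youth Yangfan Program special project "Research and Application of Intelligent Classification and Indexing of Document Digital Resources Based on Deep Learning".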
Keywords
deep learning
BERT
literature classification
digital resource