Abstract
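[Purpose/significance] For academic literature in a specific domain, this paper builds a dual-label classification model from bibliographic information, labeling each paper by "research content" and by "research method", and offers a methodological reference for the fine-grained classification of academic literature. [Method/process] Taking the convolutional neural network from deep learning as the base model, bibliographic fields such as title, abstract, keywords, journal name, author, and institution are divided into dominant features and invisible features. Dominant feature extraction, invisible feature mapping, and related steps yield a feature word array, from which a word vector matrix is generated; the matrix is then processed by a convolutional layer, a pooling layer, and a Softmax layer to complete the classification task. [Result/conclusion] Experiments on literature from the e-commerce field show that the model's macro F1 values for the "research content" and "research method" labels are 0.74 and 0.81, respectively, which is clearly better than traditional machine learning methods and also higher than deep learning classification that uses only dominant features.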
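The pipeline summarized in the abstract (feature word array, word vector matrix, convolution, pooling, Softmax) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the field names, the `implicit_map` lookup that turns a journal, author, or institution into topic words, the random embeddings, the kernel sizes, and the untrained weights are all assumptions chosen only to show how data flows through one classification head. A dual-label setup would presumably run such a head once per label dimension.

```python
import math
import random

def build_feature_words(record, implicit_map):
    """Dominant features are used directly; invisible fields are mapped to words.
    The mapping table is hypothetical, not taken from the paper."""
    words = list(record["title_words"]) + list(record["keywords"])
    for field in ("journal", "author", "institution"):
        words.extend(implicit_map.get(record.get(field), []))
    return words

def embed(words, table, dim, rng):
    """Turn the feature word array into a word vector matrix (random vectors here)."""
    rows = []
    for w in words:
        if w not in table:
            table[w] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        rows.append(table[w])
    return rows

def conv_max_pool(matrix, kernel):
    """Slide one kernel over the word axis, apply ReLU, then max-pool over positions."""
    h, dim = len(kernel), len(kernel[0])
    best = 0.0  # starting at 0 makes the max equal to max-pooling after ReLU
    for start in range(len(matrix) - h + 1):
        s = sum(kernel[i][j] * matrix[start + i][j]
                for i in range(h) for j in range(dim))
        best = max(best, s)
    return best

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(record, implicit_map, params, rng):
    words = build_feature_words(record, implicit_map)
    matrix = embed(words, params["table"], params["dim"], rng)
    pooled = [conv_max_pool(matrix, k) for k in params["kernels"]]
    scores = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(params["weights"], params["bias"])]
    return softmax(scores)

rng = random.Random(0)
dim, n_classes = 8, 3
params = {
    "table": {},
    "dim": dim,
    # one kernel each for window heights 2, 3, and 4 (illustrative sizes)
    "kernels": [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h)]
                for h in (2, 3, 4)],
    "weights": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_classes)],
    "bias": [0.0] * n_classes,
}
record = {
    "title_words": ["cross-border", "e-commerce", "logistics"],
    "keywords": ["supply chain"],
    "journal": "Journal A",  # made-up journal name
}
implicit_map = {"Journal A": ["management", "empirical"]}
probs = classify(record, implicit_map, params, rng)
print(probs)  # one probability per class; weights are untrained
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how invisible fields can contribute extra words to the same matrix that the convolution reads.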
, keyword, source, author, organization and other bibliographic fields were divided into dominant features and invisible features. Through dominant feature extraction, invisible feature mapping and other steps, a feature word array was formed. On this basis, the word vector matrix was constructed, which was processed by the convolutional layer, pooling layer and Softmax layer to complete the classification task. [Result/conclusion] The literature in the e-commerce field was taken as an example for experimental verification. The results show that the macro F1 values of this model are 0.74 and 0.81 for the two labels "research content" and "research method", respectively. The classification results are not only significantly better than those of traditional machine learning methods, but also higher than those of deep learning classification methods that use only dominant features.
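The reported scores are macro F1 values, meaning the F1 of each class is computed separately and then averaged with equal weight, so small classes count as much as large ones. A minimal sketch of that metric follows; the labels and predictions are toy data, not the paper's.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy labels for illustration only
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.8222
```

In a dual-label setting the metric would be computed once per label dimension, which is consistent with the abstract reporting two separate values.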
Authors
雷兵
刘小
钟镇
Lei Bing; Liu Xiao; Zhong Zhen (School of Management, Henan University of Technology, Zhengzhou 450001; Business Intelligence and Knowledge Engineering Laboratory, Henan University of Technology, Zhengzhou 450001)
Source
《图书情报工作》
CSSCI
PKU Core Journal
2021, No. 14, pp. 128-137 (10 pages)
Library and Information Service
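Funding
National Natural Science Foundation of China project "Scientometric Research on Erroneous Citations by Authors, Journals, and Databases: Identification Methods, Generation Mechanisms, and Control Countermeasures" (Grant No. 71603073)
One of the research outcomes of the Henan Province University Philosophy and Social Sciences Innovation Team Support Project "Big Data and Management Decision-Making" (Grant No. 2019-CXTD-04).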
Keywords
academic literature
subject classification
bibliographies
deep learning
convolutional neural network