Abstract
Automatic citation intent classification is an important problem in bibliometrics. Existing citation intent classification models have limited ability to extract textual features and cannot fuse citation context features with external citation features. To address this, a citation intent classification method based on MPNet pretraining and multi-head attention feature fusion is proposed. A position compensation structure is introduced to mitigate the shortcomings of masked language modeling and permuted language modeling. The syntactic word-frequency features and structural features of citations are combined to form a feature extraction method suited to the citation intent classification task. A multi-head attention mechanism is then introduced for feature fusion to improve classification performance. Experiments on the ACL-ARC dataset show that the proposed method achieves comparatively strong performance on citation intent classification and is also robust on imbalanced data.
Authors
祁瑞华
邵震
关菁华
郭旭
QI Ruihua; SHAO Zhen; GUAN Jinghua; GUO Xu (Research Center for Language Intelligence, Dalian University of Foreign Languages, Dalian 116044; School of Software, Dalian University of Foreign Languages, Dalian 116044)
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals
2022, Issue 9, pp. 849-857 (9 pages)
Pattern Recognition and Artificial Intelligence
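Funding
National Social Science Fund of China (No. 15BYY028)
Liaoning Province Higher Education Innovative Talents Program (No. WR2019005)
Research and Innovation Team Project of Dalian University of Foreign Languages (No. 2016CXTD06)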
Keywords
Citation Intent Classification
Feature Fusion
Pretraining Model
Feature Extraction
Multi-head Attention Mechanism