Abstract
To address the feature sparsity, feature loss, and incomplete information extraction caused by imbalanced text lengths in review sentiment analysis, this paper proposes a sentiment analysis method based on dynamic word-sentence features and self-attention (DWSF-SA). First, a pre-trained model encodes each review into dynamic features. Sentence vectors then fill the positions of reviews shorter than the fixed length and represent the truncated portion of reviews that exceed it, which alleviates the feature sparsity and feature loss that imbalanced text sizes cause under batch training. Next, a self-attention-based feature recombination method dynamically integrates the fused word-sentence features, and the weight parameters are optimized to reduce the time complexity of computation and training. Finally, ablation and comparison experiments are conducted on open-source datasets. The results show that DWSF-SA improves accuracy over traditional approaches.
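The length-balancing and fusion steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the sentence vector is approximated by the mean of the token vectors (the paper obtains it from a pre-trained model), the truncated part is summarized by the mean of the cut-off tokens, and the attention uses no learned projections. The names `fit_to_length` and `self_attention` are hypothetical.

```python
import numpy as np

def fit_to_length(token_vecs, max_len):
    """Return a (max_len, d) matrix without zero padding.

    Assumption: the sentence vector is the mean of the token vectors.
    """
    n, _ = token_vecs.shape
    if n == max_len:
        return token_vecs.copy()
    if n < max_len:
        # Short text: fill the missing rows with the sentence vector
        # instead of zeros, so padded positions still carry information.
        sent_vec = token_vecs.mean(axis=0)
        pad = np.tile(sent_vec, (max_len - n, 1))
        return np.vstack([token_vecs, pad])
    # Long text: keep the first max_len - 1 tokens and use the last slot
    # for a vector that summarizes the truncated remainder (assumption).
    head = token_vecs[: max_len - 1]
    tail_vec = token_vecs[max_len - 1 :].mean(axis=0, keepdims=True)
    return np.vstack([head, tail_vec])

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
short = rng.normal(size=(3, 8))    # 3 tokens, 8-dimensional features
long = rng.normal(size=(10, 8))    # 10 tokens, 8-dimensional features

short_fixed = fit_to_length(short, 6)
long_fixed = fit_to_length(long, 6)
fused = self_attention(short_fixed)
```

Both inputs end up with the same fixed shape, so they can share a batch, and every row holds a non-zero feature vector before the attention step mixes them.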
Authors
刘强
朱金森
赵龙龙
沙宇晨
刘尚东
季一木
LIU Qiang; ZHU Jinsen; ZHAO Longlong; SHA Yuchen; LIU Shangdong; JI Yimu (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Institute of High Performance Computing and Big Data Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source
《数据采集与处理》
CSCD
Peking University Core Journal
2024, No. 1, pp. 193-203 (11 pages)
Journal of Data Acquisition and Processing
Funding
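National Key Research and Development Program of China (2018AAA0103300, 2018AAA0103302)
Natural Science Foundation of Jiangsu Province and Major Natural Science Research Project of Jiangsu Higher Education Institutions (BK20170900, 20KJA520001)
Jiangsu Innovation and Entrepreneurship Talent Program and Jiangsu Postdoctoral Fund (2019K024)
Jiangsu Six Talent Peaks Project (JY02)
Jiangsu Postdoctoral Research and Practice Innovation Program (KYCX19_0921, KYCX19_0906)
Zhejiang Lab Open Project (2021KF0AB05)
Ministry of Education Humanities and Social Sciences Youth Fund Project (20YJC880104)
Nanjing University of Posts and Telecommunications Talent Start-up Fund (NY219132).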
Keywords
sentiment analysis
feature embedding
pre-trained model
self-attention mechanism
weight parameters