Abstract
Existing cross-domain sentiment classification methods use text representations that ignore the sentiment information carried by important words and sentences, and they suffer from negative transfer during the transfer process. To address these issues, the authors propose a cross-domain sentiment classification method that combines textual representation learning with a transfer learning algorithm. First, texts are initialized with low-dimensional dense word vectors, and a hierarchical attention network models the sentiment information of important words and sentences to learn document-level distributed representations for the source and target domains. Next, a class-noise estimation method is applied to the data transferred from the source domain: negative-transfer samples are removed, and high-quality samples are selected to expand the target-domain training set. Finally, a support vector machine is trained to classify the sentiment of target-domain texts. In two experiments on large-scale public product review datasets, the proposed method reduces root mean square error (RMSE) by 1.5% and 1.0%, respectively, compared with the baseline methods, indicating that it effectively improves cross-domain sentiment classification performance.
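The hierarchical attention step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified attention score (a plain dot product with a context vector, without the learned projection a full network would use), and the names `attention_pool`, `document_vector`, `word_context`, and `sent_context`, as well as the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Return the attention-weighted sum of `vectors` and the weights.

    Assumption: the score of each vector is its dot product with
    `context`. A trained hierarchical attention network would learn
    the context vector and apply a projection before scoring.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def document_vector(doc, word_context, sent_context):
    """Pool words into sentence vectors, then sentences into a document vector.

    `doc` is a list of sentences; each sentence is a list of word vectors.
    """
    sentence_vectors = [attention_pool(sent, word_context)[0] for sent in doc]
    return attention_pool(sentence_vectors, sent_context)


# Toy 2-dimensional word vectors (hypothetical values, not from the paper).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],              # sentence 1: two words
    [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]],  # sentence 2: three words
]
word_context = [1.0, 0.0]
sent_context = [1.0, 0.0]

doc_vec, sent_weights = document_vector(doc, word_context, sent_context)
print(doc_vec, sent_weights)
```

The two levels share one pooling function, which keeps the sketch short. In the described method, the resulting document vectors would then feed the class-noise filtering and support vector machine stages, which are not shown here.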
Authors
廖祥文
吴晓静
桂林
黄锦辉
陈国龙
LIAO Xiangwen; WU Xiaojing; GUI Lin; HUANG Jinhui; CHEN Guolong (School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116; Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116; Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350116; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong)
Source
《北京大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journal
2019, No. 1, pp. 37-46 (10 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
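Supported by:
National Natural Science Foundation of China (61772135, U1605251)
Open Fund of the CAS Key Laboratory of Network Data Science and Technology (CASNDST201708, CASNDST201606)
Director's Fund of the Key Laboratory of Trusted Distributed Computing and Service, Ministry of Education (2017KF01)
CERNET Next Generation Internet Technology Innovation Project (NGII20160501)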
Keywords
textual representation learning
transfer learning
class-noise estimation
cross-domain
sentiment classification