Abstract
[Objective] This paper reviews and summarizes the development of research on cross-lingual sentiment analysis (CLSA). [Coverage] We searched the Web of Science database with the query “TS=cross lingual sentiment OR cross lingual word embedding” and selected 90 papers for review. [Methods] We classify and survey CLSA methods by the technique they use: (1) the three main early approaches, based on machine translation and its improved variants, on parallel corpora, and on bilingual sentiment lexicons; (2) approaches based on cross-lingual word embeddings, which followed the introduction of word-embedding models such as Word2Vec and GloVe; (3) approaches based on Multi-BERT and other pre-trained models, in use since 2019. [Results] We summarize the main ideas, methods and models, and shortcomings of existing CLSA research, and analyze the languages and datasets it covers and the performance it reports. Although pre-trained models such as Multi-BERT achieve good performance in zero-shot cross-lingual sentiment analysis, they still suffer from language sensitivity. Early CLSA methods remain a useful guide and reference for current research. [Limitations] Some CLSA models are hybrid models; we classify each only by its main method. [Conclusions] We discuss the future development of CLSA and the problems that most urgently need to be solved. As pre-trained models capture multilingual semantics at a deeper level, CLSA models applicable to a larger and broader range of languages will be the direction of future work.
Authors
徐月梅
曹晗
王文清
杜宛泽
徐承炀
Xu Yuemei; Cao Han; Wang Wenqing; Du Wanze; Xu Chengyang (School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China)
Source
《数据分析与知识发现》
CSCD
Peking University Core Journals
2023, No. 1, pp. 1-21 (21 pages)
Data Analysis and Knowledge Discovery
Funding
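This work is one of the research outcomes supported by the Fundamental Research Funds for the Central Universities (Project No. 2022JJ006).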
Keywords
Cross Lingual
Multi-lingual
Sentiment Analysis
Bilingual Word Embedding