Cross-Lingual Sentiment Analysis: A Survey (跨语言情感分析研究综述)

Cited by: 3
Abstract [Objective] This paper reviews and summarizes the development of research on cross-lingual sentiment analysis (CLSA). [Coverage] Using Web of Science as the search platform with the query "TS=cross lingual sentiment OR cross lingual word embedding", we selected 90 papers for review. [Methods] We classify and outline CLSA work by the technique it uses: (1) the three main early approaches, based on machine translation and its improvements, on parallel corpora, and on bilingual sentiment lexicons; (2) approaches based on cross-lingual word embeddings, which followed the introduction of word-embedding models such as Word2Vec and GloVe; and (3) approaches since 2019 based on pre-trained models such as Multi-BERT. [Results] We summarize the main ideas, methods and models, and shortcomings of this research, and analyze the languages, datasets, and performance covered by existing studies. Although pre-trained models such as Multi-BERT achieve good performance in zero-shot cross-lingual sentiment analysis, they still suffer from language sensitivity. Early CLSA methods remain a useful guide and reference for current research. [Limitations] Some CLSA models are hybrid models; each is classified here only by its main method. [Conclusions] We look ahead to the future development of CLSA and the problems that urgently need solving. As pre-trained models mine multilingual semantics more deeply, CLSA models applicable to a larger and wider range of languages will be the direction of future development.
Authors: Xu Yuemei (徐月梅); Cao Han (曹晗); Wang Wenqing (王文清); Du Wanze (杜宛泽); Xu Chengyang (徐承炀) (School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 1, pp. 1-21 (21 pages)
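Funding: This work is one of the research outcomes supported by the Fundamental Research Funds for the Central Universities (Grant No. 2022JJ006).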
Keywords: cross-lingual; multi-lingual; sentiment analysis; bilingual word embedding