
Status Analysis of Construction of Tibetan Emotional Dictionary (藏文情感词典构建的现状分析)
Abstract: In recent years, many researchers have shown that sentiment analysis methods combining deep learning with multi-feature fusion extract emotional information from text more effectively than purely deep-learning methods, and that emotion-word features are among the most important of these features. Although a few Tibetan emotional lexicons exist, almost none are publicly available, so anyone who needs such a resource has to build it. A survey of how Tibetan emotional lexicons have been built can therefore support future construction work. To understand the emotion-word classification schemes, the commonly used construction methods, and the vocabulary size and composition of existing Tibetan emotional lexicons, we use comparison and statistics to analyze the literature of the past 10 years on Tibetan emotional lexicon construction (mainly from CHKI) and summarize the state of research. We find that the main classification schemes for emotion words are 7 major categories with 21 subcategories, 12 major categories with 20 subcategories, and 2 major categories with 18 subcategories. Construction methods include lexicon matching, machine translation, SO-PMI expansion, and similarity-based expansion using word2vec or BERT. Existing Tibetan emotional lexicons contain roughly 5,000 to 28,000 entries, close to the level of Chinese emotional lexicons, and consist mainly of emotion words, degree adverbs, negation words, double-negation words, and emoticons. We hope this analysis serves as a reference for researchers in the field and for those building Tibetan emotional lexicons.
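As an illustration of the SO-PMI expansion named in the abstract, the following is a minimal sketch of the commonly used formulation: a candidate word is scored by the sum of its pointwise mutual information with positive seed words minus the sum with negative seed words. The abstract gives no formula or settings, so everything specific here is an assumption: the function name `so_pmi`, document-level co-occurrence counting, the additive `smoothing` constant, and the English placeholder tokens that stand in for segmented Tibetan words.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Score every non-seed word by SO-PMI (hypothetical helper).

    docs: list of tokenized documents (lists of strings).
    Co-occurrence means "appears in the same document" (an assumption).
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w1, w2):
        # Additive smoothing keeps the logarithm defined for unseen pairs.
        joint = (pair_df[frozenset((w1, w2))] + smoothing) / n_docs
        return math.log2(joint / ((word_df[w1] / n_docs) * (word_df[w2] / n_docs)))

    seeds = set(pos_seeds) | set(neg_seeds)
    scores = {}
    for word in word_df:
        if word in seeds:
            continue
        pos = sum(pmi(word, s) for s in pos_seeds if s in word_df)
        neg = sum(pmi(word, s) for s in neg_seeds if s in word_df)
        scores[word] = pos - neg  # > 0 leans positive, < 0 leans negative
    return scores


# Toy corpus with placeholder tokens; a real run would use segmented Tibetan text.
toy_docs = [
    ["good", "happy", "joyful"],
    ["good", "joyful", "nice"],
    ["bad", "sad", "gloomy"],
    ["bad", "gloomy", "awful"],
    ["good", "nice"],
    ["bad", "awful"],
]
scores = so_pmi(toy_docs, pos_seeds=["good"], neg_seeds=["bad"])
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:+.2f}")
```

Words whose score exceeds a chosen positive threshold would be added to the positive part of the lexicon, and those below a negative threshold to the negative part. The thresholds, like the seed lists, would have to be tuned on real data and are not specified in the abstract.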
Authors: TSHERING Dondrub (才让东知); YANG Jie (杨杰); NYIMA Trashi (尼玛扎西) (Engineering Research Center of Tibetan Information Technology of Ministry of Education, Lhasa 850000, China; School of Information Science and Technology, Tibet University, Lhasa 850000, China; Collaborative Innovational Center for Tibet informatization, Lhasa 850000, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2024, No. 3, pp. 9-14 (6 pages)
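Funding: National Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116101); Tibet University 2021 Graduate "High-Level Talent Training Program" Project (2021-GSP-S129).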
Keywords: Tibetan emotional lexicon; emotional word classification; lexicon construction method; vocabulary size; vocabulary composition