
Research on Building a Sentiment Lexicon Based on Emoticons (基于表情符号的情感词典的构建研究)

Cited by: 12
Abstract A sentiment lexicon is a basic resource for text sentiment analysis. Exploiting the clear emotional signal carried by emoticons, we propose a method for building a sentiment lexicon that combines seed emoticons with the SO-PMI algorithm. First, 44 emoticons with clear sentiment and rich content are chosen as the seed set. Candidate sentiment words are then selected from microblog texts, using TF-IDF as the measure of word importance. With SO-PMI, the co-occurrence information between each candidate word and the seed emoticons is computed over a large corpus, which determines the word's sentiment weight and polarity. Applying this procedure to five million microblog posts yields the sentiment lexicon SentiNet, which contains 13,814 sentiment words: 6,885 positive and 6,929 negative. Finally, SentiNet is applied to polarity classification of microblog text. The experiments show that SentiNet captures the sentiment of words and can be applied to large-scale microblog sentiment analysis. The method combines the strength of TF-IDF in measuring word importance with the expressive strength of the seed emoticon set, and the results indicate that the resulting sentiment weights are effective.
Authors LIN Jiang-hao (林江豪); GU Ye-li (顾也力); ZHOU Yong-mei (周咏梅); YANG Ai-min (阳爱民); CHEN Jin (陈锦) (Guangdong University of Foreign Studies, Guangzhou 510006, China; Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China)
Source Computer Technology and Development (《计算机技术与发展》), 2019, No. 6, pp. 181-185 (5 pages)
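Funding Humanities and Social Sciences Project of the Ministry of Education (14YJA740011); Guangdong Province Philosophy and Social Sciences "12th Five-Year" Plan Project (GD15YTS01); Guangdong Province Science and Technology Plan Project (2017A04 0406025); Guangzhou Philosophy and Social Sciences "13th Five-Year" Plan, 2018 Project (2018GZQN27); Guangdong University of Foreign Studies Teaching Reform Project (GWJY2017046)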
Keywords sentiment lexicon; sentiment word; sentiment weight; seed emoticons; SO-PMI; TF-IDF
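
The SO-PMI step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes posts are already tokenized, that candidate words have already been chosen (for example by a TF-IDF filter), and that probabilities are estimated from post-level co-occurrence counts with additive smoothing. The function name `so_pmi`, the `smoothing` parameter, and the toy tokens `[haha]` and `[tears]` are hypothetical. A word's score is the sum of its PMI with the positive seed emoticons minus the sum of its PMI with the negative ones.

```python
import math
from collections import Counter
from itertools import product


def so_pmi(posts, candidates, pos_seeds, neg_seeds, smoothing=0.5):
    """Score candidate words by SO-PMI against seed emoticons.

    posts: iterable of token lists, one list per microblog post.
    Returns {word: score}; a score above 0 suggests positive polarity.
    The smoothing constant is an assumption, not taken from the paper.
    """
    cands = set(candidates)
    seeds = set(pos_seeds) | set(neg_seeds)
    df = Counter()  # number of posts containing each token
    co = Counter()  # number of posts containing both word and seed
    n_posts = 0
    for tokens in posts:
        n_posts += 1
        present = set(tokens)
        df.update(present)
        for w, s in product(present & cands, present & seeds):
            co[(w, s)] += 1

    def pmi(w, s):
        # log2( p(w, s) / (p(w) * p(s)) ), with a smoothed joint count so
        # that pairs which never co-occur still give a finite value.
        return math.log2((co[(w, s)] + smoothing) * n_posts / (df[w] * df[s]))

    scores = {}
    for w in cands:
        if df[w] == 0:
            continue  # word never seen in the corpus, so no evidence
        pos = sum(pmi(w, s) for s in pos_seeds if df[s])
        neg = sum(pmi(w, s) for s in neg_seeds if df[s])
        scores[w] = pos - neg
    return scores


# Toy corpus with one hypothetical positive and one negative seed emoticon.
toy_posts = [
    ["happy", "[haha]"],
    ["happy", "[haha]"],
    ["sad", "[tears]"],
    ["sad", "[tears]"],
]
toy_scores = so_pmi(toy_posts, ["happy", "sad"], ["[haha]"], ["[tears]"])
for word in sorted(toy_scores):
    # "happy" comes out positive and "sad" negative on this corpus.
    print(word, round(toy_scores[word], 3))
```

In this sketch the sign of the score gives the polarity and its magnitude serves as the sentiment weight. The paper uses 44 seed emoticons and five million posts, where the choice of smoothing and of a neutral threshold around zero would matter more than it does in the toy example.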