Abstract
Sentiment lexicons are a basic resource for text sentiment analysis. Exploiting the clear emotional signal carried by emoticons, we propose a lexicon construction method that combines seed emoticons with the SO-PMI algorithm. First, 44 emoticons with obvious sentiment and rich content are chosen as the seed set. Candidate sentiment words are then selected from microblog texts, using TF-IDF as a measure of word importance. Based on SO-PMI, sentiment co-occurrence information between the candidate words and the seed emoticons is computed over a large corpus, which determines each word's sentiment weight and polarity. Computed over five million microblog posts, the resulting lexicon, SentiNet, contains 13,814 sentiment words: 6,885 positive and 6,929 negative. SentiNet is then applied to sentiment analysis of microblog text. The experimental results show that SentiNet can represent the sentiment of sentiment words and can be applied to large-scale microblog sentiment analysis. The method combines the importance-measuring strength of TF-IDF with the expressive strength of the seed emoticon set, and the results indicate that the resulting sentiment weights are effective.
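The pipeline described in the abstract (TF-IDF candidate selection followed by SO-PMI scoring against seed emoticons) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `select_candidates` and `so_pmi`, the post-level co-occurrence window, the add-one smoothing, the "maximum TF-IDF over posts" ranking, and the toy tokens are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter
from itertools import chain


def select_candidates(docs, seeds, top_k):
    """Rank non-seed words by their highest TF-IDF value in any post.

    Assumption: the abstract only says TF-IDF measures word importance;
    taking the maximum over posts is one plausible reading.
    """
    n_docs = len(docs)
    df = Counter(chain.from_iterable(set(d) for d in docs))
    best = Counter()
    for d in docs:
        if not d:
            continue
        for word, count in Counter(d).items():
            if word in seeds:
                continue
            tfidf = (count / len(d)) * math.log(n_docs / df[word])
            best[word] = max(best[word], tfidf)
    return [w for w, _ in best.most_common(top_k)]


def so_pmi(docs, candidates, pos_seeds, neg_seeds, smoothing=1.0):
    """SO-PMI(w) = sum of PMI(w, positive seed) - sum of PMI(w, negative seed).

    Co-occurrence is counted per post, and probabilities are smoothed
    to avoid log(0); both choices are assumptions of this sketch.
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    df = Counter(chain.from_iterable(doc_sets))

    def pmi(word, seed):
        co = sum(1 for s in doc_sets if word in s and seed in s)
        p_joint = (co + smoothing) / (n_docs + smoothing)
        p_word = (df[word] + smoothing) / (n_docs + smoothing)
        p_seed = (df[seed] + smoothing) / (n_docs + smoothing)
        return math.log2(p_joint / (p_word * p_seed))

    return {
        w: sum(pmi(w, s) for s in pos_seeds) - sum(pmi(w, s) for s in neg_seeds)
        for w in candidates
    }


# Hypothetical toy corpus: each post is a list of tokens, with emoticons
# kept as bracketed tokens. These are not the paper's 44 seed emoticons.
docs = [
    ["开心", "[哈哈]"],
    ["开心", "[哈哈]"],
    ["难过", "[泪]"],
    ["难过", "[泪]"],
]
pos_seeds, neg_seeds = ["[哈哈]"], ["[泪]"]

candidates = select_candidates(docs, set(pos_seeds + neg_seeds), top_k=2)
scores = so_pmi(docs, candidates, pos_seeds, neg_seeds)
print(scores)
```

In this sketch the sign of a score gives the word's polarity and its magnitude serves as the sentiment weight. A real lexicon would likely also need a threshold around zero to discard near-neutral words, which the abstract does not describe.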
Authors
林江豪
顾也力
周咏梅
阳爱民
陈锦
LIN Jiang-hao; GU Ye-li; ZHOU Yong-mei; YANG Ai-min; CHEN Jin (Guangdong University of Foreign Studies, Guangzhou 510006, China; Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China)
Source
《计算机技术与发展》
2019, No. 6, pp. 181-185 (5 pages)
Computer Technology and Development
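Funding
Humanities and Social Sciences Project of the Ministry of Education (14YJA740011)
Guangdong Province Philosophy and Social Sciences "12th Five-Year" Planning Project (GD15YTS01)
Guangdong Province Science and Technology Plan Project (2017A040406025)
Guangzhou Philosophy and Social Sciences "13th Five-Year" Plan, 2018 Annual Project (2018GZQN27)
Guangdong University of Foreign Studies Teaching Reform Project (GWJY2017046)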