Abstract
In recent years, Chinese-language media platforms, exemplified by Weibo (microblogs), have become increasingly integrated into people's lives. Every day, users post their opinions, feelings, and other subjective content on these platforms. Sentiment analysis is the task of extracting valuable sentiment information from such content and putting it to use. This paper proposes a sentiment analysis method for microblog text based on a sentiment dictionary and a semantic rule set. The method combines several existing general-purpose sentiment dictionaries and builds a microblog-domain sentiment dictionary from statistical information. To account for the semantic characteristics of Chinese, it also adds a custom semantic rule set. To evaluate the method, we used a web crawler to collect 100,000 microblog comments about COVID-19 and ran experiments on this dataset. The results show that, compared with a traditional sentiment-dictionary-based method, our method achieves higher accuracy and more stable performance. Its recognition accuracies for positive, negative, and neutral sentiment reach 79.4%, 82.5%, and 77.3%, respectively. In summary, the proposed method achieves high accuracy and good generalization ability, effectively identifies sentiment in microblog text, and has practical application value.
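To illustrate how a dictionary-plus-rules scorer of this general kind typically works, here is a minimal Python sketch. It is not the authors' implementation. The tiny lexicons, the rule set (negation flipping, degree-adverb weighting, down-weighting text before a contrastive conjunction), the weights, and the functions `score` and `classify` are all hypothetical. The input is assumed to be already word-segmented by a Chinese tokenizer.

```python
# Minimal sketch of dictionary + semantic-rule sentiment scoring.
# ASSUMPTIONS (hypothetical, not from the paper): the lexicons, rules,
# and weights below are illustrative placeholders; input tokens are
# already word-segmented.

SENTIMENT = {"好": 1.0, "支持": 1.0, "担心": -1.0, "难过": -1.0}  # hypothetical polarity lexicon
NEGATIONS = {"不", "没", "没有", "别"}                             # hypothetical negation words
DEGREE = {"非常": 2.0, "很": 1.5, "有点": 0.5}                      # hypothetical degree adverbs
CONTRAST = {"但是", "但", "可是"}                                   # hypothetical contrastive conjunctions
PUNCT = {"，", "。", "！", "？", ",", ".", "!", "?"}
CONTRAST_WEIGHT = 0.5  # hypothetical: text before "but" counts half


def score(tokens: list[str]) -> float:
    """Sum lexicon polarities, adjusted by pending negation/degree modifiers."""
    total = 0.0
    modifier = 1.0  # accumulated effect of negations/degree adverbs seen so far
    for tok in tokens:
        if tok in NEGATIONS:
            modifier *= -1.0              # each negation flips polarity
        elif tok in DEGREE:
            modifier *= DEGREE[tok]       # intensify or weaken the next sentiment word
        elif tok in CONTRAST:
            total *= CONTRAST_WEIGHT      # emphasize the clause after the conjunction
            modifier = 1.0
        elif tok in PUNCT:
            modifier = 1.0                # modifiers do not cross clause boundaries
        elif tok in SENTIMENT:
            total += modifier * SENTIMENT[tok]
            modifier = 1.0                # modifiers apply only to the next sentiment word
    return total


def classify(tokens: list[str], threshold: float = 0.0) -> str:
    """Map a score to positive / negative / neutral."""
    s = score(tokens)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify(["我", "很", "支持"]))                     # degree adverb boosts a positive word
    print(classify(["不", "好", "但是", "很", "支持"]))       # clause after "but" dominates
    print(classify(["今天", "下雨"]))                         # no lexicon hits
```

Raising `threshold` widens the neutral band. That band is one knob such a system could tune to trade positive/negative recall against neutral accuracy.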
Source
Computer Science and Application (《计算机科学与应用》)
2023, Issue 4, pp. 754-763 (10 pages)