Abstract
In the era of artificial intelligence and big data, discovering patterns through data analysis has become a trend. Text data, one of the most common and popular forms of information, is now widely used in many kinds of analysis. Within text data analysis, determining whether a sentence carries a commendatory or derogatory meaning is an important research direction: analyzing the sentiment orientation of subjective statements helps reveal how the public views hot issues. Building on text mining theory based on information entropy, this paper incorporates the proportion of commendatory and derogatory words to propose a method for judging the overall polarity of a sentence, called proportion information entropy. Experiments show that the method performs better than several typical information entropy methods in current use, so proportion information entropy can be regarded as helpful for judging whether a sentence is positive or negative overall.
Authors
张冠东
杨琛
詹晓琳
方红
王继芬
ZHANG Guandong; YANG Cheng; ZHAN Xiaolin; FANG Hong; WANG Jifen (College of Arts and Sciences, Shanghai Polytechnic University, Shanghai 201209, China; School of Economics and Management, Wuhan University, Wuhan 430072, China)
Source
《微型电脑应用》
2021, No. 11, pp. 12-15 (4 pages)
Microcomputer Applications
Funding
General Program of the Natural Science Foundation of Shanghai (20ZR1455900).
Keywords
information entropy
commendatory and derogatory
proportion