Abstract
To address the positive and negative correlation phenomenon in mutual information (MI) feature selection, and the fact that the method does not consider the word frequency of feature terms within different classes, this paper proposes a hybrid mutual information (HMI) feature selection algorithm. An inverse document frequency coefficient and an inter-class word frequency information coefficient are introduced so that word frequency information in the whole document and word frequency information between classes can be used effectively. A positive and negative correlation coefficient is introduced to distinguish positive correlation from negative correlation and to make effective use of both. Comparative experiments show that the HMI algorithm effectively improves the quality of feature selection and, in turn, the performance of text sentiment analysis.
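The positive and negative correlation phenomenon mentioned in the abstract can be seen with the standard document-level MI score, MI(t, c) = log2(P(t, c) / (P(t) P(c))). The following is a minimal sketch of that baseline only; the abstract does not give the HMI coefficients, so they are not reproduced here. The function name, the toy corpus, and the convention of returning negative infinity when a term never occurs in a class are assumptions made for illustration.

```python
import math

def mutual_information(docs, labels, term, cls):
    """Baseline MI between `term` appearing in a document and the
    document belonging to class `cls` (document-level counts)."""
    n = len(docs)
    # Documents of class `cls` that contain the term.
    n_joint = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    # Documents that contain the term, in any class.
    n_term = sum(1 for d in docs if term in d)
    # Documents that belong to the class.
    n_cls = sum(1 for y in labels if y == cls)
    if n_joint == 0 or n_term == 0 or n_cls == 0:
        # Assumed convention: the log of zero is treated as -infinity.
        return float("-inf")
    # log2( P(t, c) / (P(t) * P(c)) ) written with raw counts.
    return math.log2(n_joint * n / (n_term * n_cls))

# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"good", "movie"}, {"good", "plot"}, {"bad", "movie"}, {"bad", "good"}]
labels = ["pos", "pos", "neg", "neg"]

for term, cls in [("good", "pos"), ("good", "neg"), ("bad", "neg")]:
    print(term, cls, round(mutual_information(docs, labels, term, cls), 3))
```

In this toy corpus "good" scores above zero for the "pos" class and below zero for the "neg" class, which is the sign difference a positive and negative correlation coefficient would need to separate. The score also uses only document presence, not how often a term occurs within each class.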
Authors
Wang Yi (王义)
Dai Yueming (戴月明)
School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core (北大核心)
2020, No. 2, pp. 337-341 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 61572237).
Keywords
mutual information (MI)
feature selection
positive and negative correlation
word frequency information
sentiment analysis