Abstract
To address the low accuracy of cross-domain sentiment classification caused by insufficient labeled samples and large differences in feature distributions across domains, a cross-domain sentiment classification model with improved pivot feature selection (IPFS) is proposed. Lemmatization is used to reduce feature redundancy when building the bag-of-words model from text. The chi-square test is used to select pivot features that share the same representation across domains and serve as a bridge between them, and a neural network is used to reduce inter-domain differences. Combined with the neural network model, this completes the cross-domain sentiment classification task. Experimental results show that IPFS achieves better classification performance than existing related models.
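The pivot selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lemma table `LEMMAS`, the helper names, the example reviews, and the rule of keeping only terms that also occur in the target domain are all assumptions made for illustration, and the neural network stage is omitted.

```python
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"loved": "love", "characters": "character"}


def lemmatize(tokens):
    """Map each token to its lemma so inflected forms share one feature."""
    return {LEMMAS.get(t, t) for t in tokens}


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 table of term presence vs. label.

    n11: term present, positive   n10: term present, negative
    n01: term absent, positive    n00: term absent, negative
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def term_scores(docs, labels):
    """Score every term in the labeled documents against binary labels."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores = {}
    for term in set().union(*docs):
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    return scores


def select_pivots(source_docs, source_labels, target_docs, k):
    """Return the k highest-scoring source terms that also occur in the target."""
    target_vocab = set().union(*target_docs)
    scores = term_scores(source_docs, source_labels)
    shared = [(-s, t) for t, s in scores.items() if t in target_vocab]
    shared.sort()  # highest score first, ties broken alphabetically
    return [t for _, t in shared[:k]]


# Toy labeled source domain (book reviews) and unlabeled target domain.
source_docs = [
    lemmatize(["great", "plot", "loved"]),
    lemmatize(["great", "characters"]),
    lemmatize(["boring", "plot"]),
    lemmatize(["boring", "characters", "disappointing"]),
]
source_labels = [1, 1, 0, 0]
target_docs = [
    lemmatize(["great", "battery"]),
    lemmatize(["boring", "design", "disappointing"]),
]

pivots = select_pivots(source_docs, source_labels, target_docs, k=2)
print(pivots)
```

In this sketch, domain-specific words such as "plot" score zero because they appear equally often in positive and negative reviews, while sentiment words that occur in both domains rank highest and are kept as pivots.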
Authors
梁俊葛
相艳
张周彬
熊馨
邵党国
马磊
LIANG Jun-ge; XIANG Yan; ZHANG Zhou-bin; XIONG Xin; SHAO Dang-guo; MA Lei (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2020, No. 11, pp. 3193-3198 (6 pages)
Computer Engineering and Design
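Funding
National Natural Science Foundation of China (61462054, 61732005, 61672271, 61741112)
Natural Science Foundation of Yunnan Province (2017FB098)
China Postdoctoral Science Foundation General Program (2016M592894XB)
Yunnan Provincial Department of Science and Technology Fund Project (2015FB135)
Yunnan Province Major Science and Technology Fund Project (2018ZF017)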
Keywords
cross-domain sentiment analysis
pivot feature
chi-square test
lemmatization
neural network