Abstract
Sentiment classification typically assigns opinionated samples to one of two classes: positive or negative. Most theoretical models assume that the two classes are balanced in size, but in real-world data they are usually imbalanced. To address this problem, this paper proposes a Bi-LSTM neural network model based on Focal loss for sentiment classification of imbalanced data. A dataset of 24,190 travel reviews was collected and labeled for the model, and positive samples in it greatly outnumber negative samples. To obtain better classification results, the method proceeds in three steps: first, the dataset is divided into core and non-core samples, and the non-core samples are removed to improve data quality; second, the Focal-loss Bi-LSTM model is trained on the remaining data; finally, the trained model is evaluated on the test set to obtain the final classification results. Model performance is assessed with four metrics: accuracy, F1 score, recall, and specificity. Experimental results show that the Focal-loss Bi-LSTM model handles sample imbalance better and achieves better classification performance than the conventional LSTM classification method.
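Focal loss (Lin et al., 2017) rescales cross-entropy so that easy, confidently classified examples, which in this dataset are mostly majority-class positive reviews, contribute little to training: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class, γ is the focusing parameter, and α_t re-weights the two classes. The abstract does not state the α and γ values the authors used, so the minimal pure-Python sketch below assumes the commonly cited defaults γ = 2 and α = 0.25, with α applied to the positive class (the negative class then gets 1 - α = 0.75). `binary_focal_loss` is a hypothetical helper for illustration; in a real Bi-LSTM the same formula would be applied to the network's output probabilities inside a deep-learning framework.

```python
import math
from typing import Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Mean binary focal loss (hypothetical helper; gamma/alpha are assumed defaults).

    probs[i]  : predicted probability that sample i is positive
    labels[i] : 1 for a positive review, 0 for a negative review
    alpha     : weight of the positive class; the negative class gets 1 - alpha
    gamma     : focusing parameter; gamma = 0 reduces to alpha-weighted cross-entropy
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified examples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    easy = binary_focal_loss([0.9], [1])   # confidently correct positive
    hard = binary_focal_loss([0.1], [1])   # badly misclassified positive
    print(f"easy={easy:.6f} hard={hard:.6f}")
```

With γ = 2, a confidently correct prediction (p_t = 0.9) keeps only (0.1)² = 1% of its α-weighted cross-entropy, while a hard example (p_t = 0.1) keeps 81%; this is the mechanism by which the loss shifts training effort toward misclassified, often minority-class, samples.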
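Because positive reviews dominate the dataset, accuracy alone can look high even when most negative reviews are missed, which is why recall and specificity are reported alongside it. The sketch below computes the four metrics named in the abstract from a binary confusion matrix. It assumes that positive sentiment is the positive class (so recall is the hit rate on positive reviews and specificity the hit rate on negative reviews) and that F1 refers to the positive class; the abstract does not confirm either choice, and `binary_metrics` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    recall: float       # TP / (TP + FN): share of actual positives found
    specificity: float  # TN / (TN + FP): share of actual negatives found
    f1: float           # harmonic mean of precision and recall (positive class)


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: a metric with an empty denominator is reported as 0.0.
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> BinaryMetrics:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return BinaryMetrics(
        accuracy=(tp + tn) / len(y_true),
        recall=recall,
        specificity=_safe_div(tn, tn + fp),
        f1=_safe_div(2 * precision * recall, precision + recall),
    )


if __name__ == "__main__":
    # 9 positive reviews and 1 negative; a classifier that always predicts "positive".
    y_true = [1] * 9 + [0]
    y_pred = [1] * 10
    print(binary_metrics(y_true, y_pred))
```

In this toy case the always-positive classifier scores an accuracy of 0.9 and an F1 of about 0.95, yet its specificity is 0.0, which exposes that it never recognizes a negative review.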
Source
Computer Science and Application (《计算机科学与应用》)
2023, No. 11, pp. 1989-1999 (11 pages)