Abstract
To address Chinese text classification and improve its accuracy, this paper proposes TI-LSTM, an automatic text classification algorithm that combines TF-IDF with a neural network. The algorithm first extracts features according to the semantic context and quantifies them. A long short-term memory (LSTM) network is then trained on the quantified features and assigns them weights. Finally, the Chinese text is evaluated on the basis of these feature weights. TI-LSTM can extract features accurately while preserving the semantics of the original text. The algorithm was applied to classifying the poverty levels of students at Changchun University of Science and Technology. Compared with traditional KNN, logistic regression, naive Bayes, and LSTM classifiers, it substantially improved both training and test accuracy, reaching an accuracy above 86%.
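As a rough illustration of the feature-quantification step, the sketch below computes plain TF-IDF weights for pre-segmented documents. The abstract does not give the exact weighting formula used in TI-LSTM, so the `tf * log(N / df)` variant, the function name `tf_idf`, and the toy documents are all assumptions made for illustration. The LSTM training stage is not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc_length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, pre-segmented toy documents (not data from the paper).
docs = [
    ["家庭", "收入", "低"],
    ["家庭", "收入", "稳定"],
    ["父母", "务农", "收入", "低"],
]
weights = tf_idf(docs)
```

In this variant a term that appears in every document receives weight zero, while a term confined to one document receives the largest weight. This is the sense in which TF-IDF singles out discriminative features before they are passed to a classifier.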
Authors
陈玉天
陈洋
梁恒瑞
孙绍宇
施三支
CHEN Yutian; CHEN Yang; LIANG Hengrui; SUN Shaoyu; SHI Sanzhi (School of Mathematics and Statistics, Changchun University of Science and Technology, Changchun 130022)
Source
《长春理工大学学报(自然科学版)》
2023, No. 1, pp. 130-136 (7 pages)
Journal of Changchun University of Science and Technology (Natural Science Edition)
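Funding
Jilin Provincial Department of Education Project (JJKH20210809KJ)
Changchun University of Science and Technology College Students' Innovation and Entrepreneurship Training Program (2021019).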
Keywords
neural network
text classification
feature extraction
text quantification
poverty-stricken students