Abstract
For the multi-label news text classification task faced by managers of intelligent information push systems, a solution based on the ALBERT-CNN model is proposed. The model combines the ALBERT pre-trained model with the TextCNN convolutional neural network for semantic understanding and feature extraction. ALBERT first performs semantic filtering to capture the content and topics of the news text, and its output is passed to TextCNN for classification and label prediction. A sigmoid function outputs the probability of each label, which enables multi-label classification. In experiments on 382,688 records from the Toutiao client, the ALBERT-CNN model reaches an F1-Score of 92.05%, a recall of 96.8%, and a precision of 90%. Compared with the traditional ALBERT and ALBERT-Dense models, it improves F1-Score and recall, while its precision is slightly lower than that of ALBERT-Dense. This study provides a new solution for improving information push efficiency and reducing the spread of misleading information.
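The sigmoid-based decision step and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.5 threshold, the micro-averaging scheme, the function names, and the toy logits are all assumptions made for illustration, since the abstract does not state how probabilities are thresholded or how the scores are averaged.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def predict_labels(logits, threshold=0.5):
    """Turn one sample's per-label logits into 0/1 decisions.

    Each label is decided independently, so a sample may receive
    zero, one, or several labels. The threshold is an assumed value.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over all (sample, label) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy data (hypothetical): 2 news items, 3 candidate labels each.
logits = [[2.0, -1.0, 0.3], [-0.5, 1.5, -2.0]]
y_true = [[1, 0, 0], [0, 1, 1]]
y_pred = [predict_labels(row) for row in logits]
precision, recall, f1 = micro_prf(y_true, y_pred)
```

In a full model, the logits would come from the classification layer on top of the TextCNN features. Lowering the threshold generally trades precision for recall.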
Authors
麦咏欣
林志豪
葸娟霞
MAI Yongxin; LIN Zhihao; XI Juanxia (School of Information Management and Engineering, Neusoft Institute Guangdong, Foshan 528225, China)
Source
《现代信息科技》
2024, Issue 20, pp. 31-36 (6 pages)
Modern Information Technology
Funding
Guangdong Provincial College Students' Innovation and Entrepreneurship Training Program (S202312574015).