Abstract
User-generated content on the internet is growing rapidly and offers a new window and channel for promoting mainstream values. Text quality evaluation oriented to mainstream values is a new and fairly complex task. This paper defines text quality with respect to mainstream values and constructs a text quality evaluation dataset oriented to mainstream values. To ease the burden of manual annotation and the difficulty of obtaining in-domain data, it proposes a text quality evaluation method based on an unsupervised data augmentation framework. Experiments show that the method significantly improves model performance when the amount of data is small. To obtain more data, the authors built a large-scale Chinese microblog retrieval database and expanded the dataset through retrieval. The final model reaches an F1 score of 86.2%, which is 1.22% higher than BERT.
Authors
崔丁洁
徐冰
CUI Dingjie; XU Bing (Harbin Institute of Technology, Harbin 150001, China)
Source
《智能计算机与应用》
2023, No. 5, pp. 197-202, F0003 (7 pages)
Intelligent Computer and Applications
Funding
National Key Research and Development Program of China (2020YFB1406902).
Keywords
text quality evaluation
mainstream values
semi-supervised learning