Abstract
With the rapid development of text generation algorithms, it has become possible to generate fluent, logically coherent news. Because humans have only a limited ability to detect machine-generated news, this paper proposes a RoBERTa-BiLSTM-Attention detection framework to detect it automatically. First, RoBERTa word embeddings are used to represent the news text; RoBERTa captures the semantic information of the news well and improves the quality of the contextual word embeddings. The embedded representation of the news is then fed into a BiLSTM-Attention neural network. Experiments were carried out on a machine-generated news dataset constructed by fine-tuning GPT-2. The results show that, for machine-generated news produced with nucleus sampling as the decoding strategy and a sequence length of 125, the proposed method improves F1 score and accuracy by nearly 2% and recall by 5.56% over the current best method. For machine-generated news produced with top-k decoding and a sequence length of 125, both accuracy and F1 score improve by about 4% over the current best method.
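The two decoding strategies named in the abstract, top-k and nucleus (top-p) sampling, differ in how they truncate the next-token distribution before sampling. The following is a minimal sketch of that difference on a toy distribution, not the paper's generation code: the function names `top_k_filter`, `nucleus_filter`, and `sample`, the five-token distribution, and the values `k=2` and `p=0.8` are all illustrative assumptions and are not taken from the paper.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = ranked[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(filtered, rng):
    """Draw one token id from a filtered, renormalized distribution."""
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution over five token ids (hypothetical values).
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

top2 = top_k_filter(probs, k=2)         # always exactly two candidates
nucleus = nucleus_filter(probs, p=0.8)  # candidate count depends on the distribution

rng = random.Random(0)
print("top-k candidates:", sorted(top2))
print("nucleus candidates:", sorted(nucleus))
print("sampled token (top-k):", sample(top2, rng))
```

Top-k keeps a fixed number of candidates regardless of how the probability mass is spread, while nucleus sampling keeps a variable number. This is one plausible reason a detector may behave differently on text produced under each strategy.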
Authors
Xu Yu (徐宇)
Yang Pin (杨频)
College of Cyber-security, Sichuan University, Chengdu 610065
Source
Modern Computer (《现代计算机》)
2022, No. 3, pp. 31-35, 81 (6 pages in total)
Keywords
text generation
machine generation
fake news
detection framework