Abstract
Authorship attribution is the task of determining who wrote a given document. Existing methods, however, are target-independent: they predict the author without taking any constraint into account, which does not match practical settings. To address authorship attribution under such constraints, a Target-Dependent method for Authorship Attribution (TDAA) is proposed. First, the product ID associated with a user review is used as the constraint information. Second, to make text modeling more general, Bidirectional Encoder Representations from Transformers (BERT) is used to extract pre-trained features of the review text. Third, a Convolutional Neural Network (CNN) is used to extract deeper text features. Finally, two ways of fusing the two kinds of information are discussed. Experimental results on the Amazon Movie and TV dataset and the CDs and Vinyl 5 dataset show that the proposed method improves accuracy by 4% to 5% over the comparison methods.
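The pipeline summarized in the abstract (text features, convolution, fusion with a product-ID representation, author prediction) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: random vectors stand in for BERT token features, and the two fusion functions (concatenation and element-wise multiplication) are assumptions chosen for illustration, since the abstract does not say which two fusion methods TDAA uses. All function and variable names are hypothetical.

```python
import random


def conv_max_pool(token_vecs, filters, width):
    """Apply each filter to every window of `width` tokens, then ReLU and max-pool over time."""
    pooled = []
    for filt in filters:  # each filter is a flat list of length width * dim
        best = 0.0  # starting at 0 makes the running max equal to ReLU followed by max-pooling
        for start in range(len(token_vecs) - width + 1):
            window = [x for vec in token_vecs[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)
    return pooled


def fuse_concat(text_feat, target_vec):
    """Assumed fusion 1: concatenate text features with the product-ID vector."""
    return list(text_feat) + list(target_vec)


def fuse_multiply(text_feat, target_vec):
    """Assumed fusion 2: element-wise product (both vectors must have equal length)."""
    if len(text_feat) != len(target_vec):
        raise ValueError("element-wise fusion needs vectors of equal length")
    return [t * p for t, p in zip(text_feat, target_vec)]


def predict_author(features, class_weights):
    """Linear scoring: return the index of the author whose weight row scores highest."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Demo with random placeholders; a real system would use BERT outputs and learned weights.
rng = random.Random(0)
dim, n_tokens, n_filters, width, n_authors = 4, 6, 3, 2, 5
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
filters = [[rng.uniform(-1, 1) for _ in range(width * dim)] for _ in range(n_filters)]
product_vec = [rng.uniform(-1, 1) for _ in range(n_filters)]  # stand-in product-ID embedding

text_feat = conv_max_pool(tokens, filters, width)
for fuse in (fuse_concat, fuse_multiply):
    fused = fuse(text_feat, product_vec)
    weights = [[rng.uniform(-1, 1) for _ in fused] for _ in range(n_authors)]
    print(fuse.__name__, "-> predicted author index:", predict_author(fused, weights))
```

Concatenation keeps the two information sources separate and lets the classifier weigh them, while the element-wise product lets the product representation rescale each text feature; which of these, if either, corresponds to the paper's fusion methods is not stated in the abstract.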
Authors
LI Yang; ZHANG Wei; PENG Chen (School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; Institute of Electronics, Chinese Academy of Sciences, Suzhou Jiangsu 215123, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 2, pp. 473-478 (6 pages)
Funding
Young Scientists Fund of the National Natural Science Foundation of China (61702190)
Keywords
authorship attribution
target-dependent
Convolutional Neural Network (CNN)
information fusion
pretrained language model