Abstract
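[Objective] To address the proliferation of fake reviews posted by the online "water army" on e-commerce websites, this paper proposes IMTS, a fake review detection method for Chinese e-commerce reviews that fuses image information with text semantics. [Methods] IMTS uses a text convolutional neural network (TextCNN) and the BERT pre-trained model to extract features from the review text separately and obtain the corresponding feature vectors. It then incorporates reviewer features: the output features of the review text semantics and of the reviewer ID are concatenated to strengthen the model's capture of the overall semantic information. Images posted by users in their reviews are passed through a residual network to obtain the corresponding visual features. Finally, the text features and visual features are fused multimodally to detect fake reviews. [Results] On a self-built multimodal Chinese fake review dataset, IMTS achieves an accuracy of 0.9636, a recall of 0.9635, and an F1 score of 0.9635. [Limitations] Owing to limited computing power, the dataset is small; in addition, because the text processing stage uses the BERT pre-trained model, the time cost is high when computing over large-scale data. [Conclusions] Supplementing the features of fake review text through multimodal modeling and feature fusion is effective for detecting fake reviews, and the method improves overall detection accuracy.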
[Objective] This paper proposes a fake review detection method (IMTS) that integrates image information and text semantics for Chinese e-commerce websites, aiming to address the proliferation of fake reviews posted by the "Internet Water Army". [Methods] First, we used a text convolutional neural network (TextCNN) and the BERT pre-trained model to extract features from the review text and obtained the corresponding feature vectors. Second, we integrated reviewer features, concatenating the output features of the review text semantics and the reviewer ID to enhance the model's capture of the overall semantic information. Third, we used a Residual Network (ResNet) to extract visual features from the pictures posted by users in their reviews. Finally, we conducted multimodal fusion of the text features and visual features to detect fake reviews. [Results] The IMTS method achieved 96.36% accuracy, 96.35% recall, and a 96.35% F1 score on the self-built multimodal Chinese fake review dataset. [Limitations] The dataset in this paper was small in scale, and because the BERT pre-trained model was used in the text processing stage, the time cost is high on large-scale data. [Conclusions] The proposed method could effectively improve the overall detection accuracy of fake reviews.
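The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature sizes, the random stand-in vectors, and the helper names `fuse_features` and `predict_fake_probability` are all assumptions made for illustration, and the single linear layer with a sigmoid stands in for whatever classifier the paper actually uses. The sketch only shows the order of concatenation the abstract names: text features from TextCNN and BERT, then the reviewer ID features, then the visual features.

```python
import math
import random

# Hypothetical feature sizes; the abstract does not state the real dimensions.
TEXTCNN_DIM, BERT_DIM, REVIEWER_DIM, IMAGE_DIM = 4, 6, 2, 5


def fuse_features(textcnn_vec, bert_vec, reviewer_vec, image_vec):
    """Concatenate text-side features with reviewer features, then append visual features."""
    text_side = textcnn_vec + bert_vec + reviewer_vec  # list concatenation
    return text_side + image_vec


def predict_fake_probability(fused, weights, bias):
    """Score the fused vector with one linear layer and a sigmoid (an assumed classifier)."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)


def random_vec(n):
    """Stand-in for a real encoder output (TextCNN, BERT, ID embedding, or ResNet)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


textcnn_vec = random_vec(TEXTCNN_DIM)
bert_vec = random_vec(BERT_DIM)
reviewer_vec = random_vec(REVIEWER_DIM)
image_vec = random_vec(IMAGE_DIM)

fused = fuse_features(textcnn_vec, bert_vec, reviewer_vec, image_vec)
weights = random_vec(len(fused))  # untrained, illustrative weights
prob = predict_fake_probability(fused, weights, bias=0.0)
```

Concatenation keeps each modality's features intact and leaves the weighting between them to the classifier, which is one simple way to realise the fusion the abstract describes.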
Authors
施运梅
袁博
张乐
吕学强
Shi Yunmei; Yuan Bo; Zhang Le; Lv Xueqiang (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2022, No. 8, pp. 84-96 (13 pages)
Data Analysis and Knowledge Discovery
Funding
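This paper is one of the research outcomes of the National Key R&D Program of China (Grant No. 2018YFB1004100) and the National Natural Science Foundation of China (Grant No. 62171043).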