Abstract
[Objective] Maintaining order in the e-commerce market requires effective techniques for identifying fake reviews. This paper addresses two problems in fake review identification: data imbalance and catastrophic forgetting during model learning. [Methods] This paper proposes a fake review identification method based on incremental learning to address data imbalance, and introduces elastic weight consolidation to alleviate the catastrophic forgetting that may occur during learning. [Results] Experiments were conducted on the YelpCHI, YelpNYC, and YelpZIP datasets. Compared with the existing advanced method En-HGAN, the proposed model improves F1 by 17.2, 16.1, and 13.3 percentage points on the three datasets, respectively, and AUC by 12.8, 13.8, and 13.6 percentage points, respectively. [Limitations] The method still has room for improvement on extremely imbalanced datasets, and the catastrophic forgetting introduced by incremental learning persists. [Conclusions] The proposed method identifies fake reviews effectively and can provide technical support for building integrity in the e-commerce market.
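The abstract names elastic weight consolidation (EWC) but does not give its form. As a rough illustration only, the formulation commonly used in the continual-learning literature adds a quadratic penalty to the new-task loss, weighting each parameter's drift from its previously learned value by a diagonal Fisher information estimate: loss = new_task_loss + (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2. The sketch below is not the authors' implementation; the function names `ewc_penalty` and `ewc_loss`, the flat parameter lists, and the value of `lam` are all hypothetical choices made for this example.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Return (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    params        -- current parameter values (hypothetical flat list)
    anchor_params -- parameter values saved after the previous learning stage
    fisher_diag   -- diagonal Fisher information estimate, one entry per parameter
    lam           -- penalty strength (assumed hyperparameter)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("params, anchor_params and fisher_diag must have equal length")
    weighted_drift = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )
    return 0.5 * lam * weighted_drift


def ewc_loss(
    new_task_loss: float,
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Total objective: loss on the new data plus the EWC penalty."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)
```

Parameters with a large Fisher value are held close to their earlier values, while parameters with a small Fisher value remain free to adapt to newly arriving data. In an incremental-learning setting this penalty would be recomputed at each stage, with the anchor parameters and Fisher estimate taken from the preceding stage.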
Authors
Liu Meiling; Gan Jiaojiao; Zeng Ying; Wang Shuangshuang; Zhou Jiyun (School of Information and Computer Engineering, Northeast Forestry University, Harbin 150006, China; Lieber Institute, Johns Hopkins University, Baltimore, MD 21218, USA)
Source
《数据分析与知识发现》
EI
CSSCI
CSCD
Peking University Core Journals (北大核心)
2024, Issue 8, pp. 85-95 (11 pages)
Data Analysis and Knowledge Discovery
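Funding
Natural Science Foundation of Heilongjiang Province (Grant No. LH2022F002)
One of the research outcomes of the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61702091)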
Keywords
Fake Review Identification
Imbalanced Data
Incremental Learning
Elastic Weight Consolidation