Abstract
[Objective] This paper reviews the current state of research on methods for identifying fake product review text (review spam). [Coverage] We searched eight databases, including WoS, CNKI and EI, using subject terms such as "Review Spam" and "虚假评论" (fake reviews). After topic screening, quality assessment and reference tracing, we obtained 90 representative papers. [Methods] Following a systematic literature review process, we extracted, summarized and categorized the key content of research on review spam text detection, then compared the representational power of each type of spam feature and the performance of the detection methods. [Results] Spam feature design and detection method design are the key steps in review spam text detection, and acquiring large-scale annotated review data is the main difficulty in current research. [Limitations] The review focuses only on methods for detecting spam review text; it does not cover methods for detecting spammers or spammer groups. [Conclusions] We analyze and identify the problems in existing research in three areas (dataset acquisition, spam feature design and detection method design) and offer suggestions for future research on review spam text detection.
Authors
Wu Jiafen (吴佳芬)
Ma Feicheng (马费成)
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Source
《数据分析与知识发现》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2019, Issue 9, pp. 1-15 (15 pages)
Data Analysis and Knowledge Discovery
Funding
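One of the research outcomes of the National Natural Science Foundation of China Key International Cooperation Project "Research on Knowledge Organization and Service Innovation in the Big Data Environment" (大数据环境下的知识组织与服务创新研究), Grant No. 71420107026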