
Two-Stage Spam Review Detection Method Based on Class Imbalance and Feature Selection (cited by: 1)
Abstract: When users buy products on e-commerce platforms, other users' reviews of those products strongly guide their decisions. To influence purchase intent, some merchants maliciously post spam reviews on these platforms. Existing research on spam review identification focuses on mining data such as users' purchase behavior; no study has yet approached the problem from the content of spam reviews on Chinese e-commerce platforms. This paper crawls data from a well-known Chinese e-commerce platform and labels strongly suspected spam reviews according to behavior patterns. To address the class imbalance and the curse of dimensionality in the collected dataset, it designs a two-stage spam review detection method. An empirical study shows that the model built with this method classifies better than baseline methods that consider only class imbalance or only the curse of dimensionality.
Authors: QU Yubin (曲豫宾); LI Fang (李芳); CHEN Xiang (陈翔) (Jiangsu College of Engineering and Technology, Nantong 226007, China; Nantong University, Nantong 226019, China)
Source: Journal of Jiangsu College of Engineering and Technology (《江苏工程职业技术学院学报》), 2017, No. 4, pp. 16-20 (5 pages)
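Funding: Nantong Key Laboratory of Distributed Generation and Microgrid Technology project (No. CP12015007); Jiangsu College of Engineering and Technology research project (No. GYKY/2016/15)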
Keywords: spam review detection; class imbalanced learning; feature selection; empirical research
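
The abstract names the two problems the method targets (class imbalance and high dimensionality) but not the specific techniques used. As a rough illustration only, the sketch below assumes random oversampling for the first stage and chi-square feature ranking for the second. Both choices, the stage order, the function names, and the toy data are assumptions made here for illustration and are not taken from the paper.

```python
import random
from collections import Counter

def oversample(rows, labels, seed=0):
    """Stage 1 (assumed): duplicate random minority rows until classes balance."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def chi2_score(rows, labels, j):
    """Chi-square statistic between binary feature j and a binary label."""
    # 2x2 contingency table: feature present/absent versus label 1/0.
    a = sum(1 for r, y in zip(rows, labels) if r[j] == 1 and y == 1)
    b = sum(1 for r, y in zip(rows, labels) if r[j] == 1 and y == 0)
    c = sum(1 for r, y in zip(rows, labels) if r[j] == 0 and y == 1)
    d = sum(1 for r, y in zip(rows, labels) if r[j] == 0 and y == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def select_top_k(rows, labels, k):
    """Stage 2 (assumed): keep the k features with the highest chi-square score."""
    scores = [(chi2_score(rows, labels, j), j) for j in range(len(rows[0]))]
    keep = sorted(j for _, j in sorted(scores, reverse=True)[:k])
    return keep, [[r[j] for j in keep] for r in rows]

# Hypothetical toy data: 6 normal reviews (label 0), 2 suspected spam (label 1),
# each described by 4 binary term-presence features.
X = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 1, 1],
     [1, 0, 1, 0], [0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

Xb, yb = oversample(X, y)                 # balanced training set
keep, Xs = select_top_k(Xb, yb, k=2)      # reduced feature space
print(Counter(yb), keep)
```

The reduced matrix `Xs` and labels `yb` would then be passed to any classifier. In a real experiment, resampling and feature scoring should be fitted on the training split only, so that duplicated rows do not leak into the evaluation data.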