
Spam Review Identification Based on Adaboost Algorithm and Rules Matching (cited by: 2)
Abstract: Features are extracted from both the text content and the metadata of reviews, which keeps the feature vectors from becoming too sparse. An Adaboost algorithm based on random forest is proposed to reduce the influence of the imbalanced product review data set. Because some spam reviews have very distinctive characteristics, rule matching is applied to further improve the recall of spam review identification. Experiments on the data set provided by COAE2015 Task 4 achieved good identification results, which verifies the effectiveness of the proposed method.
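Source: Journal of Zhengzhou University: Natural Science Edition (CAS, Peking University Core), 2017, No. 1, pp. 24-28 (5 pages)
Funding: National Natural Science Foundation of China (61402419); National Social Science Foundation of China (14BYY096); National Key Basic Research Program of China (973 Program) (2014CB340504); Basic Research Projects of the Henan Provincial Department of Science and Technology (142300410231, 142300410308); Key Scientific Research Project of Higher Education Institutions of Henan Province (15A520098)
Keywords: spam review identification; random forest; Adaboost; ensemble learning algorithm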
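The pipeline outlined in the abstract (a boosted classifier over review features, followed by rule matching to recover obvious spam) might be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: decision stumps stand in for the random-forest base learners to keep the example short, and the two feature columns (review length, rating deviation) and the regular-expression rules are hypothetical, since this record does not list the paper's actual features or rules.

```python
import math
import re

# Hypothetical rules: a URL or a long digit run (e.g. a phone number).
# The paper's real rule set is not given in this record.
SPAM_RULES = [re.compile(r"https?://"), re.compile(r"\d{7,}")]

def rule_match(text):
    """Return True if any hand-written rule fires on the review text."""
    return any(p.search(text) for p in SPAM_RULES)

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] >= t else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def stump_predict(stump, row):
    _, j, t, pol = stump
    return pol if row[j] >= t else -pol

def adaboost_fit(X, y, rounds=5):
    """Discrete AdaBoost; labels are +1 (spam) and -1 (normal)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight; then renormalise to sum to 1.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

def classify(ensemble, row, text):
    """Flag as spam if a rule fires or the ensemble says spam (favours recall)."""
    if rule_match(text) or adaboost_predict(ensemble, row) == 1:
        return 1
    return -1

# Toy data: [review length, rating deviation]; values are made up.
X = [[5, 0.1], [40, 0.2], [3, 0.9], [60, 0.8], [50, 0.1], [4, 0.7]]
y = [1, -1, 1, -1, -1, 1]
ensemble = adaboost_fit(X, y, rounds=3)
```

Combining the two decisions with a logical OR means a rule hit can only add spam predictions, which is one simple way to trade some precision for higher recall; whether the paper combines them in exactly this way is not stated here.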
