Abstract
Features were extracted from both the text content and the metadata of reviews to keep the feature vectors from becoming overly sparse. An AdaBoost algorithm based on random forests was proposed to reduce the influence of class imbalance in the product review data set. Because some spam reviews have very distinctive characteristics, rule matching was applied to further improve the recall of spam review detection. Experiments on the data set provided by COAE2015 Task 4 achieved good detection results, demonstrating the effectiveness of the proposed method.
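The abstract does not say exactly how the random forest and AdaBoost are combined, or what the matching rules are. The sketch below is one plausible reading, written in standard-library Python.

- Each AdaBoost round draws a sample according to the current example weights.
- It then fits a small random forest on that sample and combines the forests with the usual AdaBoost weights.
- The boosted prediction is then OR-ed with a rule layer.

Several choices are assumptions for illustration, not the authors' settings:

- Each forest is built from depth-1 stumps to keep the code short.
- The initial example weights are class-balanced.
- The `SPAM_RULES` patterns are hypothetical, as are the function names.

Labels are +1 for spam and -1 for non-spam. Feature vectors are assumed to already combine text and metadata features.

```python
import math
import random
import re


def train_stump(X, y, feats):
    """Depth-1 tree: the threshold split (over the given features) with the fewest errors."""
    best = None
    for f in feats:
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(1 for row, lab in zip(X, y)
                          if (sign if row[f] >= t else -sign) != lab)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return lambda row: sign if row[f] >= t else -sign


def train_forest(X, y, rng, n_trees=5):
    """Random forest of stumps: bootstrap rows and a random feature subset per tree."""
    n_feat = len(X[0])
    k = max(1, int(math.sqrt(n_feat)))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(n_feat), k)
        trees.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    # Majority vote; with an odd n_trees the sum of +/-1 votes never ties at 0.
    return lambda row: 1 if sum(tree(row) for tree in trees) > 0 else -1


def adaboost_rf(X, y, rounds=10, seed=0):
    """AdaBoost with random forests as weak learners (labels +1/-1, both present)."""
    rng = random.Random(seed)
    n = len(X)
    n_pos = sum(1 for lab in y if lab == 1)
    # Assumption: class-balanced initial weights, so the minority (spam) class
    # carries half of the total weight from the first round.
    w = [0.5 / n_pos if lab == 1 else 0.5 / (n - n_pos) for lab in y]
    ensemble = []
    for _ in range(rounds):
        # Weighted resampling: high-weight (hard) reviews are drawn more often.
        idx = rng.choices(range(n), weights=w, k=n)
        h = train_forest([X[i] for i in idx], [y[i] for i in idx], rng)
        err = sum(wi for wi, row, lab in zip(w, X, y) if h(row) != lab)
        if err >= 0.5:
            continue  # no better than chance under the current weights; skip it
        err = max(err, 1e-10)  # avoid division by zero for a perfect forest
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Raise weights of misclassified reviews, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * lab * h(row)) for wi, row, lab in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda row: 1 if sum(a * g(row) for a, g in ensemble) > 0 else -1


# Hypothetical rules for conspicuous spam (URLs, long digit runs, contact handles).
SPAM_RULES = [re.compile(r"https?://"), re.compile(r"\d{7,}"),
              re.compile(r"(?i)\b(wechat|qq)\b")]


def detect(text, features, model):
    """Rule matching first; fall back to the boosted model when no rule fires."""
    if any(rule.search(text) for rule in SPAM_RULES):
        return 1
    return model(features)
```

For example, `model = adaboost_rf(X, y)` followed by `detect(text, x, model)` classifies a single review. A rule can only turn a prediction into spam, never the reverse. The rule layer can therefore raise recall but may lower precision, which matches the role the abstract gives to rule matching.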
Source
Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》)
CAS
Peking University Core Journal
2017, No. 1, pp. 24-28 (5 pages)
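Funding
National Natural Science Foundation of China (61402419)
National Social Science Fund of China (14BYY096)
National Key Basic Research Program of China (973 Program) (2014CB340504)
Basic Research Project of the Henan Provincial Department of Science and Technology (142300410231, 142300410308)
Key Scientific Research Project of Higher Education Institutions of Henan Province (15A520098)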