
Research on the Identification of Product Reviews

Cited by: 1
Abstract: As spam reviews become an increasingly serious problem, this paper studies the identification of product reviews. For word segmentation, the reverse maximum matching algorithm is improved: neutral high-frequency words and useless words are removed from the sentence first, which reduces the number of loop iterations and improves computational efficiency. Word weights are reset, and a smoothing factor is added to the definition of similarity so that near-synonyms can be recognized. The experimental results show that the new identification technique substantially improves the accuracy and recall of product review identification.
Authors: Wu Yaxuan (武雅萱); Wang Yuexin (王悦欣); Li Yang (李洋); Wang Bochen (王博晨)
Source: Journal of Science and Education (《科教文汇》), 2017, No. 17, pp. 50-52, 57 (4 pages)
Keywords: identification of product reviews; word segmentation technology; word weight; degree of similarity
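
The segmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_len` window, the toy dictionary and the stop-word list are all assumptions made for the example, and deleting stop words by plain string replacement is a simplification that may join characters that were not adjacent in the original sentence.

```python
def remove_stop_words(sentence, stop_words):
    """Delete every stop word from the sentence before segmentation."""
    # Remove longer stop words first so a short one cannot split a long one.
    for word in sorted(stop_words, key=len, reverse=True):
        sentence = sentence.replace(word, "")
    return sentence


def reverse_max_match(sentence, dictionary, max_len=4):
    """Segment a sentence by scanning from its end toward its start."""
    tokens = []
    end = len(sentence)
    while end > 0:
        # Try the longest candidate ending at `end`, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = sentence[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    tokens.reverse()  # tokens were collected right to left
    return tokens


def segment(sentence, dictionary, stop_words, max_len=4):
    """Strip stop words first, then run reverse maximum matching."""
    cleaned = remove_stop_words(sentence, stop_words)
    return reverse_max_match(cleaned, dictionary, max_len)


# Hypothetical toy data, chosen only for illustration.
dictionary = {"手机", "质量", "很好", "物流"}
stop_words = {"的", "了", "这个"}
print(segment("这个手机的质量很好", dictionary, stop_words))
```

Because the stop words are stripped before matching, the matching loop runs over a shorter string, which is the source of the reduced iteration count the abstract mentions. The reset word weights and the smoothed similarity measure are not specified on this page, so they are not sketched here.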
