Abstract
With spam reviews becoming an increasingly serious problem, this paper studies the identification of product reviews. For word segmentation, the reverse maximum matching algorithm is improved: neutral high-frequency words and useless words are removed from the sentence first, which reduces the number of loop iterations and improves computational efficiency. Word weights are reassigned, and a smoothing factor is added to the definition of similarity so that synonyms can be recognized. The experimental results show that the new technique substantially improves the accuracy and recall of product review identification.
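The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no dictionary, filter list, or maximum word length, so the names `split_on_filtered`, `reverse_max_match`, `segment_review`, the sample data, and `max_len=4` are all assumptions made for illustration. The sketch also assumes that the filtered words act as separators, so that removing them does not join neighbouring fragments into a spurious word.

```python
import re


def split_on_filtered(sentence, filtered_words):
    """Cut the sentence at every filtered word and drop the word itself."""
    if not filtered_words:
        return [sentence] if sentence else []
    # Longest alternatives first so a long filtered word wins over its prefix.
    ordered = sorted(filtered_words, key=len, reverse=True)
    pattern = "|".join(re.escape(word) for word in ordered)
    return [piece for piece in re.split(pattern, sentence) if piece]


def reverse_max_match(text, dictionary, max_len=4):
    """Plain reverse maximum matching: scan from the end of the text."""
    tokens = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a dictionary word
        # or only a single character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        tokens.append(text[start:end])
        end = start
    tokens.reverse()
    return tokens


def segment_review(sentence, dictionary, filtered_words, max_len=4):
    """Remove filtered words first, then segment each remaining piece."""
    tokens = []
    for piece in split_on_filtered(sentence, filtered_words):
        tokens.extend(reverse_max_match(piece, dictionary, max_len))
    return tokens


# Hypothetical sample data, not taken from the paper.
sample_dictionary = {"这款", "手机", "屏幕", "非常", "清晰"}
sample_filtered = {"的"}
print(segment_review("这款手机的屏幕非常清晰", sample_dictionary, sample_filtered))
```

Because the filtered words are dropped before matching, the inner loop never spends iterations testing windows that contain them, which is the efficiency gain the abstract refers to.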
Authors
Wu Yaxuan (武雅萱)
Wang Yuexin (王悦欣)
Li Yang (李洋)
Wang Bochen (王博晨)
Source
Journal of Science and Education (《科教文汇》)
2017, No. 17, pp. 50-52, 57 (4 pages in total)
Keywords
product review identification
word segmentation
word weight
similarity