Abstract
This paper proposes an improved fake review identification method that combines time series with multi-scale features. Taking into account the influence of time on ratings and their distribution, it constructs a fake review identification model based on multi-dimensional time series to extract abnormal review features. These features are divided hierarchically, and benchmark scale features and subdivision scale features are obtained following the multi-scale feature idea. A clustering algorithm based on density peaks is then used to identify fake reviews and to improve the noise immunity of the identification model. Experimental results show that, compared with density peaks clustering methods for fake review identification based on benchmark scale features and on multi-scale features, the proposed method reaches an AUC of 92% and identifies fake reviews more accurately.
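The abstract names density-peaks clustering but gives no implementation details. As a rough illustration only, the following is a minimal sketch of the generic density-peaks procedure (local density, distance to the nearest denser point, centers chosen by their product). The function name `density_peaks`, the cutoff `d_c`, and the parameter `n_centers` are assumptions made for this sketch and are not taken from the paper, whose feature extraction and settings may differ.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Generic density-peaks clustering sketch (not the paper's exact method).

    X: (n, d) feature matrix; d_c: hypothetical cutoff distance;
    n_centers: hypothetical number of cluster centers to pick.
    """
    n = len(X)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density: neighbours closer than d_c, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    # Points sorted from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: use its largest distance to any point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centers combine high density with a large distance to denser points.
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Remaining points inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            j = nearest_higher[i]
            if j >= 0:
                labels[i] = labels[j]
            else:
                labels[i] = labels[centers[np.argmin(dist[i, centers])]]
    return labels, rho, delta
```

In a setting like the one described, `X` would hold the abnormal review features after any dimensionality reduction, and points with low density but a large distance to denser points would be candidates for noise or outliers.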
Authors
狄瑞彤
王红
房有丽
DI Ruitong; WANG Hong; FANG Youli (School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China; College of Life Science, Shandong Normal University, Jinan 250358, China; Shandong Provincial Key Laboratory of Distributed Computer Software Novel Technology, Jinan 250014, China)
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2019, No. 3, pp. 278-285, 292 (9 pages)
Funding
National Natural Science Foundation of China (61672329, 61373149)
Shandong Province Education Science Planning Project (ZK1437B010)
Keywords
fake review
time series
multi-scale
Principal Component Analysis (PCA)
clustering