Abstract
To prevent vicious competition among sellers, ensure fair trading on e-commerce platforms, and protect consumers' rights, this paper addresses the small datasets and inaccurate labeling that affect fake review detection by improving related algorithms on the latest fake review dataset released by Amazon. Because the Word2vec model cannot recognize word pairs in English, a Bigram-Word2vec model is proposed. A "binary weighted hard voting" method is proposed to resolve ties in the number of classifier votes in heterogeneous ensemble learning, and a "weighted soft voting" method is proposed to address how classifier weights are set in heterogeneous ensemble learning. Experimental results show that the improved algorithms achieve fairly satisfactory results.
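The two voting schemes named in the abstract can be illustrated with a minimal sketch for the binary case (1 = fake, 0 = genuine). The abstract does not say how the weights are derived or how a remaining tie is handled, so the function names, the example weights, and the rule that an exact weighted tie falls back to class 0 are all assumptions made for illustration, not the authors' method.

```python
from typing import Sequence


def weighted_hard_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Binary weighted hard vote: each 0/1 prediction counts with its classifier's weight."""
    score_pos = sum(w for y, w in zip(labels, weights) if y == 1)
    score_neg = sum(w for y, w in zip(labels, weights) if y == 0)
    # Assumption: an exact weighted tie falls back to class 0.
    return 1 if score_pos > score_neg else 0


def weighted_soft_vote(probs: Sequence[float], weights: Sequence[float],
                       threshold: float = 0.5) -> int:
    """Weighted soft vote: weighted mean of each classifier's P(class = 1)."""
    mean_prob = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return 1 if mean_prob >= threshold else 0


# Four classifiers split 2-2 on the label; hypothetical weights break the tie.
hard = weighted_hard_vote([1, 1, 0, 0], [0.9, 0.6, 0.8, 0.8])

# The same probabilities give different decisions under different weights.
soft_equal = weighted_soft_vote([0.2, 0.7, 0.7, 0.6], [1, 1, 1, 1])
soft_skewed = weighted_soft_vote([0.2, 0.7, 0.7, 0.6], [3, 1, 1, 1])
```

With an even number of classifiers, an unweighted hard vote can split evenly; weighting the votes makes such ties far less likely, which is the situation the hard-voting variant is described as targeting.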
Authors
ZHANG Dapeng (张大鹏)
LIU Yajun (刘雅军)
ZHANG Wei (张伟)
SHEN Fen (沈芬)
YANG Jiansheng (杨建盛)
School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China; College of Information Engineering, Hebei Institute of Architecture and Civil Engineering, Zhangjiakou 075000, Hebei, China
Source
Journal of Shandong University (Engineering Science)
CAS
CSCD
Peking University Core Journals
2020, Issue 2, pp. 1-9 (9 pages)
Funding
Zhangjiakou Science and Technology Research and Development Directive Plan Project (1711007B, 1711045H, 1811009B-04).