Abstract
[Purpose/significance] This paper designs an effective method for identifying online food safety rumor-related documents. The goal is to make manual review by social platform administrators more efficient and to reduce the harm caused by the spread of online food safety rumors. [Method/process] The words of a document to be classified are distributed differently across two feature vector libraries: one built from a corpus related to online food safety rumors and one built from an unrelated corpus. Based on this difference, the paper designs an unsupervised document feature similarity method and a supervised regression method for detecting online food safety rumor-related documents. [Result/conclusion] The unsupervised RM-Sort method effectively identifies online food safety rumor-related documents. It outperforms the Naive Bayes, Decision Tree, and Support Vector Machine baselines. The supervised RM-LR method performs better still. [Limitations] The proposed models can only determine whether a document is related to food safety rumors. They cannot tell whether the document is a rumor-refuting article or a rumor article itself.
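The abstract describes scoring a document by how its words are distributed in a rumor-related corpus versus an unrelated corpus. The sketch below illustrates that general idea only. It is not the paper's RM-Sort algorithm, whose details are not given in this record. It rests on several assumptions. Documents are already tokenized. Smoothed unigram frequencies stand in for the paper's word-vector libraries. The function names (`word_distribution`, `relatedness_score`, `rank_documents`) and the toy corpora are hypothetical.

```python
import math
from collections import Counter

def word_distribution(corpus):
    """Count tokens over a tokenized corpus (a list of token lists).
    Returns the counts and the total token count."""
    counts = Counter(token for doc in corpus for token in doc)
    return counts, sum(counts.values())

def relatedness_score(doc, related, unrelated, alpha=1.0):
    """Mean Laplace-smoothed log-ratio of each token's probability in the
    related vs. unrelated corpus. A positive score means the document's words
    are more typical of the (hypothetical) rumor-related corpus."""
    if not doc:
        return 0.0
    rel_counts, rel_total = related
    unrel_counts, unrel_total = unrelated
    # Shared vocabulary size, plus one slot reserved for unseen tokens.
    v = len(set(rel_counts) | set(unrel_counts)) + 1
    total = 0.0
    for token in doc:
        p_rel = (rel_counts[token] + alpha) / (rel_total + alpha * v)
        p_unrel = (unrel_counts[token] + alpha) / (unrel_total + alpha * v)
        total += math.log(p_rel / p_unrel)
    return total / len(doc)

def rank_documents(docs, related_corpus, unrelated_corpus):
    """Return document indices sorted from most to least rumor-related."""
    related = word_distribution(related_corpus)
    unrelated = word_distribution(unrelated_corpus)
    scores = [relatedness_score(d, related, unrelated) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical toy data, for illustration only.
related_corpus = [["milk", "additive", "cancer", "rumor"],
                  ["plastic", "rice", "fake", "rumor"]]
unrelated_corpus = [["football", "match", "score"],
                    ["stock", "market", "price"]]
docs = [["stock", "price", "rise"],
        ["fake", "plastic", "rice", "cancer"]]

print(rank_documents(docs, related_corpus, unrelated_corpus))  # [1, 0]
```

Ranking by a single score like this corresponds to the unsupervised setting. A supervised variant in the spirit of RM-LR might instead use such per-corpus scores as input features to a logistic regression classifier trained on labeled documents. This is an assumption: the paper's actual features are not stated in this record.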
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal (北大核心)
2018, No. 6, pp. 130-136, 142 (8 pages)
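Funding
National Natural Science Foundation of China project "Research on Information Credibility and Quality Control of Medical and Health Websites" (Project No. 71473260)
National Social Science Fund of China project "Research on National Health Promotion and Health Service Strategies in the Construction of Healthy China" (Project No. 16AZD021)
Outcome of the Renmin University of China 2017 Top-notch Innovative Talent Cultivation Funding Program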
Keywords
rumor spreading
food safety
word vector
distributional characteristics