Comparison and Improvement of Health Misinformation Identification Methods in WeChat Official Account Articles
Abstract In recent years, large volumes of health misinformation have spread across social platforms as WeChat official account articles, impairing users' access to health knowledge and their ability to make informed medical decisions. Curbing this spread requires automated identification and detection of health misinformation. This study draws samples from two sources: health articles published by authoritative accounts (e.g., "Science China" (科普中国), "Ding Xiang Doctor" (丁香医生), and other governmental accounts) and health articles that have been debunked and labeled as misinformation. Health misinformation is identified through word segmentation, stop word removal, syntax feature extraction, and text classification, and the best-performing classifier is selected by comparing accuracy, precision, recall, training time, and other performance indicators. In addition, to address polysemy (one word with several meanings) and synonymy (several words with one meaning) in text classification, the study uses latent Dirichlet allocation (LDA) topic analysis to extract semantic features of the text and proposes a "syntax plus semantics" feature extraction method. Experiments indicate that the proposed method improves on the performance indicators of methods based on semantic feature extraction and of prior related models. By offering a new method and tool for identifying health misinformation in WeChat official account articles, the study supports further monitoring and governance of online health misinformation.
Authors Wang Lei (王雷); Song Shijie (宋士杰); Zhu Qinghua (朱庆华) (School of Information Management, Nanjing University, Nanjing 210023; Business School, Hohai University, Nanjing 211000)
Source Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 2, pp. 127-135 (9 pages)
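Funding National Natural Science Foundation of China project "Research on the Dissemination Mechanism and Collaborative Governance of Health Misinformation in the Social Media Environment" (72174083); Fundamental Research Funds for the Central Universities, Humanities and Social Sciences Special Project "Research on Corrective Interventions for False Health Information on Social Media from a Consumer Perspective" (B220201054).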
Keywords health misinformation; syntax features; semantic features; LDA topic analysis; algorithm evaluation; algorithm improvement
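
The pipeline outlined in the abstract (segmentation, stop word removal, feature extraction, LDA topic features, classification) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the authors' setup: the toy English corpus, the stop-word list, whitespace splitting in place of a Chinese word segmenter, TF-IDF as a stand-in for the surface-level ("syntax") features, a small collapsed Gibbs sampler for LDA with illustrative hyperparameters, and a nearest-centroid classifier, since the abstract does not name the classifiers it compares.

```python
import math
import random
from collections import Counter

# Hypothetical stop-word list and toy corpus (1 = misinformation, 0 = reliable).
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}
CORPUS = [
    ("garlic water cures cancer in a week", 1),
    ("miracle tea cures diabetes and cancer", 1),
    ("secret garlic tea is a miracle cure", 1),
    ("vaccines reduce the risk of severe flu", 0),
    ("regular exercise lowers the risk of diabetes", 0),
    ("doctors recommend flu vaccines and exercise", 0),
]

def tokenize(text):
    """Whitespace split stands in for a real Chinese word segmenter."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf(docs, vocab):
    """Surface-level features: smoothed TF-IDF weight per vocabulary word."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[w] / max(len(d), 1) * idf[w] for w in vocab])
    return rows

def lda_topics(docs, vocab, k=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Semantic features: per-document topic proportions from a small
    collapsed Gibbs sampler for LDA (hyperparameters are illustrative)."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]
    topic_word = [[0] * v for _ in range(k)]
    topic_total = [0] * k

    def update(d, z, wi, delta):
        doc_topic[d][z] += delta
        topic_word[z][wi] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(k) for _ in doc]
        for w, z in zip(doc, zs):
            update(d, z, word_id[w], 1)
        assign.append(zs)

    # Resample each token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = word_id[w]
                update(d, assign[d][i], wi, -1)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wi] + beta)
                    / (topic_total[t] + v * beta)
                    for t in range(k)
                ]
                assign[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, assign[d][i], wi, 1)

    return [[(c + alpha) / (sum(row) + k * alpha) for c in row]
            for row in doc_topic]

def nearest_centroid(train_x, train_y, x):
    """Stand-in classifier: return the label of the closest class centroid."""
    best = None
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if best is None or dist < best[0]:
            best = (dist, label)
    return best[1]

docs = [tokenize(text) for text, _ in CORPUS]
labels = [label for _, label in CORPUS]
vocab = sorted({w for d in docs for w in d})
surface = tfidf(docs, vocab)            # "syntax" part of the feature vector
topics = lda_topics(docs, vocab, k=2)   # "semantics" part of the feature vector
combined = [s + t for s, t in zip(surface, topics)]
predictions = [nearest_centroid(combined, labels, x) for x in combined]
print(predictions)
```

The combination step is plain concatenation: each document's vector holds its word-level weights followed by its topic proportions, so documents that use different words for the same topic can still end up close together. The sketch classifies the same documents it was fitted on, which only demonstrates the data flow; a real evaluation would hold out test data and report accuracy, precision, recall, and training time as the abstract describes.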