Analysis of HIV high-risk population characteristics with Baidu Tieba data (基于百度贴吧的HIV高危人群特征分析)

Cited by: 1
Abstract: This paper analyzes the post content and the temporal patterns of online activity of high-risk users in the Baidu Tieba forum "恐艾吧" ("Fear of HIV Bar"). An LDA topic model is used to compare the topics discussed in main threads with and without the participation of HIV-infected users. A keyword-based machine learning method is used to distinguish the sexual orientation of users who start threads in the forum, and HIV prevalence is then calculated for groups with different sexual orientations. The results indicate that online data mining techniques are more efficient than traditional methods and can serve as an important supplement to research on high-risk populations. In addition, machine-learning-based classification of sexual orientation is of considerable value to public health agencies that monitor how the epidemic develops in different populations.
Authors: XIAO Shiyao (肖时耀); LYU Wei (吕慰); CHEN Saran (陈洒然); QIN Shuo (秦烁); HUANG Ge (黄格); CAI Mengsi (蔡梦思); TAN Yuejin (谭跃进); TAN Xu (谭旭); LU Xin (吕欣) (School of Systems Engineering, National University of Defense Technology, Changsha 410073, China; Department of Oncology, Kangya Hospital, Yiyang 413002, China; School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China)
Source: Big Data Research (《大数据》), 2019, No. 1, pp. 98-108 (11 pages)
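Funding: National Natural Science Foundation of China (Nos. 91846301, 71771213, 71790615, 71690233); Humanities and Social Sciences Fund of the Ministry of Education of China (No. 17YJCZH157); Shenzhen "Pengcheng Scholar Program" fund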
Keywords: online high-risk populations; MSM; HIV; LDA topic model; Baidu Tieba; machine learning
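
The abstract mentions analyzing the temporal pattern of users' online activity. The record does not describe how this was done, so the following is only a minimal sketch of one common approach: counting posts per hour of the day. The function name `posts_per_hour` and the sample timestamps are hypothetical and invented for illustration; they are not taken from the paper or its data.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Count posts in each hour of the day (0-23) from ISO 8601 timestamp strings."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    # Return a fixed-length list so hours without posts show up as zero.
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical timestamps, invented purely for illustration.
sample = [
    "2018-03-01 23:15:00",
    "2018-03-01 23:48:10",
    "2018-03-02 00:05:30",
    "2018-03-02 14:20:00",
]

hourly = posts_per_hour(sample)
# Hour of day with the most posts (the earliest such hour if there is a tie).
peak_hour = max(range(24), key=lambda hour: hourly[hour])
print(peak_hour, hourly[peak_hour])
```

A 24-bin histogram like this is enough to show whether activity concentrates at particular times of day; a real analysis would also need to handle time zones and normalize by the number of active users.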