Abstract
Objective To mine and present the content of scientific topics related to infectious diseases using topic-modeling-based text analysis, and to show, from a text-modeling perspective, how research topics on infectious diseases in China have evolved. Methods Papers on infectious diseases were collected from the China National Knowledge Infrastructure (CNKI). Their titles and abstracts were preprocessed and converted into text features with gensim, a toolkit for mining the semantic structure of documents. A text-mining approach combining LDA (Latent Dirichlet Allocation) from the machine-learning library scikit-learn with topic-proximity calculation was then used to present and map the evolution patterns of scientific topics related to infectious diseases. Results Analysis of 2 594 journal articles on infectious diseases published in China over the past 30 years showed that Chinese infectious disease research falls into three major themes: "characteristics of infectious diseases", "prevention and control of infectious diseases" and "mathematical analysis and prediction". The main research methods were questionnaire surveys, infectious disease models, hypothesis testing and dynamic models. Conclusion Outbreaks of emerging infectious diseases have greatly stimulated the publication of research literature on infectious diseases. In recent years, infectious disease research has become increasingly connected with mathematics, computer science, economics and management, and information science, showing a trend toward cross-disciplinary and multi-method integration.
Authors
晁筱雯
周京生
李育平
刘雨婷
李疏影
陈麒
卢光玉
CHAO Xiaowen; ZHOU Jingsheng; LI Yuping; LIU Yuting; LI Shuying; CHEN Qi; LU Guangyu
Medical College of Yangzhou University, Yangzhou 225007, Jiangsu Province, China
Mobile Payment Department, China UnionPay Co., Ltd., Shanghai 201201, China
Department of Neurosurgery, SuBei People's Hospital, Clinical Medicine College of Yangzhou University, Yangzhou 225009, Jiangsu Province, China
School of Nursing, Yangzhou University, Yangzhou 225007, Jiangsu Province, China
Institute of Global Health, Heidelberg University, Heidelberg 69117, Germany
Jiangsu Key Laboratory of Integrated Traditional Chinese and Western Medicine for the Prevention and Treatment of Geriatric Diseases, Yangzhou 225007, Jiangsu Province, China
Institute of Public Health and Preventive Medicine, Medical College of Yangzhou University, Yangzhou 225007, Jiangsu Province, China
Source
Journal of Preventive Medicine Information (《预防医学情报杂志》)
Indexed in CAS
2021, No. 6, pp. 865-871 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 71904165)
Jiangsu Postdoctoral Science Foundation (Grant No. 2020Z003)
Keywords
infectious diseases
text mining
topic model
research topic
method evolution