Abstract
[Objective] To describe the principles and methods of the Bayesian probabilistic linkage model and to demonstrate its performance by linking birth and death data. [Methods] Data on 199,025 infants born in 2017 and 1,512 infants who died in 2017 or 2018 were collected from the Shanghai birth and death registration systems. After cleaning, the data were partitioned into monthly blocks and then fully linked. The Jaro-Winkler algorithm and Euclidean distance were used to measure the similarity of the matching fields between the two datasets; these similarities were used to build a Bayesian probabilistic linkage model, and linkage performance was evaluated with a confusion matrix. [Results] The Bayesian probabilistic linkage model linked the infant birth and death data effectively. It showed that 36.71% of the infants who died in Shanghai were born outside the city, and the estimated probability of infant death was 2.60‰. On the test set, the confusion matrix gave a recall of 0.86, a precision of 0.76, and an F-score of 0.81. [Conclusion] This application shows that the Bayesian probabilistic linkage model performs well. Used to build a birth-death cohort, it reflects the true level of infant mortality more accurately. Using this technique to integrate data from different departments can effectively improve research efficiency in public health.
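The linkage steps named in the abstract (string similarity, a Bayesian match probability, and confusion-matrix metrics) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the model's form, so the sketch assumes a Fellegi-Sunter-style posterior with conditionally independent fields, and the m/u probabilities, the prior, and the function names are all hypothetical. The Euclidean-distance comparison of numeric fields is omitted.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def posterior_match_probability(agreements, m, u, prior):
    """Bayes' rule over field agreements, assuming independent fields.

    m[k] = P(field k agrees | true match), u[k] = P(field k agrees | non-match).
    All of these values are hypothetical inputs, not taken from the paper.
    """
    like_match, like_non = prior, 1 - prior
    for agrees, mk, uk in zip(agreements, m, u):
        like_match *= mk if agrees else 1 - mk
        like_non *= uk if agrees else 1 - uk
    return like_match / (like_match + like_non)


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical record pair: a name field compared by Jaro-Winkler,
    # with an assumed 0.9 threshold, plus one other field that agrees.
    name_agrees = jaro_winkler("MARTHA", "MARHTA") >= 0.9
    prob = posterior_match_probability(
        [name_agrees, True], m=[0.95, 0.9], u=[0.05, 0.1], prior=0.01
    )
    print(f"posterior match probability: {prob:.3f}")
    # Precision and recall as reported in the abstract.
    print(f"F-score: {f_score(0.76, 0.86):.2f}")
```

With the precision (0.76) and recall (0.86) reported above, `f_score` reproduces the stated F-score of 0.81 after rounding.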
Authors
虞慧婷
蔡任之
林维晓
倪静怡
钱耐思
夏天
吴凡
YU Huiting; CAI Renzhi; LIN Weixiao; NI Jingyi; QIAN Naisi; XIA Tian; WU Fan (Department of Health Information, Shanghai Municipal Center for Disease Control and Prevention, Shanghai 200336, China; School of Public Health, Fudan University, Shanghai 200032, China; Minhang District Center for Disease Control and Prevention, Shanghai 201101, China)
Source
《上海预防医学》
CAS
2024, Issue 1, pp. 98-103 (6 pages)
Shanghai Journal of Preventive Medicine
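Funding
National Natural Science Foundation of China (82003486)
Shanghai "Science and Technology Innovation Action Plan" Technical Standards Project (22DZ2206000)
Shanghai Municipal Health Commission Special Project for Clinical Research in the Health Industry (20214Y0492).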