Abstract: Collaborative learning techniques such as federated learning and swarm learning can make full use of geographically distributed data to mine the knowledge it contains while preserving data privacy, and therefore have broad application prospects, especially in healthcare, where privacy conventions and ethical constraints are emphasized. Any collaboration requires reliable participants, and the performance of the global model in collaborative learning depends heavily on participant selection. However, existing participant selection studies pay no direct attention to the heterogeneity of medical data across institutions, so the performance of the global model, including its stability, is hard to guarantee. This paper approaches the problem from the perspective of reputation: through iterative collaborative learning, participants with good reputations are selected as far as possible, so as to obtain a stable, reliable, and high-performance global model. First, an AI reputation metric, AMP (AI medical promise), is proposed to describe the data quality of medical institutions and help foster a healthy AI ecosystem in the medical field. Second, an iterative collaborative learning framework based on backward selection (colback-learning) is established. Within a single collaborative learning task, the backward selection method iteratively computes a well-performing and stable global model in polynomial time while computing and accumulating AMP. For the AMP computation, a scoring function that comprehensively considers global performance metrics is formulated to guide global model training more effectively for the medical domain. Finally, diverse collaborative learning scenarios are simulated with real medical data. Experiments show that colback-learning selects reliable participants and trains a well-performing global model whose performance stability is 1.3 to 6 times better than that of the best existing participant selection methods, and whose interpretability remains highly consistent with centralized learning.
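The backward selection described in the abstract can be read as greedy participant pruning. The sketch below is only an illustration of that idea under stated assumptions, not the authors' colback-learning implementation: `train_fn` and `score_fn` are hypothetical stand-ins for the collaborative training round and the AMP scoring function, and the drop-one elimination rule is a generic greedy choice.

```python
def backward_select(participants, train_fn, score_fn):
    """Greedy backward elimination over collaborative-learning participants.

    train_fn(subset) -> global model trained collaboratively on `subset`
    score_fn(model)  -> scalar combining global performance metrics
    Both callables are placeholders for the paper's training step and
    AMP scoring function, which the abstract does not specify.
    """
    selected = list(participants)
    best_model = train_fn(selected)
    best_score = score_fn(best_model)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        # Try dropping each remaining participant once per round; at most
        # O(n^2) collaborative trainings overall, i.e. polynomial time.
        for p in list(selected):
            candidate = [q for q in selected if q != p]
            model = train_fn(candidate)
            score = score_fn(model)
            if score > best_score:  # removing p improves the global model
                best_score, best_model = score, model
                selected = candidate
                improved = True
                break
    return selected, best_model, best_score
```

In this reading, the accumulated per-participant contribution to the score over repeated tasks would feed the AMP reputation value; the exact accumulation rule is defined in the paper, not here.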
Funding: Supported by the Hubei Provincial Development and Reform Commission Program "Hubei Big Data Analysis Platform and Intelligent Service Project for Medical and Health".
Abstract: Decentralized machine learning frameworks, e.g., federated learning, are emerging to facilitate learning with medical data under privacy protection. It is widely agreed that building an accurate and robust medical learning model requires a large amount of continuous, synchronized patient monitoring data from various types of monitoring facilities. However, clinical monitoring data are usually sparse and imbalanced, with errors and time irregularity, leading to inaccurate risk prediction results. To address this issue, this paper designs a medical data resampling and balancing scheme for federated learning that eliminates model biases caused by sample imbalance and provides accurate disease risk prediction on multi-center medical data. Experimental results on the real-world clinical database MIMIC-IV demonstrate that the proposed method improves AUC (area under the receiver operating characteristic curve) from 50.1% to 62.8% and accuracy from 76.8% to 82.2%, compared to a vanilla federated learning artificial neural network (ANN). Moreover, the model's tolerance for missing data increases from 20% to 50% compared with a stand-alone baseline model.
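A per-client rebalancing step of the kind the abstract describes might look like the sketch below. It is a minimal illustration using plain random oversampling, assuming the paper's (unspecified) resampling scheme is applied locally on each client before federated updates; `X_client` and `y_client` are hypothetical names for one client's features and labels.

```python
import numpy as np

def rebalance_client(features, labels, rng=None):
    """Randomly oversample minority classes on one client so every class
    reaches the majority-class count. A generic balancing step, not the
    paper's exact scheme, which the abstract does not detail."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx_parts = []
    for c, n in zip(classes, counts):
        members = np.where(labels == c)[0]
        idx_parts.append(members)
        if n < target:
            # Draw extra samples with replacement for under-represented classes.
            idx_parts.append(rng.choice(members, size=target - n, replace=True))
    idx = np.concatenate(idx_parts)
    rng.shuffle(idx)
    return features[idx], labels[idx]

# Each client would rebalance locally before contributing model updates:
# X_bal, y_bal = rebalance_client(X_client, y_client)
```

Rebalancing locally keeps raw records on the client, so the privacy properties of the federated setup are unchanged; only the sampling weights of local training differ.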