This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contra...This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contrary to frequency estimation of a single attribute,the multidimensional aspect demands particular attention to the privacy budget.Besides,when collecting user statistics longitudinally,privacy progressively degrades.Indeed,the“multiple”settings in combination(i.e.,many attributes and several collections throughout time)impose several challenges,for which this paper proposes the first solution for frequency estimates under LDP.To tackle these issues,we extend the analysis of three state-of-the-art LDP protocols(Generalized Randomized Response–GRR,Optimized Unary Encoding–OUE,and Symmetric Unary Encoding–SUE)for both longitudinal and multidimensional data collections.While the known literature uses OUE and SUE for two rounds of sanitization(a.k.a.memoization),i.e.,L-OUE and L-SUE,respectively,we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility(i.e.,L-OSUE).Also,for attributes with small domain sizes,we propose Longitudinal GRR(L-GRR),which provides higher utility than the other protocols based on unary encoding.Last,we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates(ALLOMFREE),which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol,i.e.,either L-GRR or L-OSUE.As shown in the results,ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.展开更多
Recently, governments and public authorities in most countries had to face the outbreak of COVID-19 by adopting a set of policies. Consequently, some countries have succeeded in minimizing the number of confirmed case...Recently, governments and public authorities in most countries had to face the outbreak of COVID-19 by adopting a set of policies. Consequently, some countries have succeeded in minimizing the number of confirmed cases while the outbreak in other countries has led to their healthcare systems breakdown. In this work, we introduce an efficient framework called COMAP (COrona MAP), aiming to study and predict the behavior of COVID-19 based on deep learning techniques. COMAP consists of two stages: clustering and prediction. The first stage proposes a new algorithm called Co-means, allowing to group countries having similar behavior of COVID-19 into clusters. The second stage predicts the outbreak’s growth by introducing two adopted versions of LSTM and Prophet applied at country and continent scales. The simulations conducted on the data collected by WHO demonstrated the efficiency of COMAP in terms of returning accurate clustering and predictions.展开更多
基金supported by the Agence Nationale de la Recherche(ANR)(contract“ANR-17-EURE-0002”)by the Region of Bourgogne Franche-ComtéCADRAN Projectsupported by the European Research Council(ERC)project HYPATIA under the European Union's Horizon 2020 research and innovation programme.Grant agreement n.835294。
文摘This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contrary to frequency estimation of a single attribute,the multidimensional aspect demands particular attention to the privacy budget.Besides,when collecting user statistics longitudinally,privacy progressively degrades.Indeed,the“multiple”settings in combination(i.e.,many attributes and several collections throughout time)impose several challenges,for which this paper proposes the first solution for frequency estimates under LDP.To tackle these issues,we extend the analysis of three state-of-the-art LDP protocols(Generalized Randomized Response–GRR,Optimized Unary Encoding–OUE,and Symmetric Unary Encoding–SUE)for both longitudinal and multidimensional data collections.While the known literature uses OUE and SUE for two rounds of sanitization(a.k.a.memoization),i.e.,L-OUE and L-SUE,respectively,we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility(i.e.,L-OSUE).Also,for attributes with small domain sizes,we propose Longitudinal GRR(L-GRR),which provides higher utility than the other protocols based on unary encoding.Last,we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates(ALLOMFREE),which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol,i.e.,either L-GRR or L-OSUE.As shown in the results,ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
文摘Recently, governments and public authorities in most countries had to face the outbreak of COVID-19 by adopting a set of policies. Consequently, some countries have succeeded in minimizing the number of confirmed cases while the outbreak in other countries has led to their healthcare systems breakdown. In this work, we introduce an efficient framework called COMAP (COrona MAP), aiming to study and predict the behavior of COVID-19 based on deep learning techniques. COMAP consists of two stages: clustering and prediction. The first stage proposes a new algorithm called Co-means, allowing to group countries having similar behavior of COVID-19 into clusters. The second stage predicts the outbreak’s growth by introducing two adopted versions of LSTM and Prophet applied at country and continent scales. The simulations conducted on the data collected by WHO demonstrated the efficiency of COMAP in terms of returning accurate clustering and predictions.