Patrick Mesana

Decision Sciences
Cohort 2023
HEC Montréal


Program of study

Decision Sciences

University

HEC Montréal

Academic degree

Doctorate


Outputs


Measuring privacy/utility tradeoffs of format-preserving strategies for data release

In this paper, we introduce a novel approach to evaluate the risk of re-identification of individuals associated with format-preserving data release strategies, focusing on three strategies: data minimization (i.e., through data removal using random sampling and data Shapley values), data anonymization (i.e., through k-anonymity), and data synthesis (i.e., through the CTGAN and TVAE generative models). More precisely, our approach consists of simulating a security game in which (1) an attacker performs singling-out attacks as outlined in data protection regulations and (2) an evaluator scores attacks based on the linkability of records and the information gain obtained by the attacker. We further enhance our approach by simulating attacks as a cooperative game, in which the value of the attackers’ information resources is determined using the Shapley value borrowed from game theory. We propose the re-identification Shapley value as a method to measure the re-identification potential of each feature in a dataset when combined with other features. We demonstrate the effectiveness of our approach using three datasets commonly used in the privacy literature. Overall, our work contributes to a better understanding of the inherent trade-offs between data privacy and data utility in organizations.
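
As a rough illustration of the cooperative-game idea in this abstract, the sketch below computes exact Shapley values for a handful of features. It is not the paper's method: the toy records, the feature names, and the characteristic function `unique_fraction` (the share of records that a feature set singles out) are all assumptions chosen only to make the computation concrete.

```python
from itertools import permutations
from math import factorial

# Toy records (hypothetical data, not taken from the paper).
RECORDS = [
    {"age": 34, "zip": "H3T", "sex": "F"},
    {"age": 34, "zip": "H3T", "sex": "M"},
    {"age": 51, "zip": "H2X", "sex": "F"},
    {"age": 51, "zip": "H3T", "sex": "F"},
]


def unique_fraction(features):
    """Assumed characteristic function: share of records that are
    unique on the given feature set (0.0 for the empty set)."""
    if not features:
        return 0.0
    names = sorted(features)
    keys = [tuple(r[f] for f in names) for r in RECORDS]
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return sum(1 for key in keys if counts[key] == 1) / len(RECORDS)


def shapley(players, value):
    """Exact Shapley values: average marginal contribution of each
    player over every ordering of the players."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = []
        for p in order:
            phi[p] += value(coalition + [p]) - value(coalition)
            coalition.append(p)
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


values = shapley(["age", "zip", "sex"], unique_fraction)
for name, v in values.items():
    print(f"{name}: {v:.3f}")
```

Enumerating every ordering is only feasible for a few features; with realistic datasets a sampling-based approximation of the Shapley value would be needed.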

Evaluating the Risk of Re-Identification in Data Release Strategies: An Attacker-Centric Approach

In this methodological paper, we introduce a novel approach to evaluate the risk of re-identification of individuals associated with data release strategies, including data redaction, data anonymization and data synthesis. More precisely, our approach simulates an attacker performing singling-out attacks as outlined in data protection regulations, and scores attacks based on the linkability of records and the information gain obtained by the attacker. We further enhance our approach by simulating attacks as a cooperative game, in which the value of the attackers’ information resources is determined using the Shapley value borrowed from game theory. We also demonstrate the effectiveness of our approach using the Adult Income Census (AIC) dataset before discussing the economic implications associated with a privacy breach. Our work contributes to research and practice on the pressing need to better understand and evaluate the inherent trade-offs between data privacy and data utility in organizations.
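
The singling-out attack described in this abstract can be pictured with a minimal sketch. The scoring rule here is an assumption made for illustration, not the paper's evaluator: an attack counts as successful when the attacker's predicate matches exactly one released record, and the information gain is approximated by the number of attributes the attacker did not already know. The data and the names `single_out` and `score_attack` are hypothetical.

```python
# Hypothetical released dataset (not taken from the paper).
RELEASED = [
    {"age": 34, "zip": "H3T", "sex": "F", "income": ">50K"},
    {"age": 34, "zip": "H3T", "sex": "M", "income": "<=50K"},
    {"age": 51, "zip": "H2X", "sex": "F", "income": "<=50K"},
    {"age": 51, "zip": "H3T", "sex": "F", "income": ">50K"},
]


def single_out(released, predicate):
    """Return the record matching the predicate if it is the only
    match, otherwise None."""
    matches = [
        r for r in released
        if all(r.get(k) == v for k, v in predicate.items())
    ]
    return matches[0] if len(matches) == 1 else None


def score_attack(released, predicate):
    """Assumed scoring rule: success flag plus the count of attributes
    learned beyond those the attacker already knew."""
    hit = single_out(released, predicate)
    if hit is None:
        return {"singled_out": False, "attributes_gained": 0}
    gained = [k for k in hit if k not in predicate]
    return {"singled_out": True, "attributes_gained": len(gained)}


# The attacker knows a target's age and postal prefix.
result = score_attack(RELEASED, {"age": 51, "zip": "H2X"})
print(result)
```

A predicate that matches several records, such as `{"age": 34}` on this toy data, does not isolate anyone and scores as a failed attack under this rule.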

