Decision Sciences
Cohort 2023
HEC Montréal
Doctoral
Measuring privacy/utility tradeoffs of format-preserving strategies for data release
In this paper, we introduce a novel approach to evaluating the re-identification risk
associated with format-preserving data release strategies, focusing on three strategies:
data minimization (i.e. through data removal using random sampling and data Shapley
values), data anonymization (i.e. through k-anonymity), and data synthesis (i.e. through CTGAN
and TVAE generative models). More precisely, our approach simulates a security
game in which (1) an attacker performs singling-out attacks as outlined in data protection
regulations and (2) an evaluator scores attacks based on the linkability of records and the
information gain obtained by the attacker. We extend this approach by
simulating attacks as a cooperative game, in which the value of the attackers’ information
resources is determined using the Shapley value from cooperative game theory. We propose the
re-identification Shapley value as a measure of the re-identification potential of
each feature in a dataset when combined with other features. We demonstrate the effectiveness
of our approach using three datasets commonly used in the privacy literature. Overall, our
work contributes to a better understanding of the inherent trade-offs that exist between data
privacy and data utility in organizations.
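To make the idea of a per-feature re-identification Shapley value concrete, here is a minimal sketch. It is not the paper's implementation: the characteristic function `singling_out_rate` (the fraction of records that are unique on a set of features), the function names, and the toy records are all assumptions chosen for illustration. Exact enumeration over coalitions is used, which is only practical for a handful of features.

```python
from collections import Counter
from itertools import combinations
from math import factorial


def singling_out_rate(records, features):
    """Assumed coalition value: share of records unique on `features`."""
    if not features:
        return 0.0
    keys = [tuple(r[f] for f in features) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


def shapley_values(records, features):
    """Exact Shapley value of each feature under the assumed coalition value."""
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_f = singling_out_rate(records, coalition + (f,))
                without_f = singling_out_rate(records, coalition)
                total += weight * (with_f - without_f)
        values[f] = total
    return values


# Hypothetical toy data: age and sex are perfectly correlated here.
records = [
    {"age": 30, "zip": "A", "sex": "F"},
    {"age": 30, "zip": "B", "sex": "F"},
    {"age": 40, "zip": "A", "sex": "M"},
    {"age": 40, "zip": "B", "sex": "M"},
]
phi = shapley_values(records, ["age", "zip", "sex"])
print(phi)
```

In this toy example no single feature singles anyone out, but zip combined with either age or sex makes every record unique, so zip receives the largest share. The values sum to the singling-out rate of the full feature set, which is the efficiency property of the Shapley value.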
Evaluating the Risk of Re-Identification in Data Release Strategies: An Attacker-Centric Approach
In this methodological paper, we introduce a novel
approach to evaluating the re-identification risk
associated with data release strategies,
including data redaction, data anonymization, and data
synthesis. More precisely, our approach simulates an
attacker performing singling-out attacks as outlined in
data protection regulations, and scores attacks based
on the linkability of records and the information gain
obtained by the attacker. We extend this
approach by simulating attacks as a
cooperative game. In this game, the value of the
attackers’ information resources is determined using
the Shapley value from cooperative game theory. We also
demonstrate the effectiveness of our approach using the
Adult Income Census (AIC) dataset before discussing
the economic implications associated with a privacy
breach. Our work contributes to research and practice
on the pressing need to better understand and evaluate
the inherent trade-offs that exist between data privacy
and utility in organizations.
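The attacker-and-scoring loop described above can be sketched as follows. This is a hypothetical illustration only: the function names, the rule that an attack succeeds when the attacker's known attributes match exactly one released record, and the gain measure (the number of previously unknown attributes learned correctly) are assumptions, not the scoring defined in the paper.

```python
def single_out(released, known):
    """Return the one released record matching `known`, or None if 0 or 2+ match."""
    matches = [
        r for r in released
        if all(r.get(k) == v for k, v in known.items())
    ]
    return matches[0] if len(matches) == 1 else None


def score_attack(released, target, known):
    """Assumed score: (singled_out, number of unknown attributes learned correctly)."""
    hit = single_out(released, known)
    if hit is None:
        return False, 0
    gain = sum(
        1 for k, v in hit.items()
        if k not in known and target.get(k) == v
    )
    return True, gain


# Hypothetical released table and target individual.
released = [
    {"age": 30, "zip": "A", "income": "high"},
    {"age": 30, "zip": "B", "income": "low"},
    {"age": 40, "zip": "A", "income": "low"},
]
target = {"age": 30, "zip": "B", "income": "low"}

strong = score_attack(released, target, {"age": 30, "zip": "B"})
weak = score_attack(released, target, {"age": 30})
print(strong, weak)
```

With age and zip known, the attacker isolates a single record and learns the income attribute. With age alone, two records match and the attack fails under this assumed rule.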