Guide
Scoping
In this guide, as well as in the accompanying decision tree, the focus is on the application of PETs in the context of inter-organizational data analysis, i.e., settings in which multiple organizations (which could also be units of one larger organization) aim to perform joint analyses on their data sets. For this purpose, we focus on a relevant subset of PETs in our decision tree; see Appendix A for more details.
These PETs are:
- Federated Learning
- Secure Multi-Party Computation (divided into homomorphic encryption and secret sharing; a minimal secret-sharing sketch follows this list)
- Trusted Secure Environments (or Trusted Execution Environments)
- Differential Privacy
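To make the second category more concrete, the following is a minimal sketch of additive secret sharing in Python. It is purely illustrative, not a real MPC protocol: the function names, the number of parties, and the modulus are our own choices, and a deployed protocol would additionally handle communication, authentication, and multiplication of shared values.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative modulus; a real protocol fixes this by design

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into additive shares; fewer than all shares reveal nothing."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the secret."""
    return sum(shares) % MODULUS

# Two organizations each share a sensitive value among three compute parties.
a_shares = share(42, 3)
b_shares = share(100, 3)

# Each party adds its two shares locally; recombining yields the sum,
# while no single party ever sees 42 or 100 in the clear.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 142
```

The key property is that any incomplete subset of shares is uniformly random, so intermediate parties learn nothing about the underlying inputs.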
This does not mean that other PETs are not relevant, but we consider these to be among the most important categories for the purpose of inter-organizational data analysis. Furthermore, besides the privacy of individuals, there can be other reasons to apply these PETs, e.g., when dealing with commercially confidential data; these other possible applications are also in scope.
When considering the protection of information in data analysis, two views of privacy can be distinguished, namely:
- Input Privacy
- Output Privacy
Input Privacy refers to keeping the input data to a computation private. The necessity of input privacy is self-evident in the context of sharing sensitive data: without it, the raw data would effectively be shared among the parties, which defeats the purpose. Output Privacy, on the other hand, refers to techniques that ensure that the output of a computation does not reveal information about the input data. Preserving output privacy is not always trivial. In a recent case, the US Census Bureau demonstrated that it could use its own publicly released statistics to reconstruct the raw data of the individuals described by those statistics. This result motivated the Bureau to incorporate Differential Privacy in its computations to achieve output privacy.
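As a toy illustration of why output privacy is not trivial, consider how two aggregate statistics that each look harmless can be combined to expose an individual. This hypothetical differencing example is ours and is far simpler than the reconstruction performed by the Census Bureau:

```python
# Hypothetical differencing attack: each released total looks harmless,
# but their difference exposes a single individual's value.
salaries = {"alice": 52_000, "bob": 48_000, "carol": 61_000}

total_all = sum(salaries.values())  # released statistic 1
total_without_carol = sum(
    v for k, v in salaries.items() if k != "carol"
)  # released statistic 2

# An attacker who sees both outputs learns Carol's exact salary.
assert total_all - total_without_carol == salaries["carol"]
```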
In this document, we focus primarily on input privacy. We do include one state-of-the-art output privacy technique in our list of PETs, namely Differential Privacy. However, we should stress that there are other disclosure avoidance techniques that detect whether the output of an analysis leaks information about the input data; these other output privacy techniques are out of scope.
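To illustrate the basic idea behind Differential Privacy: calibrated random noise is added to a query result, so that the presence or absence of any single individual has only a limited, quantifiable effect on the released output. The sketch below is a minimal Laplace-mechanism example under assumptions we chose ourselves (a counting query with sensitivity 1 and an arbitrary epsilon); it is not a complete DP deployment.

```python
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release `true_value` with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential samples is Laplace-distributed.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_value + noise

# A counting query ("how many records satisfy X?") has sensitivity 1:
# adding or removing one individual changes the count by at most 1.
true_count = 128
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(f"released count: {noisy_count:.1f}")
```

Lower values of epsilon give stronger privacy guarantees at the cost of noisier released outputs.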
Several times, in both this document and the decision tree, we refer to ‘sensitive data’. This is meant in a broad sense: data that may be sensitive for a variety of reasons (privacy, commercial confidentiality, etc.). This is broader than the term ‘sensitive personal data’ used in the GDPR, which refers to specific categories of personal data (medical data, ethnic background, etc.).