
“Like so many things in our profession, the answer to the question of whether aggregation is safe is ‘it depends’.”
“The challenge is that all these numerical forms of aggregation can be used to reconstruct the original data, something that has come to be known as the database reconstruction theorem. Generally speaking, the more statistics produced from the same underlying data, the more likely it is that the underlying data can be reconstructed from those statistics,” writes Luk Arbuckle for the IAPP (International Association of Privacy Professionals).
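To make the reconstruction risk concrete, here is a minimal sketch in Python. It assumes a hypothetical group of three people whose ages lie between 18 and 65, and it brute-forces every combination of ages consistent with the published statistics. The figures, the age range, and the `candidates` helper are illustrative assumptions, not taken from the quoted article.

```python
from itertools import combinations_with_replacement
from statistics import median


def candidates(count, age_range, constraints):
    """Return every sorted tuple of ages that satisfies all published statistics."""
    return [
        ages
        for ages in combinations_with_replacement(age_range, count)
        if all(check(ages) for check in constraints)
    ]


# Assumption: ages are whole numbers between 18 and 65 inclusive.
ages = range(18, 66)

# Hypothetical published statistics for a group of 3: mean 30, median 30, minimum 24.
mean_is_30 = lambda a: sum(a) == 30 * 3
median_is_30 = lambda a: median(a) == 30
min_is_24 = lambda a: min(a) == 24

only_mean = candidates(3, ages, [mean_is_30])
mean_median = candidates(3, ages, [mean_is_30, median_is_30])
all_three = candidates(3, ages, [mean_is_30, median_is_30, min_is_24])

# Each extra statistic shrinks the set of datasets that could have produced it.
print(len(only_mean), len(mean_median), len(all_three))
print(all_three)  # -> [(24, 30, 36)]
```

With only the mean published, many age combinations remain possible. Adding the median cuts the list to 13, and adding the minimum leaves exactly one candidate, which is the original data.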
Some guidelines:
- For simple data outputs such as counts, publish only a standard set of attributes (e.g., region, sex, age), aggregated by those attributes. Do not publish accompanying summary statistics produced from the same underlying data, so that there is less risk of overlap that may reveal the underlying counts. Limit each release to a specific reporting period that does not overlap with previous reporting periods.
- Share or release data only to those people who need it for approved purposes, and only in suitably protected data environments with appropriate technical and organizational controls.
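The first guideline can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete disclosure-control process: the record fields (`region`, `sex`, `age`, `event_date`), the ten-year age bands, and the helper names are all hypothetical. The output contains counts only, grouped by the standard attributes, and the half-open date ranges keep reporting periods from overlapping.

```python
from collections import Counter
from datetime import date


def age_band(age, width=10):
    """Map an exact age to a band label such as '30-39' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def counts_for_period(records, start, end):
    """Count records by (region, sex, age band) where start <= event_date < end.

    Only counts are returned: no means, medians, or other summary statistics.
    """
    return Counter(
        (r["region"], r["sex"], age_band(r["age"]))
        for r in records
        if start <= r["event_date"] < end
    )


# Hypothetical records, for illustration only.
records = [
    {"region": "North", "sex": "F", "age": 34, "event_date": date(2024, 1, 15)},
    {"region": "North", "sex": "F", "age": 38, "event_date": date(2024, 2, 3)},
    {"region": "South", "sex": "M", "age": 52, "event_date": date(2024, 3, 30)},
    {"region": "North", "sex": "F", "age": 31, "event_date": date(2024, 4, 2)},
]

# Half-open ranges: a record dated 1 April falls in Q2 only, never in both quarters.
q1 = counts_for_period(records, date(2024, 1, 1), date(2024, 4, 1))
q2 = counts_for_period(records, date(2024, 4, 1), date(2024, 7, 1))

print(dict(q1))  # -> {('North', 'F', '30-39'): 2, ('South', 'M', '50-59'): 1}
print(dict(q2))  # -> {('North', 'F', '30-39'): 1}
```

Because every record lands in exactly one period, the counts from different releases cannot be differenced against each other to isolate individuals who appear in an overlap.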