“Whenever we make a call, go to work, search the web, pay with our credit card, we generate data. While de-identification might have worked in the past, it doesn’t really scale to the type of large-scale datasets being collected today.”
It turns out that “four random points (i.e. time and location where a person has been) are enough to uniquely identify someone 95 percent of the time in a dataset with 1.5 million individuals…”
According to Y.-A. de Montjoye and A. Gadotti, all these results lead to the conclusion that an efficient enough, yet general, anonymization method is extremely unlikely to exist for high-dimensional data.
They suggest we need to “move beyond de-identification and start using modern solutions to unlock the huge potential of data for social good and economic development. Else, we risk being stuck in the false dichotomy that we have either innovation or privacy.”