The European Data Protection Board’s recent opinion on AI models can be useful in several ways.

Last week, I covered the EDPB’s take on the possible consequences of unlawfully processing personal data in the development phase of an AI model.

This week, I analyze a simple question: How legitimate is your interest?

Legitimate:

  • Development of an AI model can be legitimate (e.g., a conversational agent to assist users; detecting fraudulent content or behavior; improving threat detection).
  • Make sure your interest is: lawful; clearly & precisely articulated; real & present (i.e. not speculative).

Necessity:

  • Necessity depends on the amount of data and whether processing it is proportionate to the interest pursued, in light of data minimization. (Can you develop the model without personal data? With less data?)
  • Whether less intrusive means are available may depend on whether you have a direct relationship with the data subjects.
  • Apply measures that reduce the ease of identification, even where the data is not fully anonymized.
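The last point on reducing identifiability can be made concrete. A minimal sketch, assuming Python and a hypothetical record with an email field: replace direct identifiers with keyed hashes before training. Note this is pseudonymization, not anonymization, so the GDPR still applies.

```python
import hashlib
import hmac

# Illustrative key; in practice, store it separately from the dataset so the
# mapping back to identities is not trivially recoverable.
SECRET_KEY = b"store-this-key-separately"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical training record: hash the email before it enters the pipeline.
record = {"email": "jane.doe@example.com", "text": "Great product!"}
record["email"] = pseudonymize(record["email"])
```

The keyed hash is stable, so records belonging to the same person can still be linked for training purposes without exposing the identifier itself.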

Balancing:

  • Conduct a balancing test and consider publishing your analysis.

Consider interests:

  • Development: Self-determination & retaining control over one’s own personal data.
  • Deployment: Retaining control over one’s own personal data; financial interests; personal benefits or socioeconomic interests.

Consider rights to privacy and family life, freedom of expression, and freedom from discrimination:

  • Development: Data may have been scraped against individuals’ wishes or without their knowledge.
  • Deployment: Check whether it is possible to infer, accidentally or through attacks, what personal data the training database contains.

Assess: [For both the training data and the deployment data]

  • Nature of the data processed – e.g. highly private.
  • Context of processing (intended operational uses; whether combined with other datasets; overall volume of data and number of individuals affected; etc.).
  • Further consequences of processing (risks of violation of fundamental rights) and likelihood of further consequences materializing.
  • Reasonable expectations of individuals.

Development:

  • The fact that information about the development phase appears in the controller’s privacy policy does not necessarily mean the processing is reasonably expected.
  • Consider: whether the data was publicly available; the nature of the relationship with the controller; the nature of the service; the context and source of the data; potential further uses of the model; and whether people are aware that their data is online.

Deployment:

  • Consider the relationship with the controller; the context of the model’s specific capabilities; individuals’ awareness; and whether the impact is limited to users or the data is used to improve the whole service.

Mitigating Measures: (on top of normal GDPR compliance)

Development:

  • Technical measures to reduce identifiability.
  • Measures to facilitate the exercise of rights (grace period before use; opt-out; right to delete).
  • Transparency: go beyond the information formally required; use alternative forms (media; email; graphics).
  • Web scraping: exclude certain content or categories; respect robots.txt exclusions.
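On the web-scraping point, one common technical mechanism is robots.txt: site owners publish exclusions, and a scraper that honors them skips the listed paths. A hypothetical example (the crawler name and paths are illustrative):

```
# Hypothetical robots.txt published by a site owner; a crawler that
# honors it would not fetch the listed paths.
User-agent: ExampleAIBot
Disallow: /profiles/
Disallow: /forums/
```

Respecting such exclusions is a mitigating measure, not a substitute for the legitimate-interest assessment itself.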

Deployment:

  • Technical measures: prevent storage of personal data; output filters; digital watermarking.
  • Individual rights: erasure; suppression.
  • Web scraping: be careful, as it may lead to significant impacts on individuals.
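The output-filter measure above can be sketched in code. A minimal, hypothetical example (the patterns and function name are my own, not from the opinion) that suppresses two obvious identifier types in model output before it reaches the user:

```python
import re

# Hypothetical patterns for two obvious identifier types; a real filter
# would need to cover far more (names, addresses, ID numbers, etc.).
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),      # phone-like number sequences
]

def filter_output(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace matches of the patterns above before returning model output."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Regex-based filters like this are easy to deploy but easy to evade; they complement, rather than replace, measures taken at training time.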