The European Data Protection Board’s recent opinion on AI models can be useful in several ways.
Last week, I covered the EDPB’s view on the possible consequences of unlawfully processing personal data during the development phase of an AI model.
This week, I analyze a simple question: How legitimate is your interest?
Legitimate interest:
- Development of an AI model can be a legitimate interest (e.g., a conversational agent to assist users; detecting fraudulent content or behavior; improving threat detection).
- Make sure your interest is: lawful; clearly & precisely articulated; real & present (i.e. not speculative).
Necessity:
- Necessity depends on the amount of data processed and whether it is proportionate to pursuing the interest, in light of data minimization. (Could you develop the model without personal data, or with less of it?)
- Whether less intrusive means are available depends in part on whether you have a direct relationship with the data subjects.
- Apply measures that reduce the ease of identification, even if the data is not fully anonymized (a minimal pseudonymization sketch follows this list).
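As a purely illustrative example of such a measure, here is a minimal Python sketch that replaces a direct identifier with a salted hash before records enter a training set. The record fields and the salting scheme are my own assumptions for the example, not anything prescribed by the EDPB, and the result is pseudonymized rather than anonymized data, so the GDPR still applies.

```python
import hashlib
import os

# Keep the salt secret and stored separately from the pseudonymized records;
# otherwise the hashes are easier to reverse by brute force.
SALT = os.urandom(16)

def pseudonymize(record: dict) -> dict:
    """Replace the direct identifier with a salted hash before ingestion."""
    digest = hashlib.sha256(SALT + record["email"].encode("utf-8")).hexdigest()
    # Hypothetical output schema: a stable reference instead of the identifier.
    return {"user_ref": digest[:16], "text": record["text"]}

if __name__ == "__main__":
    raw = {"email": "jane.doe@example.com", "text": "Support ticket contents..."}
    print(pseudonymize(raw))
```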
Balancing:
- Conduct the balancing test and consider publishing your analysis.
Consider the data subjects’ interests:
- Development: Self-determination & retaining control over one’s own personal data.
- Deployment: Retaining control over one’s own personal data; financial interests; personal benefits or socioeconomic interests.
Consider the rights to privacy and family life, freedom of expression, and non-discrimination:
- Development: Data may have been scraped against individuals’ wishes or without their knowledge.
- Deployment: Check whether it is possible to infer, accidentally or through attacks, what personal data is contained in the training database.
Assess: [For both the training data and the deployment data]
- Nature of the data processed (e.g., whether it is highly private).
- Context of processing (intended operational uses; whether data is combined with other datasets; the overall volume of data and number of individuals affected, etc.).
- Further consequences of the processing (risks of violating fundamental rights) and the likelihood of those consequences materializing.
- Reasonable expectations of individuals.
Development:
- The fact that information about the development phase appears in the controller’s privacy policy does not necessarily mean the processing is reasonably expected.
- Consider: whether the data was publicly available; the nature of the relationship with the controller; the nature of the service; the context and source of the data; potential further uses of the model; and whether people are aware that their data is online.
Deployment:
- Consider the relationship with the controller; the context of the model’s specific capabilities; individuals’ awareness of them; and whether the processing impacts only users or is used to improve the whole service.
Mitigating Measures (on top of normal GDPR compliance):
Development:
- Technical measures to reduce identifiability.
- Measures to facilitate the exercise of rights (grace period before use; opt-out; right to delete).
- Transparency: go beyond the information that is strictly required; use alternative forms (media; email; graphics).
- Web scraping: exclude certain content or categories and respect opt-out signals such as robots.txt (see the sketch right after this list).
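To make the robots.txt point concrete, the sketch below uses Python’s standard urllib.robotparser to check whether a given URL may be fetched before scraping it. The crawler name and URL are placeholders of my own choosing, and respecting robots.txt is only one of the exclusion measures the opinion has in mind.

```python
from urllib import robotparser
from urllib.parse import urlsplit

# Hypothetical crawler identity and target page; replace with your own values.
USER_AGENT = "example-ai-crawler"
TARGET_URL = "https://example.com/profiles/jane-doe"

def allowed_to_fetch(url: str, user_agent: str = USER_AGENT) -> bool:
    """Check the site's robots.txt before scraping a page."""
    parts = urlsplit(url)
    robots = robotparser.RobotFileParser()
    robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    return robots.can_fetch(user_agent, url)

if __name__ == "__main__":
    if allowed_to_fetch(TARGET_URL):
        print("robots.txt permits fetching; still apply content and category exclusions.")
    else:
        print("robots.txt disallows fetching; skip this URL.")
```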
Deployment:
- Technical: preventing the storage of personal data; output filters (a minimal sketch follows this list); digital watermarking.
- Individual rights: erasure; suppression.
- Web scraping: Be careful, as it may lead to significant impacts on individuals.
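As an illustration of the output-filter idea, here is a minimal, regex-based Python sketch that redacts email addresses and phone-number-like strings from a model’s output before it is returned to the user. The two patterns are illustrative assumptions; a real deployment would rely on more robust personal-data detection and on model-level safeguards as well.

```python
import re

# Illustrative patterns only; production filters would use far more robust
# detection of personal data than these two regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def filter_output(model_output: str) -> str:
    """Redact email addresses and phone-number-like strings from model output."""
    redacted = EMAIL_RE.sub("[redacted email]", model_output)
    redacted = PHONE_RE.sub("[redacted phone]", redacted)
    return redacted

if __name__ == "__main__":
    sample = "You can reach Jane at jane.doe@example.com or +44 20 7946 0958."
    print(filter_output(sample))
```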