AI Training Datasets & Article 14 GDPR: A Risk Assessment for the Proportionality Exemption of the Obligation to Provide Information

KINDYLIDI, I.; BARROS, I. A. de. AI Training Datasets & Article 14 GDPR: A Risk Assessment for the Proportionality Exemption of the Obligation to Provide Information. The Law, State and Telecommunications Review, v. 13, no. 2, p. 1-27, October 2021.
Iakovina Kindylidi*
ORCID: https://orcid.org/0000-0002-8803-1359
Inês Antas de Barros**
ORCID: https://orcid.org/0000-0002-1226-1289
DOI: https://doi.org/10.26512/lstr.v13i2.36253
Submitted: 29 January 2021
Revised: 3 March 2021
Accepted: 1 April 2021
Article submitted to blind peer review
Licensed under a Creative Commons Attribution 4.0 International license
Abstract
[Purpose]
At the earliest stages of the AI lifecycle, the training, verification, and validation of machine learning and deep learning algorithms require vast datasets that usually contain personal data. This data, however, is not obtained directly from the data subjects, and very often the controller is not in a position to identify the data subjects, or such identification would require disproportionate effort. This situation raises the question of how the controller can comply with its obligation to provide information about the processing to the data subjects, especially when providing the information notice is impossible or requires a disproportionate effort. There is little to no guidance on the matter. The purpose of this paper is to address this gap by designing a clear risk-assessment methodology that controllers can follow when providing information to the data subjects is impossible or requires a disproportionate effort.
[Methodology]
After examining, through a doctrinal analysis, the scope of the transparency principle, Article 14, and its proportionality exemption in the training and verification stages of machine learning and deep learning algorithms, we assess whether existing tools and methodologies can be adapted to accommodate the GDPR requirement of carrying out a balancing test, in conjunction with, or independently of, a DPIA.
[Findings]
Based on an interdisciplinary analysis, comprising theoretical and descriptive material from a legal and technological point of view, we propose a risk-assessment methodology as well as a series of risk-mitigating measures to ensure the protection of the data subject's rights and legitimate interests while fostering the uptake of the technology.
[Practical Implications]
The proposed balancing exercise and additional measures are designed to facilitate entities training or developing AI, especially SMEs, within and outside of the EEA, that wish to ensure and showcase the data protection compliance of their AI-based solutions.

* Iakovina Kindylidi is an international adviser at Vieira de Almeida & Associados' ICT practice area. She holds an LL.M in International Business Law from Tilburg University and has participated as a speaker in various seminars and classes on emerging technologies, with a focus on AI. E-mail: imk@vda.pt. Address: Rua Dom Luís I 28, 1200-151 Lisboa.

** Inês Antas de Barros is a managing associate at Vieira de Almeida & Associados' ICT practice area. She holds an LL.M in International Business Law from the Global School of Law of the Catholic University of Portugal. She has participated as a speaker in various seminars and classes on privacy, data protection, and cybersecurity. E-mail: iab@vda.pt.
Keywords: AI. GDPR. Article 14. Risk-Assessment. Transparency.
INTRODUCTION
Artificial Intelligence ("AI") and Big Data are topics of high priority for the European Commission, which recently published its holistic regulatory proposal on AI¹ while further promoting data sharing in the EU through the creation of European data spaces in key sectors². As algorithms and AI-related technologies are fueled by data, one of the fundamental concerns of academics and regulators has been the data protection of individuals.
Although the data protection and privacy risks stemming from the use of AI are manifold and undoubtedly crucial, legal scholars are focusing more on how these risks can be avoided, preventively or punitively, than on how entities can proactively build GDPR-compliant AI.
At an early stage in the AI lifecycle, large datasets which usually include personal data are "fed" to machine learning³ and deep learning⁴ algorithms to support their training and functions, as well as to test the AI's behavior in the subsequent stages of verification and validation. This personal data is usually not obtained directly from the data subjects but via third parties or private or publicly available sources, and even if the data subject is identifiable, the process of identifying them may be too difficult and, most of the time, of no interest to the entity training the algorithm⁵. Nonetheless, this data, even if it may not directly identify a data subject, may lead to the identification of a person when related to other datasets.
Furthermore, in relation to the algorithm training stage, although several incidents have sparked conversation around AI biases and the importance of data quality in mitigating such errors and fostering fundamental rights, there is little discussion on how, in practical terms, an entity can use these datasets to train and
¹ Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts, COM/2021/206.
² Proposal for a Regulation of the European Parliament and of the Council on European data governance (Data Governance Act), COM/2020/767.
³ Machine learning is the ability of an algorithm to improve its performance autonomously, based on newly acquired information and experience.
⁴ Deep learning is a subset of machine learning. Deep learning enables the algorithm to make decisions through data processing and the creation of patterns.
⁵ The AEPD refers to this data as "quasi-identifiers", also mentioned as pseudo-identifiers or indirect identifiers. See, amongst others, AEPD, 2021, p. 33.
