Artificial intelligence (AI) and personal data

Jure Globocnik

Guest author from activeMind AG

In recent months, the release of several artificial intelligence (AI)-based tools has caused quite a stir. Among the best known of these are the text generation tool ChatGPT and the image generation tool DALL-E 2, both from the company OpenAI.

After initial euphoria, many companies are now asking themselves whether and to what extent these tools can also be used in everyday business. We illustrate what you should consider from a data protection perspective when using AI.

Update, June 2023

In the meantime, data protection supervisory authorities in the EU have turned their attention to OpenAI and ChatGPT in particular. More on this discussion, and on the AI-related data breaches that have since come to light, can be found further down in the article.

How do AI-based systems work?

In order to assess the compatibility of AI-supported software with the requirements of the General Data Protection Regulation (GDPR), we first need to know the basics of how it works.

AI-supported systems solve certain tasks without human intervention. For example, ChatGPT can answer questions, write articles and summarise or translate texts. This means that many application scenarios are conceivable in the corporate context, among others:

  • Customer support: Companies could use AI-powered systems to automatically answer customer queries and provide information about their products and services, for example as chatbots on websites or in messaging apps.
  • Virtual assistants: Companies could also use AI-powered systems internally to streamline processes. For example, they could help manage tasks, make reservations or even check contracts.
  • Content generation: Furthermore, companies could use AI to generate specific content such as draft emails, blog posts with corresponding images, text summaries and translations.

In order to be able to take on such tasks, AI models are extensively trained. As a rule, publicly available information, such as that from the internet, is first used for this. Since this information can also be false, racist, homophobic, etc., there is a risk that the outputs, i.e. the content generated by an AI system, will also have these characteristics. In addition, outputs can also contain personal data, which in turn has to be assessed in terms of data protection law and could pose problems for companies if not all the requirements of the GDPR are met.

Usually, however, the development of an AI system does not end when it is put on the market. Rather, the system continues to be developed afterwards. The data generated during the use of the system (inputs and outputs) are usually also used for this further training. Such further processing of data for the provider’s own purposes poses a major challenge in terms of data protection law.

What must be observed in terms of data protection law when using AI-supported systems?

If a company wants to use an open AI-supported system for the processing of personal data in particular, it must observe a large number of data protection requirements. Since these are new technologies for which there are hardly any court decisions or guidelines from supervisory authorities, their use is often associated with a certain risk. Legal aspects should therefore be thoroughly discussed before using such technologies in a corporate context.

In the following, we will discuss the most important data protection issues. Due to the wide range of possible applications, the discussion does not claim to be exhaustive; rather, it is only intended to outline approaches to the most important problems. Of course, this cannot replace an application-specific review by a data protection expert.

First of all, the question arises as to the data protection status of the companies involved in the data processing.

The company that integrates an AI-based system into its own processes will generally be classified as the controller, as it decides on the purposes and means of data processing.

More interesting is the question of the role of the AI system provider. The actual data processing usually takes place on the provider's servers. The obvious first assumption is therefore that the AI provider is a data processor, because it processes the data on behalf of and on the instructions of the company that uses the system. In such a case, a data processing agreement would have to be concluded. Some providers offer such an agreement (e.g. OpenAI for the use of ChatGPT).

Before concluding a data processing agreement, it must be confirmed that the provider can actually comply with the assurances it contains. This is far from trivial, since certain aspects of an AI model cannot be influenced even by the provider itself (this is why AI models are often referred to as black boxes). It is also important to ensure that the agreement describes the processing it covers with sufficient precision.

As many providers use the data collected through the use of AI systems for the further development of their AI models, the question arises whether this is compatible with their position as processors. Following the guidelines of the French supervisory authority (CNIL), this is only possible with the consent of the client, whereby the provider would be considered the controller for this processing.

Companies are well advised not to give such consent or to make use of a possible opt-out option (this is available to commercial users, for example, with ChatGPT). The reason for this is that the further use of data for the provider’s own purposes in the AI context goes beyond what would be the case, for example, with a conventional IT provider. When training AI models, the data used for this purpose can become part of the model, which in turn means that it could be disclosed to further users of the respective AI application and that its complete deletion becomes nearly impossible. The company would thus no longer be in a position to control the dissemination of the information or even to record it correctly. In this case, it would also become difficult, if not impossible, to provide sufficient information to those affected.

Finally, it is also possible that the company using the AI application and the company providing it are joint controllers for the processing. This could be the case, for example, if both companies work closely together on the development of the AI and have common purposes, or if their processing operations are otherwise inextricably linked. If the AI is to be trained with the data fed into it, this will typically be the case, because the processing is then also carried out for the purposes of the AI provider.

Companies must ensure that there is a valid legal basis for the processing.

Although this depends on the particular data processing, often no other legal basis than consent of the data subject will come into consideration.

Legitimate interests will often not be a suitable legal basis: the risks associated with processing by means of AI are often high, so the company's interests are unlikely to outweigh the interests of data subjects. This must of course be examined on a case-by-case basis.

Pursuant to Art. 13 GDPR, companies must inform data subjects about the processing of their data. In addition to the usual information that must always be provided, information on automated decision-making in individual cases pursuant to Art. 22 GDPR must also be provided where such decision-making takes place. Data subjects should then be informed in a meaningful way about the logic involved and the scope and intended effects of such processing.

Companies must also ensure through appropriate processes that data subjects can exercise their rights under the GDPR, for example that data is corrected if the outputs of the AI application do not correspond to reality. Whether data subject rights can always be fully exercised in the AI context is doubtful. After the data has been incorporated into the AI model and has changed it, it may be difficult to completely delete the data without affecting the model.

In addition to the usual data subject rights, in some subcategories of automated decision-making under Art. 22 GDPR, data subjects also have the right to have a natural person intervene on the part of the controller, to express their own point of view and to contest the decision. This is also likely to be difficult or impossible to implement because it contradicts the basic idea of artificial intelligence.

In addition to the above, data controllers have numerous other obligations under data protection law, in particular:

  • Privacy by Design and Privacy by Default: The company must design the application in such a way that data protection principles, such as data minimisation, are effectively implemented. This should not only be taken into account during the use of the application, but also during its development.
  • Data protection impact assessment (DPIA): It will often be necessary to conduct a data protection impact assessment in advance of data processing using AI. According to the blacklist of the German Data Protection Conference, the performance of a data protection impact assessment is mandatory, among other things, when AI is used to process personal data to control interaction with data subjects or to assess personal aspects of the data subject.
  • Technical and organisational measures: Companies should implement sufficient technical and organisational measures (TOM) to ensure the confidentiality, availability and integrity of data.
  • Data protection officer: Finally, companies should check whether the appointment of a data protection officer is mandatory due to AI-supported processing.

Alternative: No processing of personal or other sensitive data by means of AI

As the analysis has shown, companies that wish to use AI must fulfil numerous requirements of the GDPR, which are likely to cause considerable effort, insofar as their fulfilment is possible at all within a specific use case.

To avoid this, companies may decide not to process personal or other sensitive information – such as trade secrets – using the relevant AI application, for example by removing such data in advance from the text to be translated using AI. In such a case, the GDPR would not apply and the company would not have to comply with the requirements described above. Of course, this is not an option for some application scenarios, such as a chatbot that should interact with customers in the area of customer support. In such a case, use without the processing of personal data is not possible.
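The idea of removing personal data from a text before it is passed to an AI application can be sketched roughly as follows. This is a minimal illustration in Python, not a complete anonymisation solution: the two patterns and the function name `redact` are assumptions made for this example, and they only catch common e-mail and phone number formats. Names, addresses and other identifiers would pass through unchanged, so a text treated this way may still contain personal data and should be checked before the GDPR is assumed not to apply.

```python
import re

# Hypothetical patterns for illustration only. They cover common e-mail and
# phone number formats, but not names, addresses or other identifiers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s/()-]{6,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone numbers with neutral placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "Please contact jane.doe@example.com or call +49 89 1234567."
    # Only the redacted version would be handed to the external AI service.
    print(redact(sample))
```

In practice, a step like this would sit between the company's own systems and the AI application, so that only the redacted text leaves the company. Trade secrets and free-text references to individuals cannot be detected by simple patterns and would still need a manual or more sophisticated review.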

OpenAI in the focus of regulatory authorities

Italian regulator

In March 2023, the Italian supervisory authority Garante provisionally banned OpenAI, which operates ChatGPT, from processing the data of Italian users. The authority had doubts about the existence of an effective legal basis for processing personal data for the purpose of training the algorithm and criticized the fact that information obligations were not fully met. In some cases, ChatGPT also processed incorrect personal data.

The fact that the age of users was not checked (according to the company’s general terms and conditions, use is only permitted from the age of 13) also played a role, as this meant that children were possibly shown answers that were inappropriate for their age.

ChatGPT has since become available again in Italy. To achieve this, OpenAI had to fulfil various requirements of the Garante:

  • For example, an adapted privacy policy had to be made available on the ChatGPT website, which, among other things, informs about the logic of the data processing and about data subject rights.
  • OpenAI may no longer rely on Art. 6(1) (b) GDPR (performance of the contract) as a legal basis, but must instead rely on the data subject’s consent or legitimate interests.
  • The company must implement mechanisms by which data subjects (both ChatGPT users and non-users) can request correction and deletion of their data or object to the use of their data.
  • Furthermore, OpenAI needs to better protect minors. In this regard, all that needs to be done initially is to obtain confirmation from the user that he or she is of legal age. However, by September 2023, the ChatGPT operator must implement an age verification system to filter out users under the age of 13, as well as users between the ages of 13 and 18 who do not have parental consent.
  • Finally, in consultation with the Garante, OpenAI has to conduct an information campaign through various channels (radio, television, newspapers, Internet) to inform citizens about the use of their personal data for training algorithms.

The above requirements will initially apply only to ChatGPT users from Italy.

German supervisory authorities

The German Data Protection Conference (DSK) established the AI Taskforce in April 2023 to address the issue. The taskforce is to undertake a coordinated data protection review of ChatGPT.

In a first step, the German authorities contacted OpenAI with a questionnaire in order to obtain additional information, for example, about the data sources and algorithms behind the automated data processing. The deadline for answering the questions has already been extended once, so OpenAI is expected to submit its answers soon. On this basis, the German authorities will then conduct a data protection review of ChatGPT.

EU-wide coordination

As OpenAI does not have an establishment in the EU, the supervisory authorities of all member states are responsible for monitoring OpenAI’s compliance with the GDPR in their respective local jurisdictions.

In order to promote and coordinate cooperation and exchange of information on possible actions by authorities at the EU level, the European Data Protection Board (EDPB) established a task force in April 2023 to address ChatGPT.

ChatGPT data breaches

In March 2023, OpenAI experienced its first data breach. This involved some ChatGPT users being able to view the chat histories of other users.

In June 2023, it became known that malware which extracts data stored in the browser, such as saved login credentials, had collected the login credentials of more than 100,000 ChatGPT users. These users could now be at increased risk of fraud and cyber attacks. Although the malware also collected credentials for other online services, experts believe that ChatGPT accounts are of particular interest to cybercriminals because ChatGPT chat histories often contain highly sensitive data. In addition, ChatGPT does not automatically delete chat histories; the user has to delete the data manually. To minimise your own risk, it is therefore recommended to delete your chat history regularly.

Conclusion: AI use is the next big challenge

The use of AI-based systems poses significant data protection challenges for companies. As this brief analysis has shown, companies may also have to comply with requirements with which they may not yet have come into contact, such as carrying out a data protection impact assessment. If such systems are used purely privately within the framework of the so-called household exception, these requirements do not apply.

Companies should also consider requirements arising from other areas of law, such as copyright or liability law. The AI Regulation (AI Act), which is currently being negotiated at the EU level, could also impose further requirements on companies.

What applies under data protection law in a specific case always depends on the respective application scenario. This analysis is only intended to roughly highlight possible problems. To ensure that an AI-supported system is used in compliance with the GDPR, companies should seek the advice of a data protection expert before introducing it.
