

Poster in Workshop: The Future of Machine Learning Data Practices and Repositories

Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data

Henrik Nolte · Michèle Finck · Kristof Meding


Abstract:

Does GPT know you? The answer depends on your level of public recognition; however, if your information was available on a website, the answer is probably yes. All Large Language Models (LLMs) memorize training data to some extent. If an LLM's training corpus includes personal data, the model also memorizes personal data. The development of such an LLM processes personal data in a way that enables the identification of natural persons and thereby falls directly within the scope of data protection laws. If such a link to an identifiable person is established, the implications are far-reaching: the AI system would remain subject to EU General Data Protection Regulation (GDPR) requirements even after the training phase has concluded. To support our position: (1) We reiterate that LLMs output training data at inference time, be it verbatim or in generalized form. (2) We show that some LLMs can thus be considered personal data in their own right. This triggers a cascade of data subject rights, such as access, rectification, or erasure, which would extend to the information embedded within the model. (3) We argue that machine learning researchers must acknowledge the legal implications of LLMs as personal data throughout the full ML development lifecycle, from data collection and curation to model provision on, e.g., GitHub or Hugging Face. (4) We propose different ways for the ML research community to deal with these legal implications. Our paper serves as a starting point for improving the alignment between law and the technical capabilities of LLMs. Our findings underscore the need for more interaction between the legal domain and the ML community during model development and at inference time. The code used to support our view can be found at https://github.com/LLMsStorePersonalData/CodeLLMsStorePersonalData.
