Baldur Bjarnason

... works as a web developer in Hveragerði, Iceland, and writes about the web, digital publishing, and web/product development

These are his notes

About Archive Main Site Replies

April 26, 2024

“Scalable Extraction of Training Data from (Production) Language Models”

We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT

Surprise!