Baldur Bjarnason

... works as a web developer in Hveragerði, Iceland, and writes about the web, digital publishing, and web/product development

These are his notes

“Scalable Extraction of Training Data from (Production) Language Models”

We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT

Surprise!