
Outlier and collapse: The Enron corpus and foundation model training data

beSpacific – Accurate, Focused Research on Law, Technology and … February 9, 2026
Source
Zimmer, Z. (2026). Outlier and collapse: The Enron corpus and foundation model training data. Big Data & Society, 13(1). https://doi.org/10.1177/20539517261421474 (Original work published 2026) – “The Enron Corpus is a canonical training dataset representing one of the first scale jumps in the size of natural language data for machine learning (ML) research. That corpus was ...
