Outlier and collapse: The Enron Corpus and foundation model training data
beSpacific – Accurate, Focused Research on Law, Technology and …
February 9, 2026
Zimmer, Z. (2026). Outlier and collapse: The Enron Corpus and foundation model training data. Big Data & Society, 13(1). https://doi.org/10.1177/20539517261421474 – “The Enron Corpus is a canonical training dataset representing one of the first scale jumps in the size of natural language data for machine learning (ML) research. That corpus was ...