What we learned building a privacy-first layer for LLMs
Hugging Face Forums [Unofficial]
April 28, 2026
Hi everyone
After experimenting with PII anonymization pipelines, we started building a more structured approach to using LLMs with sensitive data.
A few things that surprised us:
* Naive regex + NER breaks quickly at scale
* Context loss can hurt model outputs more than expected
* Re-identification pipelines get tricky in multi-step workflows
We ended up moving toward a design where:
* sensitive data is abstracted before inference
* mappings are handled separately
* models never see raw PII
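To make that shape concrete, here is a minimal sketch of the three points above. It is an illustrative toy and not the actual implementation: the names `MappingStore`, `abstract`, and `reidentify` are hypothetical, and the email-only regex is a deliberately simple stand-in for whatever detector you really use (regex alone is exactly what breaks at scale).

```python
import re
from dataclasses import dataclass, field

# Stand-in detector (assumption): matches email addresses only.
# A real pipeline would plug in a proper PII detector here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class MappingStore:
    """Placeholder <-> raw value mappings, kept apart from the model call."""
    forward: dict = field(default_factory=dict)  # raw value -> placeholder
    reverse: dict = field(default_factory=dict)  # placeholder -> raw value

    def placeholder_for(self, value: str, kind: str) -> str:
        # Reuse the same placeholder for a repeated value so the model
        # can still tell that two mentions refer to the same entity.
        if value not in self.forward:
            token = f"<{kind}_{len(self.forward) + 1}>"
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]


def abstract(text: str, store: MappingStore) -> str:
    """Replace detected PII with placeholders before inference."""
    return EMAIL_RE.sub(
        lambda m: store.placeholder_for(m.group(0), "EMAIL"), text
    )


def reidentify(text: str, store: MappingStore) -> str:
    """Restore raw values in model output, outside the model boundary."""
    for token, value in store.reverse.items():
        text = text.replace(token, value)
    return text


if __name__ == "__main__":
    store = MappingStore()
    prompt = abstract(
        "Contact alice@example.com or bob@example.org; "
        "alice@example.com is preferred.",
        store,
    )
    print(prompt)  # this is all the model would see
    # Pretend the model answered using the placeholders it was given.
    model_output = "Reply to <EMAIL_2> first."
    print(reidentify(model_output, store))
```

Passing the same `store` through every step of a multi-step workflow keeps placeholders consistent between calls, which is one way to keep re-identification manageable. It does nothing if the model paraphrases or drops a placeholder, so that case still needs separate handling.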
Curious how others are approaching this, especially in production settings.