Client-side secret redaction for LLM prompts (LeakGuard MVP)
I’ve been working on a Chrome extension that acts as a client-side privacy layer for LLM usage.
The idea: Detect likely secrets in the prompt before it’s sent, replace them with local placeholders (e.g. [PWM_1]), and ensure only redacted data leaves the browser.
What’s currently working:
deterministic mapping (same secret → same placeholder)
idempotent behavior (already-redacted input stays unchanged)
mixed input handling (raw + placeholder in same prompt)
detection of common patterns (API keys, tokens, JWTs, connection strings, etc.)
verified via DevTools that outbound payloads contain only placeholders
This is not meant to be “perfect security,” but a safety layer to reduce accidental leakage during day-to-day LLM usage.
What I’m looking for:
where would you try to break this?
what edge cases am I missing?
how would you approach unknown secret detection (entropy vs context)?
Repo: you can find it in github with name LeakGuard
Discussion in the ATmosphere