reFlow: A Feature-Decoupled Transformer with Native Interpretability
Hugging Face Forums [Unofficial]
March 19, 2026
Interesting work — the “interpretability as load-bearing structure” framing resonates.
I’m working on a related but orthogonal problem at Prooftrail. Instead of making the representation space readable, we’re trying to extract a real-time feedback signal from hidden states during generation: a non-learned coherence metric (cosine similarity over time at a fixed layer) that detects when the model is looping or stagnating, without any trained probe or labels.
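To make the idea concrete, here is a rough sketch of that kind of metric. It is not our actual implementation. It assumes "cosine similarity over time" means comparing the hidden state at step t with the one `lag` steps earlier, and the names `coherence_trace` and `is_stagnating`, the window of 8 steps, and the 0.98 threshold are all illustrative placeholders rather than values from the working paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_trace(states: Sequence[Sequence[float]], lag: int = 1) -> List[float]:
    """Similarity between each hidden state and the one `lag` steps before it.

    `states` is one vector per generated token, taken from a single fixed layer.
    """
    return [cosine(states[t], states[t - lag]) for t in range(lag, len(states))]


def is_stagnating(trace: Sequence[float], window: int = 8, threshold: float = 0.98) -> bool:
    """Flag stagnation when the mean of the last `window` similarities is very high.

    `window` and `threshold` are hypothetical values chosen for illustration.
    """
    if len(trace) < window:
        return False  # not enough history to decide
    recent = trace[-window:]
    return sum(recent) / window > threshold
```

In practice the vectors would come from the model's hidden states at the monitored layer, one per decoding step. Nothing here is trained: the only knobs are the lag, the window, and the threshold.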
Your crystallization boundary finding (L12–L18) is directly relevant to us. We monitor at Layer 27 (Qwen 7B, 32 layers) because the signal is clearest there — but your result suggests that if we ever want to intervene (not just monitor), we’d need to act earlier, in the zone where semantic decisions are still fluid. That’s a concrete design constraint we hadn’t formalized.
Two questions:
1. Did you measure whether the crystallization boundary shifts with task type (e.g., factual recall vs. multi-step reasoning), or is it stable across your evaluation suite?
2. The hard-sparsity result (top-64 destroying semantics) is striking. Have you looked at whether soft gating (learned attention over signals rather than hard top-k) preserves structure while still giving you a compact active set?
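To pin down what I mean in question 2, here is a toy contrast between the two gating styles. It is only a sketch under my own assumptions, not reFlow's mechanism: `hard_top_k`, `soft_gate`, and `active_set` are hypothetical helper names, the gate logits stand in for whatever a learned scorer would produce, and the `eps` cutoff for counting a signal as active is arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def hard_top_k(signals: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude signals and zero out everything else."""
    order = sorted(range(len(signals)), key=lambda i: abs(signals[i]), reverse=True)
    keep = set(order[:k])
    return [s if i in keep else 0.0 for i, s in enumerate(signals)]


def soft_gate(
    signals: Sequence[float], logits: Sequence[float], temperature: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Scale each signal by a softmax gate instead of dropping it outright.

    `logits` are assumed to come from a learned scorer (not shown here).
    Returns the gated signals and the gate weights.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    gates = [e / total for e in exps]
    return [g * s for g, s in zip(gates, signals)], gates


def active_set(gates: Sequence[float], eps: float = 0.01) -> List[int]:
    """Indices whose gate weight is at least `eps` (an arbitrary cutoff)."""
    return [i for i, g in enumerate(gates) if g >= eps]
```

With peaked logits the soft gate still concentrates almost all weight on a few signals, so `active_set` stays small, but the remaining signals are attenuated rather than zeroed. Whether that is enough to preserve the structure that hard top-64 destroys is exactly what I'm asking.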
Our working paper is on Zenodo (DOI: 10.5281/zenodo.18941566) and the interim data is on HuggingFace (airVen/missing-value-function-interim-report) — different angle, but the shared conviction that architectural constraints beat post-hoc analysis seems worth connecting.