Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihdjxv25tcsqrebg5rraskrdz56xgwociwiyqdqgxotplhdlcilzm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmmqhayua4h2"
  },
  "path": "/t/geometric-dynamics-of-llms-mapping-internal-stability-regimes-methodological-audits-qwen-llama-gemma/176201#post_1",
  "publishedAt": "2026-05-24T18:56:41.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Four Dynamical Regimes in large Language Models : An Empirical Phase Map",
    "Conditional Dynamic Signatures in Large Language Models :"
  ],
  "textContent": "I’m sharing two working papers on the internal geometric dynamics of Large Language Models, focusing on trajectory instability, regime classification, and methodological robustness.\nWhile most evaluations focus on output metrics (perplexity, benchmarks), this work investigates the hidden state trajectories during autoregressive generation to understand how models process information internally.\nPaper 1: Four Dynamical Regimes in LLMs\nLink: Four Dynamical Regimes in large Language Models : An Empirical Phase Map\nWe introduce ct_t, a token-level instability metric, and identify four consistent dynamical regimes across 10 models:\nUnderactive: Rigid, low variance (e.g., TinyLlama).\nAdaptive: Balanced flux/stability (e.g., Qwen-2.5 family).\nTransition: Boundary zone.\nChaotic: High instability spikes (e.g., Gemma-2B, GPT-2).\nKey Finding: Qwen models consistently maintain an “Adaptive” regime, suggesting a structural balance between flexibility and stability that correlates with their robust performance in fine-tuning and reasoning tasks.\nPaper 2: Methodological Audit of Trajectory Instability\nLink: Conditional Dynamic Signatures in Large Language Models :\nA large-scale audit (n=17 models) examining the conditional nature of these measurements.\nWe document:\nNormalisation Sensitivity: How CLIPPED_MAD vs RAW impacts regime discrimination.\nVariance Decomposition: Architecture explains only 7% of variance, while prompt category explains 17%, and residual noise 76%.\nFragmented Topology: At scale (n=17), clean model “families” dissolve into a continuous spectrum with only two robust local pairs (OPT-Pythia, Phi-Qwen).\nDocumented Falsifications: Six small-panel hypotheses that did not survive scale-up, reported explicitly to support methodological discipline.\nKey Insights for the Community\nStability is Conditional: A model’s dynamic regime depends heavily on the prompt category (e.g., scientific reasoning amplifies architectural differences).\nCollapse is Cyclical: We observe a robust COLLAPSE-RIVALRY cycle where models exit low-entropy states 84% of the time, suggesting a self-correcting mechanism rather than permanent failure.\nDynamic ≠ Semantic: Our control audits (V19) show that while we can influence dynamic stability via hidden-state interventions, this does not guarantee semantic correction. Dynamic control is necessary but insufficient for output quality.\nCode & Data\nAll data, metrics, and reference implementations are available in the associated Zenodo repositories. We welcome feedback, replication attempts, and discussions on extending these metrics to larger (7B+) or MoE architectures.\nBest regards,\nJean-Denis Bosange Batuli\nCEO – IDChain SRL (Unbind)",
  "title": "Geometric Dynamics of LLMs: Mapping Internal Stability Regimes & Methodological Audits (Qwen, Llama, Gemma)"
}