
I built ARSENIC - a tool to analyse what actually changes when you upgrade models

OpenAI Developer Community May 18, 2026
Your questionnaire approach for probing reasoning stability and ethics operationalisation is interesting: it targets a different layer, how the model thinks about itself rather than how its outputs change. The combination you're describing (structured introspection probes alongside output-level behavioural comparison) would be genuinely useful. ARSENIC's probe format is TOML and straightforward to extend, so it wouldn't be hard to add a set of diagnostic probes alongside the standard suite. And since ARSENIC already sees answer differences as it runs and validates, it's certainly worth exploring. Thanks for the comment!
