External Publication

Are There Any Good Benchmarks Comparing OpenAI API Models?

OpenAI Developer Community June 19, 2026

Mark_Mcdoe:

the answer is putting together the LLM with something like Lean 4

If you are not already a member of the Lean 4 Zulip forum, it is worth joining. It is free, and the Lean community is in a much better position to help with questions about combining LLMs with Lean 4.

Mark_Mcdoe:

I think standard LLMs fail at raw calculation

What do you mean by “raw calculation” here?

Do you mean arithmetic computation, symbolic manipulation, multi-step mathematical derivations, formal proof construction, or something else? Those are different tasks, and LLMs tend to fail in different ways depending on which one you mean.

Discussion in the ATmosphere