Measuring hallucinations in a RAG pipeline
OpenAI Developer Community
April 3, 2026
Great discussion. I recently built a lightweight open-source library
that addresses exactly this: HallucinationBench uses GPT-4o-mini as
a structured judge to classify individual claims as grounded or
hallucinated, then returns a faithfulness score and a verdict of
PASS / WARN / FAIL.
It requires no embeddings, no vector DB, and no extra infrastructure:
just pip install hallucinationbench and two lines of code.
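For anyone wondering how claim-level judgments become a single score, here is a minimal sketch. It is not HallucinationBench's code or API. The JSON shape the judge is assumed to return, the names parse_judge_output, faithfulness_score and verdict, and the 0.9 / 0.7 cutoffs are all hypothetical. The sketch assumes the faithfulness score is the fraction of claims labelled grounded, which is a common definition, and it omits the actual GPT-4o-mini call.

```python
import json
from dataclasses import dataclass

# Hypothetical cutoffs: the post does not say which thresholds
# HallucinationBench uses for PASS / WARN / FAIL.
PASS_THRESHOLD = 0.9
WARN_THRESHOLD = 0.7


@dataclass
class ClaimVerdict:
    claim: str
    grounded: bool


def parse_judge_output(raw: str) -> list[ClaimVerdict]:
    """Parse a judge response ASSUMED to be JSON shaped like
    {"claims": [{"claim": "...", "label": "grounded" | "hallucinated"}]}."""
    data = json.loads(raw)
    verdicts = []
    for item in data["claims"]:
        label = item["label"].strip().lower()
        # Reject anything outside the two allowed labels instead of guessing.
        if label not in ("grounded", "hallucinated"):
            raise ValueError(f"unexpected label: {item['label']!r}")
        verdicts.append(ClaimVerdict(item["claim"], label == "grounded"))
    return verdicts


def faithfulness_score(verdicts: list[ClaimVerdict]) -> float:
    """Fraction of extracted claims the judge labelled grounded."""
    if not verdicts:
        raise ValueError("judge returned no claims to score")
    return sum(v.grounded for v in verdicts) / len(verdicts)


def verdict(score: float) -> str:
    """Map a faithfulness score onto PASS / WARN / FAIL."""
    if score >= PASS_THRESHOLD:
        return "PASS"
    if score >= WARN_THRESHOLD:
        return "WARN"
    return "FAIL"


if __name__ == "__main__":
    # Example judge output for an answer containing four claims.
    raw = json.dumps({"claims": [
        {"claim": "Paris is the capital of France.", "label": "grounded"},
        {"claim": "Paris has 40 million residents.", "label": "hallucinated"},
        {"claim": "The Seine runs through Paris.", "label": "grounded"},
        {"claim": "Paris hosted the 2024 Olympics.", "label": "grounded"},
    ]})
    score = faithfulness_score(parse_judge_output(raw))
    print(score, verdict(score))
```

Because the judge only has to emit one label per claim, the scoring step is plain arithmetic. In the example, 3 of 4 claims are grounded, so the score is 0.75, which maps to WARN under these assumed thresholds.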
Happy to discuss the judge prompt design if anyone is interested.