External Publication

Measuring hallucinations in a RAG pipeline

OpenAI Developer Community, April 3, 2026

Great discussion. I recently built a lightweight open-source library that addresses exactly this. HallucinationBench uses GPT-4o-mini as a structured judge to classify each individual claim as grounded or hallucinated, and returns a faithfulness score along with a verdict of PASS, WARN, or FAIL. It needs no embeddings, no vector DB, and no extra infrastructure: just pip install hallucinationbench and two lines of code. Happy to discuss the judge prompt design if anyone is interested.
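To make the scoring idea concrete, here is a minimal sketch of how per-claim judgments could be rolled up into a faithfulness score and a PASS / WARN / FAIL verdict. This is not HallucinationBench's actual API or judge prompt. The names `evaluate`, `Report`, and `toy_judge`, the definition of the score as the fraction of grounded claims, and both thresholds are assumptions made for illustration. The judge is passed in as a plain callable; in a real setup it would wrap an LLM call, while the substring check below is only a stand-in so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical thresholds, chosen only for illustration.
PASS_THRESHOLD = 0.9
WARN_THRESHOLD = 0.6


@dataclass
class Report:
    score: float              # fraction of claims judged grounded
    verdict: str              # "PASS", "WARN", or "FAIL"
    hallucinated: List[str]   # claims the judge did not find support for


def evaluate(claims: List[str], context: str,
             judge: Callable[[str, str], bool]) -> Report:
    """Score a list of claims against a context using a pluggable judge.

    `judge(claim, context)` returns True when the claim is grounded.
    """
    if not claims:
        raise ValueError("need at least one claim to score")
    hallucinated = [c for c in claims if not judge(c, context)]
    score = (len(claims) - len(hallucinated)) / len(claims)
    if score >= PASS_THRESHOLD:
        verdict = "PASS"
    elif score >= WARN_THRESHOLD:
        verdict = "WARN"
    else:
        verdict = "FAIL"
    return Report(score=score, verdict=verdict, hallucinated=hallucinated)


def toy_judge(claim: str, context: str) -> bool:
    """Stand-in for an LLM judge: case-insensitive substring match."""
    return claim.lower() in context.lower()


context = "The Eiffel Tower is in Paris. It was completed in 1889."
claims = [
    "The Eiffel Tower is in Paris",
    "It was completed in 1889",
    "It is 500 metres tall",   # not supported by the context
]

report = evaluate(claims, context, toy_judge)
print(report.verdict, round(report.score, 2), report.hallucinated)
```

With two of three claims grounded, the score is about 0.67, which falls in the WARN band under these assumed thresholds. Splitting a generated answer into individual claims is taken as already done here; that step, and the judge prompt itself, are where most of the design work would go.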
