External Publication

Looking for guidance. Trying to create a model with TrOCR's encoder + Google's mT5 multilingual decoder but model fails to overfit on a single data sample

Hugging Face Forums [Unofficial] March 26, 2026

Hi everyone,

I am working on building a proof of concept for OCR system that can recognize both handwritten and printed Hindi (Devanagari) text in complex documents. I’m trying to build on top of TrOCR (microsoft/trocr-base-handwritten) since it already has a strong vision encoder trained for handwriting recognition.

The core problem I’m running into is on the decoder/tokenizer side — TrOCR’s default decoder and tokenizer are trained for English only, and I need Hindi output.

What I’ve tried so far:

I replaced TrOCR’s decoder with google/mt5-small, which natively supports Hindi tokenization. The hidden sizes matched, so I expected this to work.

However, the model failed to overfit even on a single data point. The loss comes down but hovers at near 2-3 at the end, and the characters keep repeating instead of forming a meaningful word or the sentence. I have tried changing learning rate, introducing repetition penalty but overfitting just don’t happen.

My code is over here. Check cell 1 to cell 17:

colab.research.google.com

Google Colab

I need guidance as is their any other tokenizer out there that can work well with TrOCR’s encoder or can you help me improve in this current setup (TrOCR’s encoder+Decoder).

Google Colab

Discussion in the ATmosphere