{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifq43n42lvup4f3fioutb5wpmgskc2hsvsdslhgtay42je53amceu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mllhq5cz57e2"
  },
  "path": "/t/ptq-int8-via-tfliteconverter-encoder-decoder-seq2seq-model-loses-encoder-context-entirely-after-conversion/175595#post_3",
  "publishedAt": "2026-05-11T13:06:43.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "MarianMT Helsinki-NLP/opus-mt-en-fr",
    "t5-small"
  ],
  "textContent": "Hi,\n\nThanks a lot for your initial response, it pointed me in the right direction.\n\nQuick update on what I’ve found after several weeks of testing:\n\n**Confirmed:** PTQ INT8 via `TFLiteConverter` is indeed broken on the decoder side of seq2seq Transformer architectures. I reproduced the issue on two separate models (MarianMT Helsinki-NLP/opus-mt-en-fr and t5-small), with the same symptom: the encoder converts cleanly to INT8, but the decoder produces garbage outputs (random tokens, empty strings, or nonsensical translations). FP32 works perfectly on both.\n\nThe root cause appears to be miscalibrated quantization scales on the cross-attention layers, the representative dataset only sees encoder inputs, so the decoder’s activations are never properly calibrated.\n\nI’m now exploring QAT as a potential fix, but I’m hitting a wall on the TFLite side specifically, most documented success stories with `optimum` + ONNX Runtime work on CPU, but the TFLite export path for seq2seq remains largely undocumented.\n\nIf anyone has successfully deployed a quantized seq2seq Transformer to TFLite (not ONNX Runtime), especially on a custom hardware delegate, I’d love to hear about it.\n\nThanks again.",
  "title": "PTQ INT8 via TFLiteConverter — encoder-decoder seq2seq model loses encoder context entirely after conversion"
}