PTQ INT8 via TFLiteConverter — encoder-decoder seq2seq model loses encoder context entirely after conversion

Hugging Face Forums [Unofficial] April 27, 2026

I’m trying to deploy a seq2seq encoder-decoder model on an embedded target that only accepts INT8 TFLite models. The conversion via TFLiteConverter completes without errors, but the resulting model is completely broken at inference, which suggests the converter is not handling the encoder-decoder architecture correctly under full INT8 quantization.

Environment

  • tensorflow 2.13, transformers 4.40
  • macOS (conversion) → embedded Linux with INT8 hardware delegate (inference)

Problem

Converting a fused encoder-decoder seq2seq model to INT8 using TFLiteConverter with the following setup:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type  = tf.int8
converter.inference_output_type = tf.int8
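
The body of `representative_dataset_gen` is not shown above. As a rough sketch of what a calibration generator for a multi-input encoder-decoder graph might look like, the block below assumes a signature-based conversion with int32 inputs named `input_ids`, `attention_mask`, and `decoder_input_ids`. Those names, the helper `make_representative_dataset`, and the toy samples are all hypothetical; the real signature may differ.

```python
import numpy as np

def make_representative_dataset(samples, max_samples=200):
    """Build a generator function for converter.representative_dataset.

    `samples` is an iterable of dicts mapping input name -> token id list.
    The input names are assumptions, not taken from the original model.
    """
    def representative_dataset_gen():
        for i, sample in enumerate(samples):
            if i >= max_samples:
                break
            # One dict per calibration step, keyed by signature input name,
            # with a leading batch dimension of 1.
            yield {
                name: np.asarray(value, dtype=np.int32)[np.newaxis, ...]
                for name, value in sample.items()
            }
    return representative_dataset_gen

# Toy stand-ins for real tokenized source/target pairs (illustrative only).
toy_samples = [
    {"input_ids": [5, 17, 42, 0], "attention_mask": [1, 1, 1, 0],
     "decoder_input_ids": [1, 9, 23, 0]},
    {"input_ids": [8, 3, 0, 0], "attention_mask": [1, 1, 0, 0],
     "decoder_input_ids": [1, 14, 0, 0]},
]

representative_dataset_gen = make_representative_dataset(toy_samples)
batches = list(representative_dataset_gen())
```

If calibration only ever feeds the decoder a start token, the activation ranges recorded for cross-attention may not match real decoding steps, so it seems worth including realistic decoder prefixes in the samples. That is an assumption about this model, not something verified here.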

Conversion completes without errors, but the model generates repeated tokens for any input (BLEU drops from 23.9 to 0.04). The decoder stops using encoder context entirely from the first inference step.
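
One mechanism that could produce this symptom, offered as an untested assumption rather than a diagnosis: if a tensor feeding the attention softmax mixes ordinary scores with a large additive mask value (for example -1e9), per-tensor affine INT8 quantization picks a scale so coarse that all unmasked scores round to the same value. The sketch below shows that arithmetic in plain Python. The function names and the numbers are illustrative and not taken from the model.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Per-tensor affine parameters from the observed min/max (range includes 0)."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def fake_quant(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize to INT8 and dequantize again, to expose the rounding error."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(qmin, min(qmax, q))  # clamp to the INT8 range
        out.append((q - zero_point) * scale)
    return out

scores = [2.0, -1.0, 0.5]          # hypothetical unmasked attention scores
masked = scores + [-1e9]           # same scores plus one masked position

# Without the mask value, the scores survive the round trip.
scale_ok, zp_ok = quant_params(scores)
roundtrip_ok = fake_quant(scores, scale_ok, zp_ok)

# With the mask value in the same tensor, the scale is about 3.9e6
# and every unmasked score collapses to the same quantized value.
scale_bad, zp_bad = quant_params(masked)
roundtrip_bad = fake_quant(masked, scale_bad, zp_bad)

print(roundtrip_ok)
print(roundtrip_bad[:3])
```

If something like this is happening, the softmax over encoder positions would become uniform across unmasked tokens, which would look like the decoder ignoring the encoder. Comparing intermediate float and INT8 tensors around the attention mask would confirm or rule it out.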

The 16x8 mode (tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8) is not a viable fallback: the TILE op is unsupported at runtime in that mode.

Question

Is this a known limitation of TFLiteConverter PTQ for encoder-decoder architectures? Is there a recommended calibration strategy or converter configuration for fused encoder-decoder graphs with cross-attention?

Open to any working approach to move forward.

Reproducible notebook available on request.
