{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibd2bis752e5gpriabp22xfblfruu5lzdwsk3kgregflvcjsxs5ba",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mki7zqkp5vu2"
  },
  "path": "/t/ptq-int8-via-tfliteconverter-encoder-decoder-seq2seq-model-loses-encoder-context-entirely-after-conversion/175595#post_1",
  "publishedAt": "2026-04-27T13:01:57.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I’m trying to deploy a seq2seq encoder-decoder model on an embedded target that only accepts INT8 TFLite models. The conversion via `TFLiteConverter` completes without errors, but the resulting model is completely broken at inference, which suggests the converter is not handling the encoder-decoder architecture correctly under full INT8 quantization.\n\n**Environment**\n\n  * `tensorflow 2.13`, `transformers 4.40`\n  * macOS (conversion) → embedded Linux with INT8 hardware delegate (inference)\n\n### Problem\n\nI am converting a fused encoder-decoder seq2seq model to INT8 using `TFLiteConverter` with the following setup:\n\n    converter.optimizations = [tf.lite.Optimize.DEFAULT]\n    converter.representative_dataset = representative_dataset_gen\n    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\n    converter.inference_input_type  = tf.int8\n    converter.inference_output_type = tf.int8\n\nConversion completes without errors, but the converted model generates repeated tokens for any input (BLEU drops from 23.9 to 0.04). The decoder stops using encoder context entirely from the first inference step.\n\n`EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8` is not viable because the `TILE` op is unsupported at runtime.\n\n### Question\n\nIs this a known limitation of `TFLiteConverter` PTQ for encoder-decoder architectures? Is there a recommended calibration strategy or converter configuration for fused encoder-decoder graphs with cross-attention?\n\nI’m open to any working approach to move forward.\n\nA reproducible notebook is available on request.",
  "title": "PTQ INT8 via TFLiteConverter — encoder-decoder seq2seq model loses encoder context entirely after conversion"
}