{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiar6oi6xaansuya5msq67qt5jvhkipa4qitwg5cppxrcuvianlyxq",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjexmtsscsp2"
},
"path": "/t/best-models-for-english-japanese-and-english-chinese-translation/175201#post_2",
"publishedAt": "2026-04-13T12:49:31.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"Gemma 4",
"A statistical neural machine translation.",
"Hugging Face",
"blog.google",
"GitHub",
"Unbabel"
],
"textContent": "The current version of TranslateGemma is probably based on Gemma 3. **If** a version of TranslateGemma based on Gemma 4 comes out eventually, it would likely be a solid option…\n\n* * *\n\nHere is a decision matrix for **English→Japanese** and **English→Chinese** translation.\n\nThe core judgment is this: **use separate routes if translation quality is a product feature, and use one multilingual model if simplicity is the product feature.** The evidence behind that split is straightforward. WMT24 still treats MT as a serious evaluated task with **human-judged rankings** , and Japanese now has a dedicated benchmark, **JP-TL-Bench** , built specifically for subtle JA↔EN quality differences that broad benchmarks often blur. (A statistical neural machine translation.)\n\n## Executive decision matrix\n\nYour real constraint | Use this architecture | First model(s) to try | Why this is the right default | Main caveat\n---|---|---|---|---\nBest overall quality for your exact pair set | **Separate routes** | **EN→JA:** CAT-Translate-7B, PLaMo Translate. **EN→ZH:** LMT-60-8B. Keep MADLAD-400-10B-MT as fallback. | Japanese benefits more from specialization, while LMT-60 is explicitly **Chinese-English-centric** and multilingual. CAT-Translate and PLaMo are both translation specialists for JA↔EN. (Hugging Face) | More systems to operate\nOne model only, best current quality | **Single multilingual model** | **TranslateGemma-12B** first, then **LMT-60-8B** , then **MADLAD-400-10B-MT** | TranslateGemma is a current Google translation family across **55 languages** in **4B/12B/27B** ; LMT-60 is Chinese-English-centric; MADLAD remains a strong broad multilingual baseline. 
(blog.google) | TranslateGemma is gated on Hugging Face and less flexible for prompt-based control\nEasiest commercial / licensing path | **Separate routes** or **single model with Apache/MIT only** | **CAT-Translate-7B + LMT-60-8B**, or **LMT-60-8B** alone, with **MADLAD-400-10B-MT** as baseline | CAT is **MIT**, LMT-60 is **Apache-2.0**, and MADLAD’s HF conversion is **Apache-2.0**-based. (Hugging Face) | You give up HY-MT1.5 unless legal review clears it\nBest glossary / terminology / formatting control | **Separate routes**, but benchmark one constrained multilingual model | **HY-MT1.5-7B** as a benchmark candidate | HY-MT1.5 explicitly supports **terminology intervention**, **contextual translation**, and **formatted translation**. (Hugging Face) | Tencent’s license excludes the **EU, UK, and South Korea** from its covered territory. (Hugging Face)\nChinese is more important than Japanese | **Single multilingual model** or **separate routes** with a Chinese-led main path | **LMT-60-8B** | LMT-60 is explicitly described as **Chinese-English-centric**, trained on **90B tokens**, covering **60 languages / 234 directions**. (Hugging Face) | JA quality may still trail a stronger JA specialist\nJapanese is more important than Chinese | **Separate routes** | **CAT-Translate-7B**, **PLaMo Translate**, then **TranslateGemma-12B** as a benchmark | Japanese quality often needs finer-grained evaluation; JP-TL-Bench exists for exactly that reason. CAT and PLaMo are both JA↔EN translation specialists. (GitHub) | Commercial use of PLaMo requires a separate license/contact flow\nLowest ops burden with decent quality | **Single multilingual model** | **LMT-60-8B** or **MADLAD-400-10B-MT** | Both are straightforward HF options with permissive licensing; MADLAD is a broad multilingual MT baseline, and LMT-60 is more aligned to your Chinese requirement. 
(Hugging Face) | Likely not the strongest JA route\nEdge / smaller deployment | **Separate routes or compact single-model tests** | **CAT-Translate smaller variants**, **HY-MT1.5-1.8B**, **TranslateGemma-4B** | CAT has **0.8B/1.4B/3.3B/7B** variants, HY has a **1.8B** model aimed at edge / real-time scenarios, and TranslateGemma has a **4B** option. (Hugging Face) | Small models will need stronger task-specific evaluation\n\n## The hard rules\n\n### Rule 1\n\nIf **English→Japanese quality is strategically important**, do **not** rely on one neutral multilingual model alone. Start with a **Japanese-specialist route**. JP-TL-Bench exists because JA↔EN differences are often subtle enough that ordinary broad metrics miss them. (GitHub)\n\n### Rule 2\n\nIf **English→Chinese is the main business path**, start from **LMT-60-8B** before you start from MADLAD or NLLB. LMT-60 is explicitly centered on the Chinese-English axis, which is unusually well aligned to your requirement. (Hugging Face)\n\n### Rule 3\n\nIf you want **one model only**, use this priority order:\n\n 1. **TranslateGemma-12B** if you want the strongest current one-model benchmark candidate and can accept Hugging Face gating. (blog.google)\n 2. **LMT-60-8B** if Chinese matters more or if you want Apache-2.0 simplicity. (Hugging Face)\n 3. **MADLAD-400-10B-MT** if you want the most established broad multilingual MT baseline. (Hugging Face)\n\n### Rule 4\n\nDo **not** make **NLLB-200** your default production choice in 2026 unless it wins on your own evals despite its limitations. Meta’s own model card says it is a **research model**, **not released for production deployment**, **not intended for document translation**, and trained on inputs not exceeding **512 tokens**. (Hugging Face)\n\n### Rule 5\n\nIf you need **strict terminology, context carryover, or format preservation**, benchmark **HY-MT1.5** even if it is not your final deployment choice. 
It is one of the few current translation models whose card explicitly foregrounds those features. But treat the license as a hard legal gate, not a soft warning. (Hugging Face)\n\n## Model-by-model decision table\n\nModel | Best use in your case | Choose it when | Do not choose it when\n---|---|---|---\n**CAT-Translate-7B** | EN→JA primary route | You want a JA↔EN specialist, smaller family sizes, and MIT licensing. The collection has multiple sizes and is explicitly for Japanese/English bidirectional translation. (Hugging Face) | You need one model to cover Chinese well too\n**PLaMo Translate** | EN→JA benchmark candidate | You want a serious JA translation specialist and are willing to handle the commercial licensing/contact path. (Hugging Face) | You need frictionless commercial deployment\n**LMT-60-8B** | EN→ZH primary route or one-model default | Chinese matters a lot, and you want Apache-2.0 plus multilingual coverage. (Hugging Face) | You need the strongest likely JA-specialized route\n**TranslateGemma-12B** | Best current one-model benchmark | You want a current Google translation family with strong reported efficiency/quality tradeoff across 55 languages. (blog.google) | You need open ungated downloads or prompt-level style/glossary control on HF\n**MADLAD-400-10B-MT** | Broad multilingual baseline / fallback | You want a very broad multilingual MT baseline with Apache-style simplicity. (Hugging Face) | You expect it to be the strongest JA-specific answer\n**HY-MT1.5-7B** | High-upside benchmark for controlled translation | You need terminology, contextual translation, or format-preserving translation, and your legal territory is compatible. (Hugging Face) | You ship in EU/UK/KR, or you want a clean permissive-license default\n**NLLB-200** | Research baseline only | You want a historically important baseline in the bake-off. (Hugging Face) | You need a default production path\n\n## The matrix I would use for an actual go / no-go decision\n\n### Option A. 
Quality-first architecture\n\nChoose this if translation is a product feature, customer-facing, or high-stakes.\n\nRoute | Primary candidate | Secondary candidate | Fallback\n---|---|---|---\nEN→JA | **CAT-Translate-7B** | **PLaMo Translate** | **TranslateGemma-12B**\nEN→ZH | **LMT-60-8B** | **TranslateGemma-12B** | **MADLAD-400-10B-MT**\nCross-language fallback | **MADLAD-400-10B-MT** | — | —\n\nWhy: this layout gives you specialization where it matters most, but still keeps one broad multilingual backbone in the system. That is the cleanest balance between quality and maintainability for your pair set. The reason I put CAT and PLaMo in the JA route is not that they are universally proven winners on all public leaderboards. It is that they are **purpose-built JA↔EN translation models**, while the public benchmark ecosystem for Japanese itself already shows that subtle JA quality differences deserve specialized evaluation. (Hugging Face)\n\n### Option B. One-model architecture\n\nChoose this if you want the smallest operational surface area.\n\nPriority | Model\n---|---\n1 | **TranslateGemma-12B**\n2 | **LMT-60-8B**\n3 | **MADLAD-400-10B-MT**\n\nWhy this order: TranslateGemma is the strongest current single-family translation release among the models we reviewed, LMT-60 is the best Chinese-leaning one-model choice, and MADLAD is the safest broad baseline. (blog.google)\n\n### Option C. License-first architecture\n\nChoose this if legal simplicity matters more than gaining the last few quality points.\n\nAllowed by default | Avoid unless legal review passes\n---|---\n**CAT-Translate**, **LMT-60**, **MADLAD-400** | **HY-MT1.5**, **PLaMo Translate**, **TranslateGemma** (gated access)\n\nWhy: CAT is MIT, LMT-60 is Apache-2.0, and MADLAD’s HF conversion is Apache-2.0-based. HY has explicit territory exclusions, PLaMo requires a commercial contact flow, and TranslateGemma requires accepting Google’s usage license on Hugging Face. 
(Hugging Face)\n\n## My strict recommendation for you\n\nIf I had to lock this down into a single decision rule set:\n\n### Use **separate routes** if any of these are true\n\n * Japanese output quality matters beyond “good enough.”\n * You expect style, register, or terminology complaints from users.\n * Translation is central to the product, not just a supporting feature.\n\nThese conditions point toward **CAT-Translate-7B for EN→JA** and **LMT-60-8B for EN→ZH**, with **MADLAD-400-10B-MT** as a multilingual fallback. (Hugging Face)\n\n### Use **one multilingual model** if all of these are true\n\n * You want one deployable stack.\n * Your content is mostly standard prose or product/support text.\n * You accept that EN→JA may not be maximally specialized.\n\nThese conditions point toward **TranslateGemma-12B first**, then **LMT-60-8B**, then **MADLAD-400-10B-MT**. (blog.google)\n\n### Do **not** choose HY-MT1.5 as your sole plan unless all of these are true\n\n * Your deployment is outside the EU/UK/KR, or you have separate permission.\n * You specifically need glossary/context/format constraints.\n * You are comfortable with its custom license. (Hugging Face)\n\n### Do **not** choose NLLB as your default unless your own evaluation forces you to\n\nIts own model card is too explicit about its research-only framing, its unsuitability for document translation, and its 512-token training length for NLLB to be the default recommendation here. (Hugging Face)\n\n## The evaluation matrix that should sit next to the model matrix\n\nUse this regardless of architecture:\n\nRoute | Benchmark / tool | Why\n---|---|---\nEN→JA | **JP-TL-Bench** | Built for subtle JA↔EN pairwise differences, with win rate and Bradley–Terry scoring. (GitHub)\nEN→ZH | **NTREX-128** | English-source benchmark into 128 target languages with document-level information. (GitHub)\nBroad multilingual regression | **FLORES-200** | Standard multilingual sanity-check set with 842 articles and 3001 sentences. 
(Hugging Face)\nMetric selection | **WMT24 Metrics / COMET / MetricX** | WMT24 Metrics evaluates automatic metrics by correlation with human judgments; COMET predicts human MT judgments; MetricX has current open implementations. (A statistical neural machine translation.)\nSide-by-side diagnosis | **MT-Telescope** | Useful for understanding why one system beats another, not just whether it does. (Unbabel)\n\n## Final matrix in one sentence\n\nFor your exact case:\n\n * **Quality-first:** EN→JA = **CAT-Translate-7B**, EN→ZH = **LMT-60-8B**, fallback = **MADLAD-400-10B-MT**. (Hugging Face)\n * **One-model-first:** **TranslateGemma-12B**, then **LMT-60-8B**, then **MADLAD-400-10B-MT**. (blog.google)\n * **High-upside extra benchmark:** **HY-MT1.5-7B**, but only if the license is actually compatible with your deployment geography. (Hugging Face)\n\n",
"title": "Best models for English → Japanese and English → Chinese translation?"
}