Best models for English → Japanese and English → Chinese translation?

Hugging Face Forums [Unofficial] April 13, 2026

The current version of TranslateGemma is probably based on Gemma 3. If a version of TranslateGemma based on Gemma 4 comes out eventually, it would likely be a solid option…


Here is a decision matrix for English→Japanese and English→Chinese translation.

The core judgment is this: use separate routes if translation quality is a product feature, and use one multilingual model if simplicity is the product feature. The evidence behind that split is straightforward. WMT24 still treats MT as a serious evaluated task with human-judged rankings, and Japanese now has a dedicated benchmark, JP-TL-Bench, built specifically for subtle JA↔EN quality differences that broad benchmarks often blur. (A statistical neural machine translation.)

Executive decision matrix

| Your real constraint | Use this architecture | First model(s) to try | Why this is the right default | Main caveat |
| --- | --- | --- | --- | --- |
| Best overall quality for your exact pair set | Separate routes | EN→JA: CAT-Translate-7B, PLaMo Translate. EN→ZH: LMT-60-8B. Keep MADLAD-400-10B-MT as fallback. | Japanese benefits more from specialization, while LMT-60 is explicitly Chinese-English-centric and multilingual. CAT-Translate and PLaMo are both translation specialists for JA↔EN. (Hugging Face) | More systems to operate |
| One model only, best current quality | Single multilingual model | TranslateGemma-12B first, then LMT-60-8B, then MADLAD-400-10B-MT | TranslateGemma is a current Google translation family across 55 languages in 4B/12B/27B; LMT-60 is Chinese-English-centric; MADLAD remains a strong broad multilingual baseline. (blog.google) | TranslateGemma is gated on Hugging Face and less flexible for prompt-based control |
| Easiest commercial / licensing path | Separate routes or single model with Apache/MIT only | CAT-Translate-7B + LMT-60-8B, or LMT-60-8B alone, with MADLAD-400-10B-MT as baseline | CAT is MIT, LMT-60 is Apache-2.0, MADLAD is based on an Apache-2.0 HF conversion. (Hugging Face) | You give up HY-MT1.5 unless legal review clears it |
| Best glossary / terminology / formatting control | Separate routes, but benchmark one constrained multilingual model | HY-MT1.5-7B as a benchmark candidate | HY-MT1.5 explicitly supports terminology intervention, contextual translation, and formatted translation. (Hugging Face) | Tencent’s license excludes EU, UK, and South Korea territory coverage. (Hugging Face) |
| Chinese is more important than Japanese | Single multilingual model or separate routes with Chinese-leading main path | LMT-60-8B | LMT-60 is explicitly described as Chinese-English-centric, trained on 90B tokens, covering 60 languages / 234 directions. (Hugging Face) | JA quality may still trail a stronger JA specialist |
| Japanese is more important than Chinese | Separate routes | CAT-Translate-7B, PLaMo Translate, then TranslateGemma-12B as a benchmark | Japanese quality often needs finer-grained evaluation; JP-TL-Bench exists for exactly that reason. CAT and PLaMo are both JA↔EN translation specialists. (GitHub) | PLaMo commercial use requires separate license/contact flow |
| Lowest ops burden with decent quality | Single multilingual model | LMT-60-8B or MADLAD-400-10B-MT | Both are straightforward HF options with permissive licensing; MADLAD is a broad multilingual MT baseline, and LMT-60 is more aligned to your Chinese requirement. (Hugging Face) | Not the absolute best likely JA route |
| Edge / smaller deployment | Separate routes or compact single-model tests | CAT-Translate smaller variants, HY-MT1.5-1.8B, TranslateGemma-4B | CAT has 0.8B/1.4B/3.3B/7B variants, HY has 1.8B aimed at edge / real-time scenarios, and TranslateGemma has a 4B option. (Hugging Face) | Small models will need stronger task-specific evaluation |

The hard rules

Rule 1

If English→Japanese quality is strategically important, do not rely on one neutral multilingual model alone. Start with a Japanese-specialist route. JP-TL-Bench exists because JA↔EN differences are often subtle enough that ordinary broad metrics are not enough. (GitHub)

Rule 2

If English→Chinese is the main business path, start from LMT-60-8B before you start from MADLAD or NLLB. LMT-60 is explicitly centered on the Chinese-English axis, which is unusually well aligned to your requirement. (Hugging Face)

Rule 3

If you want one model only, use this priority order:

  1. TranslateGemma-12B if you want the strongest current one-model benchmark candidate and can accept Hugging Face gating. (blog.google)
  2. LMT-60-8B if Chinese matters more or if you want Apache-2.0 simplicity. (Hugging Face)
  3. MADLAD-400-10B-MT if you want the most established broad multilingual MT baseline. (Hugging Face)

Rule 4

Do not make NLLB-200 your default production choice in 2026 unless it wins on your own evals despite its limitations. Meta’s own card says it is a research model, not released for production deployment, not intended for document translation, and trained on inputs not exceeding 512 tokens. (Hugging Face)

Rule 5

If you need strict terminology, context carryover, or format preservation, benchmark HY-MT1.5 even if it is not your final deployment choice. It is one of the few current translation models whose card explicitly foregrounds those features. But treat the license as a hard legal gate, not a soft warning. (Hugging Face)

Model-by-model decision table

| Model | Best use in your case | Choose it when | Do not choose it when |
| --- | --- | --- | --- |
| CAT-Translate-7B | EN→JA primary route | You want a JA↔EN specialist, smaller family sizes, and MIT licensing. The collection has multiple sizes and is explicitly for Japanese/English bidirectional translation. (Hugging Face) | You need one model to cover Chinese well too |
| PLaMo Translate | EN→JA benchmark candidate | You want a serious JA translation specialist and are willing to handle the commercial licensing/contact path. (Hugging Face) | You need frictionless commercial deployment |
| LMT-60-8B | EN→ZH primary route or one-model default | Chinese matters a lot, and you want Apache-2.0 plus multilingual coverage. (Hugging Face) | You need the strongest likely JA-specialized route |
| TranslateGemma-12B | Best current one-model benchmark | You want a current Google translation family with strong reported efficiency/quality tradeoff across 55 languages. (blog.google) | You need open ungated downloads or prompt-level style/glossary control on HF |
| MADLAD-400-10B-MT | Broad multilingual baseline / fallback | You want a very broad multilingual MT baseline with Apache-style simplicity. (Hugging Face) | You expect it to be the strongest JA-specific answer |
| HY-MT1.5-7B | High-upside benchmark for controlled translation | You need terminology, contextual translation, or format-preserving translation, and your legal territory is compatible. (Hugging Face) | You ship in EU/UK/KR, or you want a clean permissive-license default |
| NLLB-200 | Research baseline only | You want a historically important baseline in the bake-off. (Hugging Face) | You need a default production path |

The matrix I would use for an actual go / no-go decision

Option A. Quality-first architecture

Choose this if translation is a product feature, customer-facing, or high-stakes.

| Route | Primary candidate | Secondary candidate | Fallback |
| --- | --- | --- | --- |
| EN→JA | CAT-Translate-7B | PLaMo Translate | TranslateGemma-12B |
| EN→ZH | LMT-60-8B | TranslateGemma-12B | MADLAD-400-10B-MT |
| Cross-language fallback | MADLAD-400-10B-MT | | |

Why: this layout gives you specialization where it matters most, but still keeps one broad multilingual backbone in the system. That is the cleanest balance between quality and maintainability for your pair set. The reason I put CAT and PLaMo in the JA route is not that they are universally proven winners on all public leaderboards. It is that they are purpose-built JA↔EN translation models, while the public benchmark ecosystem for Japanese itself already shows that subtle JA quality differences deserve specialized evaluation. (Hugging Face)
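
As a rough illustration of the Option A layout, the routing can be sketched as a lookup from language pair to an ordered candidate list. This is a minimal sketch, not a deployment recipe: the model names are plain labels copied from the table, while `candidates_for`, `pick_model`, and the `available` set are hypothetical helpers introduced here, and actually loading or calling the models is out of scope.

```python
from typing import Dict, List, Optional, Set, Tuple

# Ordered candidates per route, copied from the Option A table.
# These are labels only; nothing here loads or calls a model.
ROUTES: Dict[Tuple[str, str], List[str]] = {
    ("en", "ja"): ["CAT-Translate-7B", "PLaMo Translate", "TranslateGemma-12B"],
    ("en", "zh"): ["LMT-60-8B", "TranslateGemma-12B", "MADLAD-400-10B-MT"],
}
CROSS_LANGUAGE_FALLBACK = "MADLAD-400-10B-MT"


def candidates_for(source: str, target: str) -> List[str]:
    """Return the ordered candidate list for a language pair."""
    key = (source.lower(), target.lower())
    return ROUTES.get(key, [CROSS_LANGUAGE_FALLBACK])


def pick_model(source: str, target: str, available: Set[str]) -> Optional[str]:
    """Pick the first candidate present in `available` (a hypothetical set of
    deployed models), falling back to the broad multilingual model."""
    for name in candidates_for(source, target):
        if name in available:
            return name
    if CROSS_LANGUAGE_FALLBACK in available:
        return CROSS_LANGUAGE_FALLBACK
    return None
```

Keeping the order in data rather than in branching code means a bake-off result only changes the lists, not the routing logic.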

Option B. One-model architecture

Choose this if you want the smallest operational surface area.

| Priority | Model |
| --- | --- |
| 1 | TranslateGemma-12B |
| 2 | LMT-60-8B |
| 3 | MADLAD-400-10B-MT |

Why this order: TranslateGemma is the strongest current single-family translation release among the models we reviewed, LMT-60 is the best Chinese-leaning one-model choice, and MADLAD is the safest broad baseline. (blog.google)

Option C. License-first architecture

Choose this if legal simplicity matters more than shaving the last quality points.

| Allowed by default | Avoid unless legal review passes |
| --- | --- |
| CAT-Translate, LMT-60, MADLAD-400 | HY-MT1.5, PLaMo Translate, TranslateGemma gated access |

Why: CAT is MIT, LMT-60 is Apache-2.0, and MADLAD’s HF conversion is Apache-2.0-based. HY has explicit territory exclusions, PLaMo requires commercial contact flow, and TranslateGemma requires accepting Google’s usage license on Hugging Face. (Hugging Face)
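
The license-first split can be written down as a small allowlist filter. The sketch below only encodes the license notes as summarized in this post, so each entry should be checked against the current model card before relying on it. The `LicenseNote` type, the `allowed_models` helper, the `cleared` argument, and the territory codes `EU`, `UK`, `KR` are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class LicenseNote(NamedTuple):
    license: str
    needs_review: bool                      # True: avoid unless legal review passes
    excluded_territories: FrozenSet[str] = frozenset()


# Summaries taken from this post, not from the license texts themselves.
LICENSE_NOTES: Dict[str, LicenseNote] = {
    "CAT-Translate": LicenseNote("MIT", False),
    "LMT-60": LicenseNote("Apache-2.0", False),
    "MADLAD-400": LicenseNote("Apache-2.0 (HF conversion)", False),
    "HY-MT1.5": LicenseNote("custom", True, frozenset({"EU", "UK", "KR"})),
    "PLaMo Translate": LicenseNote("commercial contact flow", True),
    "TranslateGemma": LicenseNote("gated, Google usage license", True),
}


def allowed_models(territory: str, cleared: FrozenSet[str] = frozenset()) -> List[str]:
    """List models usable in `territory`.

    Models flagged `needs_review` are included only if named in `cleared`
    (a hypothetical record of completed legal reviews). A territory
    exclusion is treated as a hard block that clearance does not override.
    """
    allowed = []
    for name, note in LICENSE_NOTES.items():
        if territory.upper() in note.excluded_territories:
            continue
        if note.needs_review and name not in cleared:
            continue
        allowed.append(name)
    return allowed
```

Treating the territory exclusion as non-overridable is a deliberately conservative choice; if separate permission is obtained, that would need to be modeled explicitly.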

My strict recommendation for you

If I had to lock this down into a single decision rule set:

Use separate routes if any of these are true

  • Japanese output quality matters beyond “good enough.”
  • You expect style, register, or terminology complaints from users.
  • Translation is central to the product, not just a supporting feature.

These conditions point toward CAT-Translate-7B for EN→JA and LMT-60-8B for EN→ZH, with MADLAD-400-10B-MT as a multilingual fallback. (Hugging Face)

Use one multilingual model if all of these are true

  • You want one deployable stack.
  • Your content is mostly standard prose or product/support text.
  • You accept that EN→JA may not be maximally specialized.

These conditions point toward TranslateGemma-12B first, then LMT-60-8B, then MADLAD-400-10B-MT. (blog.google)

Do not choose HY-MT1.5 as your sole plan unless all of these are true

  • You are outside EU/UK/KR deployment scope, or have separate permission.
  • You specifically need glossary/context/format constraints.
  • You are comfortable with its custom license. (Hugging Face)

Do not choose NLLB as your default unless your own evaluation forces you to

Its own model card is too explicit about research-only framing, document-translation unsuitability, and 512-token training length to make it the default recommendation here. (Hugging Face)
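
The rule set above reduces to a few boolean checks. The following is a minimal sketch under two assumptions that the rules themselves do not state: the "any of these" test for separate routes is checked first, and an "undecided" result is returned when neither rule fires. All function and parameter names are hypothetical.

```python
def choose_architecture(
    ja_quality_critical: bool,
    expects_style_complaints: bool,
    translation_is_core: bool,
    wants_single_stack: bool,
    mostly_standard_prose: bool,
    accepts_generic_ja: bool,
) -> str:
    """Map the checklist answers to an architecture label."""
    # Separate routes: ANY of the three quality conditions is enough.
    if any((ja_quality_critical, expects_style_complaints, translation_is_core)):
        return "separate-routes"
    # One multilingual model: ALL three simplicity conditions must hold.
    if all((wants_single_stack, mostly_standard_prose, accepts_generic_ja)):
        return "single-model"
    # Assumption: the rules do not cover this case, so defer to your own evals.
    return "undecided"


def hy_mt_can_be_sole_plan(
    outside_eu_uk_kr_or_permitted: bool,
    needs_glossary_context_or_format: bool,
    accepts_custom_license: bool,
) -> bool:
    """HY-MT1.5 as the sole plan requires ALL three conditions."""
    return all((
        outside_eu_uk_kr_or_permitted,
        needs_glossary_context_or_format,
        accepts_custom_license,
    ))
```

For example, `choose_architecture(True, False, False, True, True, True)` returns `"separate-routes"`, because a single quality condition outranks the simplicity checklist under the ordering assumed here.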

The evaluation matrix that should sit next to the model matrix

Use this regardless of architecture:

| Route | Benchmark / tool | Why |
| --- | --- | --- |
| EN→JA | JP-TL-Bench | Built for subtle JA↔EN pairwise differences, with win rate and Bradley–Terry scoring. (GitHub) |
| EN→ZH | NTREX-128 | English-source benchmark into 128 target languages with document-level information. (GitHub) |
| Broad multilingual regression | FLORES-200 | Standard multilingual sanity-check set with 842 articles and 3001 sentences. (Hugging Face) |
| Metric selection | WMT24 Metrics / COMET / MetricX | WMT24 Metrics evaluates automatic metrics by correlation with human judgments; COMET predicts human MT judgments; MetricX has current open implementations. (A statistical neural machine translation.) |
| Side-by-side diagnosis | MT-Telescope | Useful for understanding why one system beats another, not just whether it does. (Unbabel) |
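
To make the "win rate and Bradley–Terry scoring" idea concrete, here is a minimal sketch of both computations over pairwise outcomes. It is a generic textbook fit (the standard iterative minorize-maximize update), not JP-TL-Bench's own implementation, and the `(winner, loser)` tuple format is an assumption. Ties and self-comparisons are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, str]  # (winner, loser) for one pairwise judgment


def win_rates(results: Iterable[Result]) -> Dict[str, float]:
    """Fraction of comparisons each system won."""
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)
    for winner, loser in results:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {name: wins[name] / games[name] for name in games}


def bradley_terry(results: List[Result], iterations: int = 200) -> Dict[str, float]:
    """Estimate Bradley-Terry strengths, normalized to sum to 1."""
    wins: Dict[str, float] = defaultdict(float)
    # versus[i][j] = number of comparisons between i and j (either outcome)
    versus: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for winner, loser in results:
        wins[winner] += 1.0
        versus[winner][loser] += 1.0
        versus[loser][winner] += 1.0

    strength = {name: 1.0 / len(versus) for name in versus}
    for _ in range(iterations):
        updated = {}
        for i, opponents in versus.items():
            # Update rule: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(n / (strength[i] + strength[j]) for j, n in opponents.items())
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strength = {name: value / total for name, value in updated.items()}
    return strength
```

With only two systems where A wins three of four comparisons, both functions give A 0.75 and B 0.25. The two measures diverge once systems face different opponents, which is where the Bradley–Terry fit becomes more informative than a raw win rate.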

Final matrix in one sentence

For your exact case:

  • Quality-first: EN→JA = CAT-Translate-7B, EN→ZH = LMT-60-8B, fallback = MADLAD-400-10B-MT. (Hugging Face)
  • One-model-first: TranslateGemma-12B, then LMT-60-8B, then MADLAD-400-10B-MT. (blog.google)
  • High-upside extra benchmark: HY-MT1.5-7B, but only if the license is actually compatible with your deployment geography. (Hugging Face)
