
Train a fully open SmolLM4-750M model

Hugging Face Forums [Unofficial] May 11, 2026

Hi Hugging Face Team,

I wanted to suggest a possible new small language model for the HuggingFaceTB / SmolLM family: a fully open 750M-class model designed to sit between the smaller SmolLM2 models (135M and 360M) and the larger 1.7B / 3B models.

The proposed model name is:

SmolLM4-750M

The goal would be a compact, fully public model that runs on modest hardware while still performing well at chat, coding help, math, summarization, and general use in English and Spanish.

Suggested high-level settings:

Size class: around 750M parameters
Context window: 16,384 tokens
Model type: causal language model
Main languages: English + Spanish
FineWeb-2 language subset: Spanish (spa_Latn)
License target: Apache-2.0
Goal: fully open weights, data recipe, training details, and evaluation details
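If the team picks this up, most of the settings above map fairly directly onto the YAML metadata block at the top of a Hub model card. The sketch below is a minimal, stdlib-only illustration that assumes the standard `license`, `language`, and `pipeline_tag` metadata keys; `PROPOSED_SETTINGS` and `model_card_front_matter` are hypothetical names, and the 16K context window would be recorded in the model config rather than in this metadata.

```python
# Hypothetical sketch: render the proposed settings as model-card YAML
# front matter. Key names follow Hub model-card metadata conventions;
# the values mirror the proposal (SmolLM4-750M is not a released model).

PROPOSED_SETTINGS = {
    "license": "apache-2.0",            # Hub identifier for Apache-2.0
    "language": ["en", "es"],           # English + Spanish
    "pipeline_tag": "text-generation",  # causal language model
}


def model_card_front_matter(settings: dict) -> str:
    """Return the YAML block that would open the model card's README.md."""
    lines = ["---", f"license: {settings['license']}", "language:"]
    for code in settings["language"]:
        lines.append(f"- {code}")  # one YAML list item per language code
    lines.append(f"pipeline_tag: {settings['pipeline_tag']}")
    lines.append("---")
    return "\n".join(lines)


print(model_card_front_matter(PROPOSED_SETTINGS))
```

The printed block would sit at the very top of the model card, above the prose sections describing the data recipe, training details, and evaluations.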

Suggested dataset stack:

HuggingFaceTB/smollm-corpus — core small-model pretraining mix
HuggingFaceFW/fineweb-edu — high-quality educational web data
HuggingFaceTB/finemath — math and problem-solving
HuggingFaceTB/stack-edu — educational code
HuggingFaceTB/smoltalk2 — main chat / post-training data
HuggingFaceTB/cosmopedia — synthetic textbooks, blogs, and stories
HuggingFaceFW/fineweb-2 (Spanish subset spa_Latn) — multilingual expansion
open-thoughts/OpenThoughts-114k — compact reasoning traces
HuggingFaceTB/smol-smoltalk — final small-model instruction polish
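The list names the sources but not their proportions. As a rough illustration of how a pretraining mixture over these sources could be sampled, here is a stdlib-only sketch. The weights are made-up placeholders, not a recommendation, and the chat and reasoning sets (smoltalk2, OpenThoughts-114k, smol-smoltalk) are assumed to belong to a later post-training stage, so they are left out; `PRETRAIN_MIX` and `sample_sources` are hypothetical names.

```python
import random
from collections import Counter

# Placeholder weights (assumption): the proposal gives no proportions.
PRETRAIN_MIX = {
    "HuggingFaceTB/smollm-corpus": 0.35,
    "HuggingFaceFW/fineweb-edu": 0.25,
    "HuggingFaceFW/fineweb-2 (spa_Latn)": 0.15,
    "HuggingFaceTB/stack-edu": 0.10,
    "HuggingFaceTB/finemath": 0.10,
    "HuggingFaceTB/cosmopedia": 0.05,
}


def sample_sources(weights: dict, n: int, seed: int = 0) -> list:
    """Choose the source for each of the next n documents, in proportion to weights."""
    rng = random.Random(seed)  # seeded so the sampling schedule is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


draws = sample_sources(PRETRAIN_MIX, 10_000)
for name, count in Counter(draws).most_common():
    print(f"{name:36s} {count / len(draws):.3f}")
```

Weighted per-document sampling like this is only one way to mix sources; a real recipe might instead fix token budgets per source or change the mixture across training stages.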

Why I think this would be useful:

A 750M-class model would be small enough for local and low-resource users, but stronger than ultra-tiny models.
A 16K context window would keep it competitive with current models without being excessive for a model of this size.
English + Spanish support would make it more useful globally while keeping the language scope focused.
The dataset stack follows the SmolLM style: public web, educational data, math, code, synthetic educational text, multilingual data, reasoning data, and small-model-focused instruction tuning.
It would be great for students, hobbyists, researchers, local inference users, and people who want a practical open model below 1B parameters.
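To put "modest hardware" and the 16K context into numbers, here is a back-of-the-envelope memory estimate. The 750M parameter count and 16,384-token context come from the proposal; the layer count, number of KV heads, and head size are hypothetical placeholders, since the architecture is left to the Hugging Face team. The estimate ignores activations and runtime overhead.

```python
PARAMS = 750_000_000  # size class from the proposal
CONTEXT = 16_384      # context window from the proposal


def weight_gib(params: int, bytes_per_param: float) -> float:
    """Memory for the weights alone at a given precision, in GiB."""
    return params * bytes_per_param / 2**30


def kv_cache_gib(context: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """KV cache for one full-length sequence: keys and values (factor 2)
    stored for every layer, KV head, and token position, in GiB."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value / 2**30


for label, nbytes in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"weights @ {label:5s}: {weight_gib(PARAMS, nbytes):.2f} GiB")

# Hypothetical architecture: 24 layers, 8 KV heads (grouped-query attention),
# head_dim 64, cache stored in bf16.
print(f"KV cache @ 16K, bf16: {kv_cache_gib(CONTEXT, 24, 8, 64):.2f} GiB")
```

Under these assumptions, bf16 weights (about 1.40 GiB) plus a full 16K KV cache (0.75 GiB) come to roughly 2.1 GiB for a single sequence; an architecture with more layers or KV heads would need proportionally more cache.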

I’m leaving the exact internal architecture choices to the Hugging Face team, since you would know best how to design and train it properly.

Thanks for reading, Erik
