External Publication

Visualization of Imitation Learning by smolvla(SO101)

Hugging Face Forums [Unofficial] June 5, 2026

Since this is in the physical AI / real-world robotics area, I think the LeRobot Discord is probably the best place to get the most useful follow-up. But here is how I would separate the pieces first, based on what I can check from the public docs, examples, and related issues:

Short version

I think there are three different visualization/debugging questions mixed together here:

Question	Tool family	What it helps with
“Is my recorded dataset sane?”	`lerobot-dataset-viz`, LeRobot dataset visualizers, Rerun-style episode inspection	Camera streams, robot states, actions, episode structure
“How is training progressing?”	W&B, Trackio, TensorBoard, CSV/JSONL logs	Loss, learning rate, grad norm, eval metrics
“Will the policy actually work on the robot?”	Evaluation rollouts, open-loop evaluation, real robot testing, dataset/action sanity checks	Success rate, action correctness, camera/state/action mismatch

So I would not treat `lerobot-dataset-viz` as a replacement for a learning-curve dashboard; it is a dataset/episode inspection tool. For learning curves, W&B is the documented path in current LeRobot examples. Trackio looks like the most relevant Hugging Face-native, local-first, W&B-like alternative, but it is probably not yet a confirmed one-flag option in `lerobot-train`.

For SmolVLA/SO101 specifically, I would also be careful not to over-trust the loss curve. There are related LeRobot issues where the loss converged and/or W&B plots looked fine, but evaluation success was still 0%. That suggests that for VLA/robotics, the dataset schema, camera setup, state/action definition, normalization/statistics, and rollout evaluation can matter as much as the scalar training curve.

1. First split: dataset visualization vs training metrics

Dataset/episode visualization

LeRobot has dataset visualization tools for looking at recorded episodes. This is useful for checking things like:

camera frames
camera names/views
robot state streams
action streams
episode timing
whether the recorded behavior looks physically plausible
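For the "episode timing" item, one quick numeric check is whether consecutive frame timestamps are spaced roughly 1/fps apart. This is a minimal sketch in plain Python, assuming you have already pulled one episode's per-frame `timestamp` values and the dataset `fps` from `meta/info.json`. The 50% tolerance is an arbitrary choice, not a LeRobot setting.

```python
def timing_gaps(timestamps, fps, tol=0.5):
    """Return (frame_index, delta) pairs where the step deviates from 1/fps.

    tol is a fraction of the expected period: 0.5 flags steps that are more
    than 50% shorter or longer than expected (an arbitrary threshold).
    """
    expected = 1.0 / fps
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if abs(delta - expected) > tol * expected:
            gaps.append((i, delta))
    return gaps


# Illustrative 30 fps episode with one dropped frame between index 2 and 3.
ts = [0.0, 1 / 30, 2 / 30, 4 / 30, 5 / 30]
print(timing_gaps(ts, fps=30))
```

An empty list means no step deviated beyond the tolerance. Any flagged frames are worth viewing in the dataset visualizer.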

Relevant docs/pages:

LeRobot dataset tools
LeRobotDataset v3.0 docs
LeRobotDataset v3.0 blog post
LeRobot Dataset Visualizer repo

This kind of tool answers questions like:

Did I record the right cameras, states, actions, and episodes?

It does not directly answer:

Is my loss decreasing over training steps?

Those are different layers.

Training metric visualization

For training curves, the current LeRobot docs use W&B as the standard documented example. In the real-world imitation learning tutorial, `--wandb.enable=true` is described as optional and is used to visualize training plots:

Imitation Learning on Real-World Robots
SmolVLA docs

So for training metrics, I would think in terms of:

Option	Local?	Good for	Caveat
W&B	Not local-first by default	Mature experiment tracking, training plots, media, artifacts	Requires W&B setup/login unless using offline mode
W&B offline	Local logging first	Keeping W&B-style logs without immediate cloud sync	Still W&B-oriented; dashboard workflow may not be what you want
Trackio	Yes, local-first	Local scalar curves and lightweight dashboards	Promising, but not necessarily a full W&B replacement for LeRobot
TensorBoard	Yes	Classic local scalar curves	May require adding a writer if not already supported
CSV/JSONL logs	Yes	Simple, robust, reproducible	No rich dashboard unless you build/plot one
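For the "W&B offline" row, a low-effort approach is to use the W&B client's standard `WANDB_MODE` environment variable. Setting it to `offline` stores runs locally instead of uploading them, and they can be uploaded later with `wandb sync`. This is a sketch under one assumption: your LeRobot version does not force a different mode internally. The dataset id is a placeholder, and the flags should follow your version's docs.

```python
import os
import shlex
import subprocess


def offline_wandb_env(base_env=None):
    """Return a copy of the environment with W&B switched to offline logging."""
    env = dict(os.environ if base_env is None else base_env)
    # The wandb client reads WANDB_MODE; "offline" keeps run data on disk
    # so it can be uploaded later with `wandb sync`.
    env["WANDB_MODE"] = "offline"
    return env


def launch(cmd):
    """Run the training command with W&B in offline mode (blocks until it exits)."""
    subprocess.run(cmd, env=offline_wandb_env(), check=True)


# Placeholder command: adjust to the flags documented for your LeRobot version.
cmd = [
    "lerobot-train",
    "--policy.path=lerobot/smolvla_base",
    "--dataset.repo_id=<your-dataset-repo-id>",
    "--wandb.enable=true",
]

if __name__ == "__main__":
    print("Would run:", shlex.join(cmd))
    # launch(cmd)  # uncomment to actually start training
```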

2. Trackio may be the HF-native option you were thinking of

If you were remembering a Gradio-based Hugging Face alternative to W&B, I think you may be thinking of Trackio:

Trackio docs
Trackio migration guide
Trackio blog post
Trackio GitHub repo

Trackio is very relevant here because it is:

Hugging Face-native
local-first
W&B-like
built around a Gradio dashboard
designed to log experiment metrics
able to sync/share through Hugging Face Spaces

The Trackio migration docs say that migrating from W&B is usually simple because Trackio uses W&B-like API syntax. In simple scripts, the idea can be as small as:

import trackio as wandb

wandb.init(project="my-project", name="my-run")
wandb.log({"train/loss": 0.123, "train/lr": 1e-4}, step=100)
wandb.finish()

That said, I would be careful with wording here.

I would say:

Trackio looks like the closest Hugging Face-native local/W&B-like option for scalar training curves.

I would not say:

Trackio is a guaranteed drop-in replacement for LeRobot’s current --wandb.enable=true path.

Why not? Because LeRobot appears to have its own W&B-specific logger wrapper rather than only calling plain wandb.log() everywhere. So Trackio may work well with a small custom logger/wrapper, but I would not assume that lerobot-train already exposes something like:

lerobot-train \
  --trackio.enable=true

unless that has been added in the specific LeRobot version you are using.

A safer expectation is:

LeRobot logging feature	Trackio likelihood	Notes
Scalar metrics: loss, lr, grad norm	High	This is the easiest case
Eval metrics	High	If logged as scalars
Tables/images	Likely	Trackio has W&B-like media APIs, but exact behavior should be checked
Videos	Maybe	Needs checking for the exact current API and dashboard behavior
Checkpoint/artifact tracking	Be careful	W&B Artifacts and Trackio storage are not necessarily equivalent
Resume/run-id behavior	Be careful	W&B-specific run resume logic may not map 1:1
Full W&B feature parity	No	Trackio is lightweight, not a full W&B clone

So my practical recommendation would be:

Use the standard documented W&B path first if you are okay with W&B.
If you want local-first scalar curves, investigate Trackio.
If using lerobot-train, assume Trackio may need a small logger wrapper or code patch.
If you only need a quick local curve, parse stdout/logs or write CSV/JSONL first.
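For the "parse stdout/logs" option: LeRobot's training loop prints periodic progress lines made of `key:value` tokens, something like `step:200 ... loss:0.470 grdn:4.342 lr:1.0e-04`. The exact keys and formatting vary by version, so treat the sample line and key names below as assumptions and compare them with your own console output. The parser keeps only tokens that start with a letter and are preceded by whitespace. This skips timestamps and `file.py:123` locations.

```python
import re

# "key:value" tokens whose value is numeric, optionally with a K/M/B suffix
# (assumed format for large step/sample counts, e.g. "smpl:3K").
TOKEN = re.compile(
    r"(?:^|(?<=\s))([A-Za-z_]\w*):"
    r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)([KMB]?)(?=\s|$)"
)
SUFFIX = {"": 1.0, "K": 1e3, "M": 1e6, "B": 1e9}


def parse_train_line(line):
    """Extract numeric key:value pairs from one training log line."""
    return {key: float(num) * SUFFIX[suf] for key, num, suf in TOKEN.findall(line)}


def parse_log(lines):
    """Keep only lines that contain both a step and a loss value."""
    rows = []
    for line in lines:
        metrics = parse_train_line(line)
        if "step" in metrics and "loss" in metrics:
            rows.append(metrics)
    return rows


# Hypothetical example line; check it against your real output.
sample = ("INFO 2025-06-05 12:00:00 ot_train.py:210 step:200 smpl:3K ep:5 "
          "epch:0.05 loss:0.470 grdn:4.342 lr:1.0e-04 updt_s:0.207 data_s:0.002")
print(parse_train_line(sample))
```

The resulting rows can then be written to CSV/JSONL or plotted directly.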

3. Why the loss curve is not enough in SmolVLA/SO101

This is the most important robotics-specific point.

In ordinary ML, a learning curve can often tell you a lot. In real-world robotics and VLA training, it is only one signal.

There are related LeRobot issues where training loss or W&B plots looked good, but evaluation did not work:

Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial
Clarifications on fine-tuning on different envs and embodiments
SO101/SmolVLA camera/setup discussion
SmolVLA poor performance / inference issue

The main lesson I would take from those is:

A clean loss curve does not guarantee a working rollout.

For SmolVLA/SO101, I would inspect at least these layers:

Layer	What to check	Why it matters
Camera setup	Number of cameras, camera names, view order, resolution	VLA policies are sensitive to visual input schema
State schema	Shape, order, meaning of `observation.state`	A converged loss can still learn the wrong mapping if state semantics differ
Action schema	Shape, order, joint vs end-effector meaning, gripper representation	Action mismatch can make rollout fail even if training looks fine
Dataset metadata	`meta/info.json`, feature names, fps, codebase version	Confirms what the dataset actually contains
Dataset statistics	`meta/stats.json`, normalization values	Wrong normalization can break policy behavior
Episode visualization	Camera/state/action streams	Helps detect recording/config mistakes
Evaluation	Open-loop eval, sim eval if available, real rollout	The final check is behavior, not just loss
Versioning	LeRobot version, model checkpoint, dataset format version	LeRobot/SmolVLA are moving quickly

The SmolVLA docs describe SmolVLA as taking multiple camera views, the current sensorimotor state, and a natural language instruction, then generating an action chunk. That means the model is not just learning from a text prompt or a single tensor. The camera/state/action contract matters.
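One way to make that camera/state/action contract explicit is to write down what you expect and diff it against the `features` section of the dataset's `meta/info.json`. This is a minimal sketch. The `EXPECTED` dict (camera names, 6-dim state/action for an SO101 arm) is a hypothetical example, so fill it in from your own policy config rather than copying it.

```python
def check_contract(features, expected):
    """Compare expected {feature_name: shape or None} against info.json 'features'.

    Returns a list of human-readable problems; an empty list means no mismatch found.
    """
    problems = []
    for name, shape in expected.items():
        if name not in features:
            problems.append(f"missing feature: {name}")
            continue
        actual = list(features[name].get("shape", []))
        if shape is not None and actual != list(shape):
            problems.append(f"{name}: expected shape {list(shape)}, found {actual}")
    # Extra camera streams can also change what the policy sees.
    for name in features:
        if name.startswith("observation.images.") and name not in expected:
            problems.append(f"unexpected camera: {name}")
    return problems


# Hypothetical expectations; None means "must be present, shape not checked".
EXPECTED = {
    "observation.state": [6],
    "action": [6],
    "observation.images.front": None,
    "observation.images.wrist": None,
}
```

Usage would be something like `check_contract(json.load(open("<dataset>/meta/info.json"))["features"], EXPECTED)`.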

4. What I would try locally

If I wanted the simplest local path before going deeper, I would try this order.

Step 1: Confirm the dataset visually

Use the LeRobot dataset visualization path first.

Things to look for:

Are all expected camera views present?
Do the camera names match what the policy/config expects?
Are the wrist/front/top/side views in the expected places?
Does the robot state change smoothly?
Do actions look non-zero and physically meaningful?
Are gripper actions represented correctly?
Is fps consistent with what the training config expects?
Are there broken/missing videos or episodes?
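Two of these questions ("Does the robot state change smoothly?" and "Do actions look non-zero?") can be turned into numeric checks once you have an episode's state or action frames. This is a rough sketch with arbitrary thresholds. It assumes each frame is a list of floats in the dataset's own units (for example joint positions), so pick `max_step` to match those units.

```python
def constant_dims(frames, eps=1e-6):
    """Indices of dimensions that never change over the episode (e.g. a dead gripper channel)."""
    if not frames:
        return []
    dims = len(frames[0])
    return [
        d for d in range(dims)
        if max(f[d] for f in frames) - min(f[d] for f in frames) < eps
    ]


def large_jumps(frames, max_step):
    """(frame_index, dim, step) where a value changes by more than max_step between frames."""
    jumps = []
    for i in range(1, len(frames)):
        for d, (prev, cur) in enumerate(zip(frames[i - 1], frames[i])):
            if abs(cur - prev) > max_step:
                jumps.append((i, d, cur - prev))
    return jumps
```

A constant action dimension or a single huge jump is not proof of a bug, but it is a good reason to open that episode in the visualizer.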

Relevant links:

LeRobot dataset tools
LeRobotDataset v3.0 docs
LeRobot Dataset Visualizer

Step 2: Inspect metadata and stats

Open the dataset metadata files if available.

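For LeRobotDataset v3.0, I would look at:

meta/info.json
meta/stats.json
meta/tasks.parquet (meta/tasks.jsonl in older v2.x datasets)
meta/episodes/ (Parquet files; meta/episodes.jsonl in older v2.x datasets)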

In particular:

observation.state
action
observation.images.<camera_name>
fps
features
shape
dtype
codebase_version
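A small standard-library helper can print these fields from a local copy of the dataset. It assumes the usual `meta/info.json` layout: top-level `codebase_version` and `fps`, plus a `features` mapping whose entries carry `dtype` and `shape`. The dataset path is passed on the command line.

```python
import json
import sys
from pathlib import Path


def summarize_info(dataset_root):
    """Return (header, rows) describing meta/info.json in a LeRobot dataset folder."""
    info = json.loads((Path(dataset_root) / "meta" / "info.json").read_text())
    header = {
        "codebase_version": info.get("codebase_version"),
        "fps": info.get("fps"),
    }
    rows = [
        (name, spec.get("dtype"), tuple(spec.get("shape", ())))
        for name, spec in info.get("features", {}).items()
    ]
    return header, rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python summarize_info.py <path/to/local/dataset>
    header, rows = summarize_info(sys.argv[1])
    print(header)
    for name, dtype, shape in rows:
        print(f"{name:40s} {str(dtype):10s} {shape}")
```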

This is boring but important. If the dataset schema and policy expectation disagree, the loss curve may not tell you the real problem.
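For `meta/stats.json`, I would look for two things:

- statistics whose length does not match the feature shape
- near-zero standard deviations, which can produce very large values under mean/std normalization

This is a sketch that assumes per-feature `std` lists as in LeRobot stats files. Image stats are nested per channel and are skipped here. The `min_std` threshold is arbitrary.

```python
import json
from pathlib import Path


def check_stats(stats, features, min_std=1e-4):
    """Flag suspicious normalization stats for flat (non-image) features."""
    problems = []
    for name, s in stats.items():
        std = s.get("std")
        # Skip nested (per-channel image) stats and entries without a std list.
        if not isinstance(std, list) or not std or isinstance(std[0], list):
            continue
        shape = features.get(name, {}).get("shape")
        if shape and len(shape) == 1 and len(std) != shape[0]:
            problems.append(f"{name}: {len(std)} std values for shape {list(shape)}")
        for i, v in enumerate(std):
            if v < min_std:
                problems.append(f"{name}[{i}]: std={v} (near zero)")
    return problems


def load_meta(dataset_root):
    """Load stats.json and the info.json features from a local dataset folder."""
    meta = Path(dataset_root) / "meta"
    stats = json.loads((meta / "stats.json").read_text())
    features = json.loads((meta / "info.json").read_text())["features"]
    return stats, features
```

Usage would be `check_stats(*load_meta("<path/to/local/dataset>"))`.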

Step 3: Start with the official W&B path if possible

If you can use W&B, the official path is probably the least surprising first test:

lerobot-train \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=<your-dataset-repo-id> \
  --batch_size=<batch-size> \
  --steps=<num-steps> \
  --wandb.enable=true

The exact command should follow the current LeRobot imitation learning docs and SmolVLA docs, because the CLI/config names can change across LeRobot versions.

Step 4: If you want local-first curves, try Trackio separately

In a custom training script, using Trackio can be very simple:

import trackio as wandb

wandb.init(
    project="smolvla-so101",
    name="local-test",
    config={
        "policy": "smolvla_base",
        "robot": "so101",
    },
)

wandb.log(
    {
        "train/loss": 0.123,
        "train/lr": 1e-4,
        "train/grad_norm": 0.5,
    },
    step=100,
)

wandb.finish()

For lerobot-train, I would expect this to require a small logger integration unless LeRobot has added official Trackio support in your version.
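If you do write that integration, one option is a tiny adapter that the training loop calls instead of calling wandb directly, which turns the backend into a choice. This is only a sketch. `JsonlSink`, `ModuleSink`, and `make_sink` are hypothetical names, not LeRobot APIs, and where to hook them into `lerobot-train` depends on your LeRobot version. Trackio is imported only when that backend is selected.

```python
import json
from pathlib import Path


class JsonlSink:
    """Append one JSON object per log call; readable without any dashboard."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, metrics, step):
        with self.path.open("a") as f:
            f.write(json.dumps({"step": step, **metrics}) + "\n")

    def finish(self):
        pass  # nothing to flush; each log call closes the file


class ModuleSink:
    """Wrap any module exposing W&B-style init/log/finish (e.g. trackio)."""

    def __init__(self, module, project, name, config=None):
        self.module = module
        module.init(project=project, name=name, config=config or {})

    def log(self, metrics, step):
        self.module.log(metrics, step=step)

    def finish(self):
        self.module.finish()


def make_sink(backend, project, name, log_dir="outputs/metrics"):
    """Hypothetical factory: choose a metrics backend by name."""
    if backend == "trackio":
        import trackio  # only required when this backend is selected
        return ModuleSink(trackio, project, name)
    if backend == "jsonl":
        return JsonlSink(Path(log_dir) / f"{name}.jsonl")
    raise ValueError(f"unknown backend: {backend}")
```

The training loop would then only call `sink.log({...}, step=...)` and `sink.finish()`. Switching between Trackio and a plain file becomes a one-argument change.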

Step 5: If you want the most robust local fallback, log CSV/JSONL

A boring but reliable fallback is to write one JSON object per logged step (JSONL):

{"step": 100, "train/loss": 0.123, "train/lr": 0.0001, "train/grad_norm": 0.5}
{"step": 200, "train/loss": 0.098, "train/lr": 0.0001, "train/grad_norm": 0.47}

Then plot it locally with Python.
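Here is a minimal version of that. It assumes a JSONL file shaped like the lines above: one object per log step, each with a `step` key. Reading the file uses only the standard library. matplotlib is needed only for the final plot and is imported inside `plot_metric`.

```python
import json
import sys
from pathlib import Path


def read_series(path, key):
    """Return (steps, values) for one metric, skipping blank lines and rows without it."""
    steps, values = [], []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if key in row:
            steps.append(row["step"])
            values.append(row[key])
    return steps, values


def plot_metric(path, key, out_png):
    """Save a simple line plot of one metric to a PNG file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    steps, values = read_series(path, key)
    fig, ax = plt.subplots()
    ax.plot(steps, values)
    ax.set_xlabel("step")
    ax.set_ylabel(key)
    fig.savefig(out_png, dpi=120)
    plt.close(fig)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python plot_metrics.py train_log.jsonl train/loss
    metric = sys.argv[2]
    plot_metric(sys.argv[1], metric, metric.replace("/", "_") + ".png")
```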

This is not fancy, but it avoids account setup, dashboard assumptions, and integration drift.

5. What to ask in LeRobot Discord

For SO101/SmolVLA, I would bring a compact but complete report to the LeRobot Discord. That will probably get better answers than only asking “how do I visualize the curve?”

Useful information to include:

Category	Include
LeRobot version	`pip show lerobot`, git commit, or install method
Command	Exact `lerobot-train` command
Policy	`lerobot/smolvla_base` or other checkpoint
Robot	SO101 / SO100 / other, follower/leader setup
Dataset	Hub repo id or local path
Dataset format	LeRobotDataset version if known
Cameras	Number, names, views, order
State/action	Shapes from metadata
Metadata	Relevant parts of `meta/info.json`
Stats	Relevant parts of `meta/stats.json`
Training curves	loss, lr, grad_norm, eval metrics if any
Visualization	screenshots or notes from `lerobot-dataset-viz`
Evaluation	open-loop eval, real rollout behavior, success/failure examples
Requirement	whether you need fully local/offline visualization
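Most of the "LeRobot version" row can be collected automatically and pasted into the report. This is a small standard-library sketch under two assumptions:

- the package is installed under the distribution name `lerobot`
- a source install lives in a git checkout

Either may differ on your machine. Missing values come back as `None` rather than raising.

```python
import platform
import subprocess
from importlib import metadata


def package_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir="."):
    """Current commit hash if repo_dir is inside a git checkout, else None."""
    try:
        out = subprocess.run(
            ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None  # git not installed, or not a git checkout


def environment_report(repo_dir="."):
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "lerobot": package_version("lerobot"),
        "torch": package_version("torch"),
        "git_commit": git_commit(repo_dir),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"- {key}: {value}")
```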

A good short Discord/forum report might look like:

I am fine-tuning SmolVLA on SO101 with LeRobot.

Goal:
- I want to visualize training curves locally if possible.
- I also want to confirm whether my dataset/camera/action setup is correct.

Setup:
- LeRobot version: <version-or-commit>
- Install method: <pip/source/docker/etc>
- Policy: <policy-path>
- Dataset: <dataset-repo-or-local-path>
- Robot: SO101
- Cameras: <camera-names-and-count>
- Training command: <exact-command>

What I checked:
- lerobot-dataset-viz: <works/does-not-work>
- meta/info.json: <relevant-shapes>
- meta/stats.json: <normalization-stats>
- W&B/Trackio/TensorBoard/logs: <what-you-tried>

Observed behavior:
- Training loss: <summary>
- Eval/rollout: <summary>
- Failure mode: <what-the-robot-does>

That gives the LeRobot community enough context to answer the robotics-specific part.

6. My current recommendation

If your immediate goal is just “I want to see the learning curve locally,” I would rank the options like this:

Rank	Option	Why
1	Parse local logs / CSV / JSONL	Most robust, fully local, no integration risk
2	Trackio	Best HF-native local/W&B-like dashboard candidate
3	W&B offline	Good if you already want W&B-style tracking
4	TensorBoard	Solid generic local ML tool
5	Full W&B online	Easiest if you accept W&B account/cloud workflow

But for SmolVLA/SO101 specifically, I would not stop at the learning curve. I would also inspect:

dataset episodes
camera names/order/count
meta/info.json
meta/stats.json
state/action shapes
normalization
open-loop evaluation
real rollout behavior
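Open-loop evaluation here means feeding recorded observations to the policy and comparing its predicted actions with the recorded ones, without letting the robot act. How you produce the predictions depends on your LeRobot version. The sketch below covers only the comparison step. It assumes you already have predicted and recorded actions as equal-length lists of per-frame vectors, and it reports mean absolute error per action dimension.

```python
def per_dim_mae(predicted, recorded):
    """Mean absolute error for each action dimension across all frames."""
    if len(predicted) != len(recorded) or not predicted:
        raise ValueError("need equal-length, non-empty action sequences")
    dims = len(recorded[0])
    totals = [0.0] * dims
    for p, r in zip(predicted, recorded):
        if len(p) != dims or len(r) != dims:
            raise ValueError("action dimension mismatch")
        for d in range(dims):
            totals[d] += abs(p[d] - r[d])
    return [t / len(recorded) for t in totals]


def worst_dims(mae, names=None, k=3):
    """The k dimensions with the largest error, labelled with names if provided."""
    names = names or [f"dim{d}" for d in range(len(mae))]
    return sorted(zip(names, mae), key=lambda x: x[1], reverse=True)[:k]
```

If one dimension (often the gripper) shows much larger error than the others, that can hint at an action representation or normalization mismatch, even when the training loss looks fine.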

In other words:

Trackio may help you see the curve, but lerobot-dataset-viz and dataset metadata may help you understand whether the curve is meaningful.

7. Links worth checking

LeRobot / SmolVLA

LeRobot org page
LeRobot docs
LeRobot GitHub
Imitation Learning on Real-World Robots
SmolVLA docs
SmolVLA blog post
LeRobotDataset v3.0 docs
LeRobotDataset v3.0 blog post

Trackio

Trackio docs
Trackio migration guide
Trackio blog post
Trackio GitHub repo

Related GitHub issues

Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial
Clarifications on fine-tuning on different envs and embodiments
SO101 / SmolVLA camera setup discussion
SmolVLA / SO101 pretrained setup discussion
SmolVLA inference problem

Practical physical AI examples

Training ACT on SO-101
Fine-tuning NVIDIA GR00T N1.5 for SO-101
NVIDIA Isaac GR00T in LeRobot
Generalist robot policy evaluation with Isaac Lab Arena and LeRobot
Physical AI terminology overview