Visualization of Imitation Learning by smolvla(SO101)

Hugging Face Forums [Unofficial] June 5, 2026

Since this is in the physical AI / real-world robotics area, I think the LeRobot Discord is probably the best place to get the most useful follow-up. But here is how I would separate the pieces first, based on what I can check from the public docs, examples, and related issues:


Short version

I think there are three different visualization/debugging questions mixed together here:

| Question | Tool family | What it helps with |
|---|---|---|
| “Is my recorded dataset sane?” | lerobot-dataset-viz, LeRobot dataset visualizers, Rerun-style episode inspection | Camera streams, robot states, actions, episode structure |
| “How is training progressing?” | W&B, Trackio, TensorBoard, CSV/JSONL logs | Loss, learning rate, grad norm, eval metrics |
| “Will the policy actually work on the robot?” | Evaluation rollouts, open-loop evaluation, real robot testing, dataset/action sanity checks | Success rate, action correctness, camera/state/action mismatch |

So I would not treat lerobot-dataset-viz as a replacement for a learning-curve dashboard. It is a dataset/episode inspection tool. For learning curves, W&B is the documented path in current LeRobot examples. Trackio looks like the most relevant Hugging Face-native, local-first, W&B-like alternative, but I have not seen it confirmed as a one-flag replacement inside lerobot-train yet.

For SmolVLA/SO101 specifically, I would also be careful not to over-trust the loss curve. There are related LeRobot issues where the loss converged and/or W&B plots looked fine, but evaluation success was still 0%. That suggests that for VLA/robotics, the dataset schema, camera setup, state/action definition, normalization/statistics, and rollout evaluation can matter as much as the scalar training curve.


1. First split: dataset visualization vs training metrics

Dataset/episode visualization

LeRobot has dataset visualization tools for looking at recorded episodes. This is useful for checking things like:

  • camera frames
  • camera names/views
  • robot state streams
  • action streams
  • episode timing
  • whether the recorded behavior looks physically plausible

Relevant docs/pages:

  • LeRobot dataset tools
  • LeRobotDataset v3.0 docs
  • LeRobotDataset v3.0 blog post
  • LeRobot Dataset Visualizer repo

This kind of tool answers questions like:

Did I record the right cameras, states, actions, and episodes?

It does not directly answer:

Is my loss decreasing over training steps?

Those are different layers.

Training metric visualization

For training curves, the current LeRobot docs use W&B as the documented example. In the real-world imitation learning tutorial, --wandb.enable=true is described as an optional flag for visualizing training plots:

  • Imitation Learning on Real-World Robots
  • SmolVLA docs

So for training metrics, I would think in terms of:

| Option | Local? | Good for | Caveat |
|---|---|---|---|
| W&B | Not local-first by default | Mature experiment tracking, training plots, media, artifacts | Requires W&B setup/login unless using offline mode |
| W&B offline | Local logging first | Keeping W&B-style logs without immediate cloud sync | Still W&B-oriented; dashboard workflow may not be what you want |
| Trackio | Yes, local-first | Local scalar curves and lightweight dashboards | Promising, but not necessarily a full W&B replacement for LeRobot |
| TensorBoard | Yes | Classic local scalar curves | May require adding a writer if not already supported |
| CSV/JSONL logs | Yes | Simple, robust, reproducible | No rich dashboard unless you build/plot one |

2. Trackio may be the HF-native option you were thinking of

If you were remembering a Gradio-based Hugging Face alternative to W&B, I think you may be thinking of Trackio:

  • Trackio docs
  • Trackio migration guide
  • Trackio blog post
  • Trackio GitHub repo

Trackio is very relevant here because it is:

  • Hugging Face-native
  • local-first
  • W&B-like
  • built around a Gradio dashboard
  • designed to log experiment metrics
  • able to sync/share through Hugging Face Spaces

The Trackio migration docs say that migrating from W&B is usually simple because Trackio uses W&B-like API syntax. In simple scripts, the idea can be as small as:

import trackio as wandb

wandb.init(project="my-project", name="my-run")
wandb.log({"train/loss": 0.123, "train/lr": 1e-4}, step=100)
wandb.finish()

That said, I would be careful with wording here.

I would say:

Trackio looks like the closest Hugging Face-native local/W&B-like option for scalar training curves.

I would not say:

Trackio is a guaranteed drop-in replacement for LeRobot’s current --wandb.enable=true path.

Why not? Because LeRobot appears to have its own W&B-specific logger wrapper rather than only calling plain wandb.log() everywhere. So Trackio may work well with a small custom logger/wrapper, but I would not assume that lerobot-train already exposes something like:

lerobot-train \
  --trackio.enable=true

unless that has been added in the specific LeRobot version you are using.

A safer expectation is:

| LeRobot logging feature | Trackio likelihood | Notes |
|---|---|---|
| Scalar metrics: loss, lr, grad norm | High | This is the easiest case |
| Eval metrics | High | If logged as scalars |
| Tables/images | Likely | Trackio has W&B-like media APIs, but exact behavior should be checked |
| Videos | Maybe | Needs checking for the exact current API and dashboard behavior |
| Checkpoint/artifact tracking | Be careful | W&B Artifacts and Trackio storage are not necessarily equivalent |
| Resume/run-id behavior | Be careful | W&B-specific run resume logic may not map 1:1 |
| Full W&B feature parity | No | Trackio is lightweight, not a full W&B clone |

So my practical recommendation would be:

  1. Use the standard documented W&B path first if you are okay with W&B.
  2. If you want local-first scalar curves, investigate Trackio.
  3. If using lerobot-train, assume Trackio may need a small logger wrapper or code patch.
  4. If you only need a quick local curve, parse stdout/logs or write CSV/JSONL first.

3. Why the loss curve is not enough in SmolVLA/SO101

This is the most important robotics-specific point.

In ordinary ML, a learning curve can often tell you a lot. In real-world robotics and VLA training, it is only one signal.

There are related LeRobot issues where training loss or W&B plots looked good, but evaluation did not work:

  • Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial
  • Clarifications on fine-tuning on different envs and embodiments
  • SO101/SmolVLA camera/setup discussion
  • SmolVLA poor performance / inference issue

The main lesson I would take from those is:

A clean loss curve does not guarantee a working rollout.

For SmolVLA/SO101, I would inspect at least these layers:

| Layer | What to check | Why it matters |
|---|---|---|
| Camera setup | Number of cameras, camera names, view order, resolution | VLA policies are sensitive to visual input schema |
| State schema | Shape, order, meaning of observation.state | The policy can learn the wrong mapping despite a converged loss if state semantics differ |
| Action schema | Shape, order, joint vs end-effector meaning, gripper representation | Action mismatch can make rollout fail even if training looks fine |
| Dataset metadata | meta/info.json, feature names, fps, codebase version | Confirms what the dataset actually contains |
| Dataset statistics | meta/stats.json, normalization values | Wrong normalization can break policy behavior |
| Episode visualization | Camera/state/action streams | Helps detect recording/config mistakes |
| Evaluation | Open-loop eval, sim eval if available, real rollout | The final check is behavior, not just loss |
| Versioning | LeRobot version, model checkpoint, dataset format version | LeRobot/SmolVLA are moving quickly |

The SmolVLA docs describe SmolVLA as taking multiple camera views, the current sensorimotor state, and a natural language instruction, then generating an action chunk. That means the model is not just learning from a text prompt or a single tensor. The camera/state/action contract matters.
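
To make that contract concrete, here is a minimal sketch of how I would compare what the dataset contains with what the policy config expects. The helper name, the dict layout, and all shapes are hypothetical illustrations, not a LeRobot API; in practice you would fill both dicts from meta/info.json and from your policy config.

```python
from typing import Dict, List, Sequence


# Hypothetical helper: neither this function nor the dict layout is a LeRobot API.
def find_contract_mismatches(
    dataset_shapes: Dict[str, Sequence[int]],
    expected_shapes: Dict[str, Sequence[int]],
) -> List[str]:
    """Return readable mismatches between dataset features and policy expectations."""
    problems = []
    # Every feature the policy expects must exist with the same shape.
    for key, expected in expected_shapes.items():
        if key not in dataset_shapes:
            problems.append(f"missing feature: {key}")
        elif tuple(dataset_shapes[key]) != tuple(expected):
            problems.append(
                f"shape mismatch for {key}: "
                f"dataset {tuple(dataset_shapes[key])}, expected {tuple(expected)}"
            )
    # Extra cameras often mean a naming or ordering mistake.
    for key in dataset_shapes:
        if key.startswith("observation.images.") and key not in expected_shapes:
            problems.append(f"unexpected camera: {key}")
    return problems


# Illustrative values only, not taken from a real dataset or checkpoint.
dataset_shapes = {
    "observation.state": (6,),
    "action": (6,),
    "observation.images.front": (480, 640, 3),
    "observation.images.wrist": (480, 640, 3),
}
expected_shapes = {
    "observation.state": (6,),
    "action": (6,),
    "observation.images.top": (480, 640, 3),
    "observation.images.wrist": (480, 640, 3),
}

mismatches = find_contract_mismatches(dataset_shapes, expected_shapes)
for line in mismatches:
    print(line)
```

In this toy case the check reports a missing "top" camera and an unexpected "front" camera, which is exactly the kind of mismatch a loss curve will not show.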


4. What I would try locally

If I wanted the simplest local path before going deeper, I would try this order.

Step 1: Confirm the dataset visually

Use the LeRobot dataset visualization path first.

Things to look for:

  • Are all expected camera views present?
  • Do the camera names match what the policy/config expects?
  • Are the wrist/front/top/side views in the expected places?
  • Does the robot state change smoothly?
  • Do actions look non-zero and physically meaningful?
  • Are gripper actions represented correctly?
  • Is fps consistent with what the training config expects?
  • Are there broken/missing videos or episodes?
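
Two of the checks above (smooth state changes, non-zero actions) can also be done numerically once you have the per-frame values as plain lists. This is a rough sketch under that assumption; the function names and the toy episode are made up for illustration, and how you export the frames from your dataset is not shown.

```python
from typing import List, Sequence


def constant_dimensions(rows: Sequence[Sequence[float]], tol: float = 1e-6) -> List[int]:
    """Indices of dimensions whose value never changes by more than tol."""
    if not rows:
        return []
    n_dims = len(rows[0])
    return [
        d
        for d in range(n_dims)
        if max(r[d] for r in rows) - min(r[d] for r in rows) <= tol
    ]


def largest_step(rows: Sequence[Sequence[float]]) -> List[float]:
    """Largest absolute frame-to-frame change for each dimension."""
    if not rows:
        return []
    n_dims = len(rows[0])
    return [
        max((abs(b[d] - a[d]) for a, b in zip(rows, rows[1:])), default=0.0)
        for d in range(n_dims)
    ]


# Toy episode with 3 action dimensions (hypothetical values).
actions = [
    [0.0, 10.0, 5.0],
    [0.0, 11.0, 5.5],
    [0.0, 12.0, 45.0],  # sudden jump in the last dimension
    [0.0, 13.0, 6.5],
]

stuck = constant_dimensions(actions)
jumps = largest_step(actions)
print("constant dimensions:", stuck)
print("largest per-frame change:", jumps)
```

A dimension that never moves, or one with a jump far larger than the others, is worth looking at in the visualizer before training on the data.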

Relevant links:

  • LeRobot dataset tools
  • LeRobotDataset v3.0 docs
  • LeRobot Dataset Visualizer

Step 2: Inspect metadata and stats

Open the dataset metadata files if available.

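For LeRobotDataset v3, I would look at the files under meta/. The names below follow the older JSONL layout; as far as I can tell, v3.0 stores per-episode metadata as Parquet files under meta/episodes/ rather than in a single episodes.jsonl, so check which files your dataset actually has:

meta/info.json
meta/stats.json
meta/tasks.jsonl
meta/episodes.jsonl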

In particular:

observation.state
action
observation.images.<camera_name>
fps
features
shape
dtype
codebase_version

This is boring but important. If the dataset schema and policy expectation disagree, the loss curve may not tell you the real problem.
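
A small script makes this inspection repeatable. The sketch below assumes that meta/info.json has a top-level "features" mapping with a "shape" per feature, and that meta/stats.json maps each feature to a dict with a "std" list; both layouts should be verified against your own files. The function names and the toy values are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_meta(dataset_root: str, name: str) -> Dict[str, Any]:
    """Read meta/<name> (for example info.json) from a local dataset folder."""
    return json.loads((Path(dataset_root) / "meta" / name).read_text())


def summarize_info(info: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields that define the camera/state/action contract."""
    features = info.get("features", {})
    return {
        "codebase_version": info.get("codebase_version"),
        "fps": info.get("fps"),
        "cameras": sorted(k for k in features if k.startswith("observation.images.")),
        "shapes": {k: tuple(v.get("shape", ())) for k, v in features.items()},
    }


def near_zero_std(stats: Dict[str, Any], eps: float = 1e-6) -> Dict[str, List[int]]:
    """Flag vector dimensions whose std is (almost) zero; they normalize badly."""
    flagged = {}
    for name, entry in stats.items():
        std = entry.get("std")
        # Only flat vectors (state/action) are handled; nested image stats are skipped.
        if isinstance(std, list) and all(isinstance(x, (int, float)) for x in std):
            dims = [i for i, x in enumerate(std) if abs(x) < eps]
            if dims:
                flagged[name] = dims
    return flagged


# Toy metadata standing in for load_meta(root, "info.json") / load_meta(root, "stats.json").
example_info = {
    "codebase_version": "v3.0",
    "fps": 30,
    "features": {
        "observation.state": {"dtype": "float32", "shape": [6]},
        "action": {"dtype": "float32", "shape": [6]},
        "observation.images.front": {"dtype": "video", "shape": [480, 640, 3]},
    },
}
example_stats = {
    "action": {"std": [0.5, 0.0, 1.2]},
    "observation.state": {"std": [0.4, 0.3, 0.9]},
    "observation.images.front": {"std": [[[0.2]], [[0.2]], [[0.2]]]},
}

summary = summarize_info(example_info)
flagged = near_zero_std(example_stats)
print(summary)
print("near-zero std:", flagged)
```

Pasting the printed summary into a forum or Discord question is usually more useful than describing the dataset in words.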

Step 3: Start with the official W&B path if possible

If you can use W&B, the official path is probably the least surprising first test:

lerobot-train \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=<your-dataset-repo-id> \
  --batch_size=<batch-size> \
  --steps=<num-steps> \
  --wandb.enable=true

The exact command should follow the current LeRobot imitation learning docs and SmolVLA docs, because the CLI/config names can change across LeRobot versions.

Step 4: If you want local-first curves, try Trackio separately

For a custom training script, Trackio may be very simple:

import trackio as wandb

wandb.init(
    project="smolvla-so101",
    name="local-test",
    config={
        "policy": "smolvla_base",
        "robot": "so101",
    },
)

wandb.log(
    {
        "train/loss": 0.123,
        "train/lr": 1e-4,
        "train/grad_norm": 0.5,
    },
    step=100,
)

wandb.finish()

For lerobot-train, I would expect this to require a small logger integration unless LeRobot has added official Trackio support in your version.
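
As a rough idea of what such an integration could look like, here is a hypothetical wrapper that forwards only scalar metrics to any backend exposing init/log/finish in the W&B style. ScalarLogger and RecordingBackend are names I made up; nothing here is LeRobot's logger or an official Trackio integration, and where to call it inside the training loop depends on your LeRobot version.

```python
from numbers import Real
from typing import Any, Dict, Optional


class ScalarLogger:
    """Hypothetical thin wrapper; not part of LeRobot or Trackio."""

    def __init__(self, backend: Any, project: str, name: str,
                 config: Optional[dict] = None) -> None:
        # The backend only needs W&B-style init/log/finish callables.
        self._backend = backend
        backend.init(project=project, name=name, config=config or {})

    def log(self, metrics: Dict[str, Any], step: int,
            prefix: str = "train") -> Dict[str, float]:
        """Forward numeric values only; strings, tensors, etc. are dropped."""
        scalars = {
            f"{prefix}/{key}": float(value)
            for key, value in metrics.items()
            if isinstance(value, Real) and not isinstance(value, bool)
        }
        if scalars:
            self._backend.log(scalars, step=step)
        return scalars

    def close(self) -> None:
        self._backend.finish()


class RecordingBackend:
    """Stand-in backend for a dry run without any dashboard installed."""

    def __init__(self) -> None:
        self.records = []

    def init(self, project: str, name: str, config: dict) -> None:
        self.run = (project, name, config)

    def log(self, metrics: dict, step: Optional[int] = None) -> None:
        self.records.append((step, metrics))

    def finish(self) -> None:
        pass


backend = RecordingBackend()
logger = ScalarLogger(backend, project="smolvla-so101", name="local-test")
logged = logger.log({"loss": 0.123, "lr": 1e-4, "note": "warmup"}, step=100)
logger.close()
print(logged)
```

To try it with Trackio, the idea would be to pass the imported trackio module itself as the backend, since the calls above mirror the init/log/finish usage shown earlier. Framework tensors would need converting to Python floats (for example with .item()) before logging.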

Step 5: If you want the most robust local fallback, log CSV/JSONL

A very boring but reliable fallback is to append one JSON object per logged step to a .jsonl file:

{"step": 100, "train/loss": 0.123, "train/lr": 0.0001, "train/grad_norm": 0.5}
{"step": 200, "train/loss": 0.098, "train/lr": 0.0001, "train/grad_norm": 0.47}

Then plot it locally with Python.

This is not fancy, but it avoids account setup, dashboard assumptions, and integration drift.
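
A minimal sketch of the plotting side, assuming the JSONL layout shown above (a "step" field plus metric keys). The function names and the output file name are my own choices, and matplotlib is imported only inside the plotting function so the parsing part runs without it.

```python
import json
from typing import Iterable, List, Tuple


def read_metric(lines: Iterable[str], key: str) -> Tuple[List[int], List[float]]:
    """Collect (step, value) pairs for one metric from JSONL lines."""
    steps, values = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if key in record and "step" in record:
            steps.append(int(record["step"]))
            values.append(float(record[key]))
    return steps, values


def plot_metric(steps: List[int], values: List[float], key: str,
                out_path: str = "curve.png") -> None:
    """Save a simple line plot; requires matplotlib to be installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(steps, values)
    ax.set_xlabel("step")
    ax.set_ylabel(key)
    fig.savefig(out_path)
    plt.close(fig)


# The two example records from above; with a real file, pass open(path) instead.
sample_lines = [
    '{"step": 100, "train/loss": 0.123, "train/lr": 0.0001, "train/grad_norm": 0.5}',
    '{"step": 200, "train/loss": 0.098, "train/lr": 0.0001, "train/grad_norm": 0.47}',
]
steps, losses = read_metric(sample_lines, "train/loss")
print(steps, losses)
# plot_metric(steps, losses, "train/loss")  # uncomment if matplotlib is available
```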


5. What to ask in LeRobot Discord

For SO101/SmolVLA, I would bring a compact but complete report to the LeRobot Discord. That will probably get better answers than only asking “how do I visualize the curve?”

Useful information to include:

| Category | Include |
|---|---|
| LeRobot version | pip show lerobot, git commit, or install method |
| Command | Exact lerobot-train command |
| Policy | lerobot/smolvla_base or other checkpoint |
| Robot | SO101 / SO100 / other, follower/leader setup |
| Dataset | Hub repo id or local path |
| Dataset format | LeRobotDataset version if known |
| Cameras | Number, names, views, order |
| State/action | Shapes from metadata |
| Metadata | Relevant parts of meta/info.json |
| Stats | Relevant parts of meta/stats.json |
| Training curves | loss, lr, grad_norm, eval metrics if any |
| Visualization | screenshots or notes from lerobot-dataset-viz |
| Evaluation | open-loop eval, real rollout behavior, success/failure examples |
| Requirement | whether you need fully local/offline visualization |

A good short Discord/forum report might look like:

I am fine-tuning SmolVLA on SO101 with LeRobot.

Goal:
- I want to visualize training curves locally if possible.
- I also want to confirm whether my dataset/camera/action setup is correct.

Setup:
- LeRobot version: <version-or-commit>
- Install method: <pip/source/docker/etc>
- Policy: <policy-path>
- Dataset: <dataset-repo-or-local-path>
- Robot: SO101
- Cameras: <camera-names-and-count>
- Training command: <exact-command>

What I checked:
- lerobot-dataset-viz: <works/does-not-work>
- meta/info.json: <relevant-shapes>
- meta/stats.json: <normalization-stats>
- W&B/Trackio/TensorBoard/logs: <what-you-tried>

Observed behavior:
- Training loss: <summary>
- Eval/rollout: <summary>
- Failure mode: <what-the-robot-does>

That gives the LeRobot community enough context to answer the robotics-specific part.


6. My current recommendation

If your immediate goal is just “I want to see the learning curve locally,” I would rank the options like this:

| Rank | Option | Why |
|---|---|---|
| 1 | Parse local logs / CSV / JSONL | Most robust, fully local, no integration risk |
| 2 | Trackio | Best HF-native local/W&B-like dashboard candidate |
| 3 | W&B offline | Good if you already want W&B-style tracking |
| 4 | TensorBoard | Solid generic local ML tool |
| 5 | Full W&B online | Easiest if you accept W&B account/cloud workflow |

But for SmolVLA/SO101 specifically, I would not stop at the learning curve. I would also inspect:

  • dataset episodes
  • camera names/order/count
  • meta/info.json
  • meta/stats.json
  • state/action shapes
  • normalization
  • open-loop evaluation
  • real rollout behavior

In other words:

Trackio may help you see the curve, but lerobot-dataset-viz and dataset metadata may help you understand whether the curve is meaningful.
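
For the open-loop evaluation item, one simple number to look at is the per-dimension error between the actions the policy predicts on recorded observations and the actions that were actually recorded. The sketch below assumes you already have both as lists of equal length; how to obtain the predictions from your checkpoint is not shown, and the function name and toy values are hypothetical.

```python
from typing import List, Sequence


def per_dimension_mae(
    predicted: Sequence[Sequence[float]],
    recorded: Sequence[Sequence[float]],
) -> List[float]:
    """Mean absolute error per action dimension over one episode."""
    if not predicted or len(predicted) != len(recorded):
        raise ValueError("predicted and recorded must be non-empty and equally long")
    n_dims = len(recorded[0])
    totals = [0.0] * n_dims
    for pred_row, rec_row in zip(predicted, recorded):
        if len(pred_row) != n_dims or len(rec_row) != n_dims:
            raise ValueError("all rows must have the same number of dimensions")
        for d in range(n_dims):
            totals[d] += abs(pred_row[d] - rec_row[d])
    return [total / len(recorded) for total in totals]


# Toy episode: dimension 0 matches, dimension 1 drifts away (hypothetical values).
recorded = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
predicted = [[0.0, 1.0], [0.5, 3.0], [1.0, 5.0]]

errors = per_dimension_mae(predicted, recorded)
print("per-dimension MAE:", errors)
```

A large error concentrated in one dimension (often the gripper) points to a schema or normalization problem rather than an undertrained policy. A low open-loop error is still not a guarantee of closed-loop success, so real rollouts remain the final check.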


7. Links worth checking

LeRobot / SmolVLA

  • LeRobot org page
  • LeRobot docs
  • LeRobot GitHub
  • Imitation Learning on Real-World Robots
  • SmolVLA docs
  • SmolVLA blog post
  • LeRobotDataset v3.0 docs
  • LeRobotDataset v3.0 blog post

Trackio

  • Trackio docs
  • Trackio migration guide
  • Trackio blog post
  • Trackio GitHub repo

Related GitHub issues

  • Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial
  • Clarifications on fine-tuning on different envs and embodiments
  • SO101 / SmolVLA camera setup discussion
  • SmolVLA / SO101 pretrained setup discussion
  • SmolVLA inference problem

Practical physical AI examples

  • Training ACT on SO-101
  • Fine-tuning NVIDIA GR00T N1.5 for SO-101
  • NVIDIA Isaac GR00T in LeRobot
  • Generalist robot policy evaluation with Isaac Lab Arena and LeRobot
  • Physical AI terminology overview
