Visualization of Imitation Learning by SmolVLA (SO101)
Since this is in the physical AI / real-world robotics area, I think the LeRobot Discord is probably the best place to get the most useful follow-up. But here is how I would separate the pieces first, based on what I can check from the public docs, examples, and related issues:
Short version
I think there are three different visualization/debugging questions mixed together here:
| Question | Tool family | What it helps with |
|---|---|---|
| “Is my recorded dataset sane?” | lerobot-dataset-viz, LeRobot dataset visualizers, Rerun-style episode inspection | Camera streams, robot states, actions, episode structure |
| “How is training progressing?” | W&B, Trackio, TensorBoard, CSV/JSONL logs | Loss, learning rate, grad norm, eval metrics |
| “Will the policy actually work on the robot?” | Evaluation rollouts, open-loop evaluation, real robot testing, dataset/action sanity checks | Success rate, action correctness, camera/state/action mismatch |
So I would not treat lerobot-dataset-viz as a replacement for a learning-curve dashboard. It is more of a dataset/episode inspection tool. For learning curves, W&B is the documented path in current LeRobot examples, and Trackio looks like the most relevant Hugging Face-native local/W&B-like alternative, but probably not a confirmed one-flag replacement for lerobot-train yet.
For SmolVLA/SO101 specifically, I would also be careful not to over-trust the loss curve. There are related LeRobot issues where the loss converged and/or W&B plots looked fine, but evaluation success was still 0%. That suggests that for VLA/robotics, the dataset schema, camera setup, state/action definition, normalization/statistics, and rollout evaluation can matter as much as the scalar training curve.
1. First split: dataset visualization vs training metrics
Dataset/episode visualization
LeRobot has dataset visualization tools for looking at recorded episodes. This is useful for checking things like:
- camera frames
- camera names/views
- robot state streams
- action streams
- episode timing
- whether the recorded behavior looks physically plausible
Relevant docs/pages:
- LeRobot dataset tools
- LeRobotDataset v3.0 docs
- LeRobotDataset v3.0 blog post
- LeRobot Dataset Visualizer repo
This kind of tool answers questions like:
Did I record the right cameras, states, actions, and episodes?
It does not directly answer:
Is my loss decreasing over training steps?
Those are different layers.
Training metric visualization
For training curves, the current LeRobot docs show W&B as the normal documented example. In the real-world imitation learning tutorial, wandb.enable=true is described as optional and used for visualizing training plots:
- Imitation Learning on Real-World Robots
- SmolVLA docs
So for training metrics, I would think in terms of:
| Option | Local? | Good for | Caveat |
|---|---|---|---|
| W&B | Not local-first by default | Mature experiment tracking, training plots, media, artifacts | Requires W&B setup/login unless using offline mode |
| W&B offline | Local logging first | Keeping W&B-style logs without immediate cloud sync | Still W&B-oriented; dashboard workflow may not be what you want |
| Trackio | Yes, local-first | Local scalar curves and lightweight dashboards | Promising, but not necessarily a full W&B replacement for LeRobot |
| TensorBoard | Yes | Classic local scalar curves | May require adding a writer if not already supported |
| CSV/JSONL logs | Yes | Simple, robust, reproducible | No rich dashboard unless you build/plot one |
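For the W&B offline row, a minimal sketch: W&B reads the WANDB_MODE environment variable, so setting it before the run starts keeps logs on local disk. Whether lerobot-train respects this or overrides it through its own W&B config is an assumption to verify in your LeRobot version.

```python
import os

# Ask W&B to write runs to the local ./wandb directory instead of syncing.
# This must be set before wandb.init() runs, i.e. before training starts.
os.environ["WANDB_MODE"] = "offline"

print(os.environ["WANDB_MODE"])
```

If you launch training from a shell, export the same variable there instead. Offline runs can be uploaded later with the wandb sync command if you change your mind.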
2. Trackio may be the HF-native option you were thinking of
If you were remembering a Gradio-based Hugging Face alternative to W&B, I think you may be thinking of Trackio:
- Trackio docs
- Trackio migration guide
- Trackio blog post
- Trackio GitHub repo
Trackio is very relevant here because it is:
- Hugging Face-native
- local-first
- W&B-like
- built around a Gradio dashboard
- designed to log experiment metrics
- able to sync/share through Hugging Face Spaces
The Trackio migration docs say that migrating from W&B is usually simple because Trackio uses W&B-like API syntax. In simple scripts, the idea can be as small as:
import trackio as wandb
wandb.init(project="my-project", name="my-run")
wandb.log({"train/loss": 0.123, "train/lr": 1e-4}, step=100)
wandb.finish()
That said, I would be careful with wording here.
I would say:
Trackio looks like the closest Hugging Face-native local/W&B-like option for scalar training curves.
I would not say:
Trackio is a guaranteed drop-in replacement for LeRobot’s current --wandb.enable=true path.
Why not? Because LeRobot appears to have its own W&B-specific logger wrapper rather than only calling plain wandb.log() everywhere. So Trackio may work well with a small custom logger/wrapper, but I would not assume that lerobot-train already exposes something like:
lerobot-train \
  --trackio.enable=true
unless that has been added in the specific LeRobot version you are using.
A safer expectation is:
| LeRobot logging feature | Trackio likelihood | Notes |
|---|---|---|
| Scalar metrics: loss, lr, grad norm | High | This is the easiest case |
| Eval metrics | High | If logged as scalars |
| Tables/images | Likely | Trackio has W&B-like media APIs, but exact behavior should be checked |
| Videos | Maybe | Needs checking for the exact current API and dashboard behavior |
| Checkpoint/artifact tracking | Be careful | W&B Artifacts and Trackio storage are not necessarily equivalent |
| Resume/run-id behavior | Be careful | W&B-specific run resume logic may not map 1:1 |
| Full W&B feature parity | No | Trackio is lightweight, not a full W&B clone |
So my practical recommendation would be:
- Use the standard documented W&B path first if you are okay with W&B.
- If you want local-first scalar curves, investigate Trackio.
- If you are using lerobot-train, assume Trackio may need a small logger wrapper or code patch.
- If you only need a quick local curve, parse stdout/logs or write CSV/JSONL first.
3. Why the loss curve is not enough in SmolVLA/SO101
This is the most important robotics-specific point.
In ordinary ML, a learning curve can often tell you a lot. In real-world robotics and VLA training, it is only one signal.
There are related LeRobot issues where training loss or W&B plots looked good, but evaluation did not work:
- Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial
- Clarifications on fine-tuning on different envs and embodiments
- SO101/SmolVLA camera/setup discussion
- SmolVLA poor performance / inference issue
The main lesson I would take from those is:
A clean loss curve does not guarantee a working rollout.
For SmolVLA/SO101, I would inspect at least these layers:
| Layer | What to check | Why it matters |
|---|---|---|
| Camera setup | Number of cameras, camera names, view order, resolution | VLA policies are sensitive to visual input schema |
| State schema | Shape, order, meaning of observation.state | A converged loss can still learn the wrong mapping if state semantics differ |
| Action schema | Shape, order, joint vs end-effector meaning, gripper representation | Action mismatch can make rollout fail even if training looks fine |
| Dataset metadata | meta/info.json, feature names, fps, codebase version | Confirms what the dataset actually contains |
| Dataset statistics | meta/stats.json, normalization values | Wrong normalization can break policy behavior |
| Episode visualization | Camera/state/action streams | Helps detect recording/config mistakes |
| Evaluation | Open-loop eval, sim eval if available, real rollout | The final check is behavior, not just loss |
| Versioning | LeRobot version, model checkpoint, dataset format version | LeRobot/SmolVLA are moving quickly |
The SmolVLA docs describe SmolVLA as taking multiple camera views, the current sensorimotor state, and a natural language instruction, then generating an action chunk. That means the model is not just learning from a text prompt or a single tensor. The camera/state/action contract matters.
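One way to look past the loss curve is an open-loop check: feed recorded observations to the policy and compare its predicted actions with the recorded ones, dimension by dimension. The sketch below covers only the comparison step. The function name per_dim_mae is hypothetical, the numbers are illustrative, and obtaining predictions from the policy is left out because that call depends on your LeRobot version.

```python
from typing import List, Sequence


def per_dim_mae(predicted: Sequence[Sequence[float]],
                recorded: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute error per action dimension, averaged over all frames."""
    if len(predicted) != len(recorded) or not predicted:
        raise ValueError("need the same non-zero number of frames")
    dims = len(recorded[0])
    totals = [0.0] * dims
    for pred, rec in zip(predicted, recorded):
        if len(pred) != dims or len(rec) != dims:
            raise ValueError("action dimension mismatch between frames")
        for d in range(dims):
            totals[d] += abs(pred[d] - rec[d])
    return [t / len(recorded) for t in totals]


# Illustrative numbers only: 3 frames, 2 action dimensions.
recorded = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
predicted = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
errors = per_dim_mae(predicted, recorded)
print(errors)  # [0.0, 1.0]
```

A single dimension with a much larger error than the others (here the second one) often points at a schema problem such as a swapped joint order or a different gripper convention, rather than at undertraining.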
4. What I would try locally
If I wanted the simplest local path before going deeper, I would try this order.
Step 1: Confirm the dataset visually
Use the LeRobot dataset visualization path first.
Things to look for:
- Are all expected camera views present?
- Do the camera names match what the policy/config expects?
- Are the wrist/front/top/side views in the expected places?
- Does the robot state change smoothly?
- Do actions look non-zero and physically meaningful?
- Are gripper actions represented correctly?
- Is fps consistent with what the training config expects?
- Are there broken/missing videos or episodes?
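Some of the checks above can also be done numerically once you have an episode's actions or states as a list of per-frame vectors. This is a minimal sketch with hypothetical helper names (static_dims, max_jump) and made-up data; how you extract the frames from your dataset is not shown.

```python
from typing import List, Sequence


def static_dims(frames: Sequence[Sequence[float]], tol: float = 1e-6) -> List[int]:
    """Indices of dimensions whose value never changes by more than tol."""
    dims = len(frames[0])
    return [d for d in range(dims)
            if max(f[d] for f in frames) - min(f[d] for f in frames) <= tol]


def max_jump(frames: Sequence[Sequence[float]]) -> List[float]:
    """Largest frame-to-frame change per dimension (needs at least 2 frames)."""
    dims = len(frames[0])
    return [max(abs(b[d] - a[d]) for a, b in zip(frames, frames[1:]))
            for d in range(dims)]


# Illustrative episode: dim 0 moves smoothly, dim 1 is stuck, dim 2 jumps.
actions = [[0.0, 5.0, 0.0], [0.25, 5.0, 0.0], [0.5, 5.0, 8.0], [0.75, 5.0, 8.0]]
print(static_dims(actions))  # [1]
print(max_jump(actions))     # [0.25, 0.0, 8.0]
```

A stuck dimension can be legitimate (a joint that the task never uses), so treat the output as a prompt to look at the episode in the visualizer, not as a verdict.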
Relevant links:
- LeRobot dataset tools
- LeRobotDataset v3.0 docs
- LeRobot Dataset Visualizer
Step 2: Inspect metadata and stats
Open the dataset metadata files if available.
For LeRobotDataset v3, I would look at:
meta/info.json
meta/stats.json
meta/tasks.jsonl
meta/episodes/ (chunked Parquet files in v3.0; older v2.x datasets use meta/episodes.jsonl)
In particular:
observation.state
action
observation.images.<camera_name>
fps
features
shape
dtype
codebase_version
This is boring but important. If the dataset schema and policy expectation disagree, the loss curve may not tell you the real problem.
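A schema comparison like this can be scripted. The sketch below assumes meta/info.json has a top-level features mapping whose entries carry a shape list. The inline JSON stands in for your real file, and both the camera names and the expected dictionary are hypothetical values that you would replace with what your policy config actually requires.

```python
import json
from typing import Dict, List


def check_features(info: dict, expected: Dict[str, List[int]]) -> List[str]:
    """Compare feature shapes in an info.json dict with expected shapes."""
    features = info.get("features", {})
    problems = []
    for key, shape in expected.items():
        if key not in features:
            problems.append(f"missing feature: {key}")
        elif list(features[key].get("shape", [])) != list(shape):
            problems.append(
                f"{key}: dataset shape {features[key].get('shape')} != expected {shape}"
            )
    return problems


# Illustrative stand-in for the parsed contents of meta/info.json.
info = json.loads("""
{
  "codebase_version": "v3.0",
  "fps": 30,
  "features": {
    "observation.state": {"dtype": "float32", "shape": [6]},
    "action": {"dtype": "float32", "shape": [6]},
    "observation.images.front": {"dtype": "video", "shape": [480, 640, 3]}
  }
}
""")

# Hypothetical expectation: the policy config also wants a wrist camera.
expected = {
    "observation.state": [6],
    "action": [6],
    "observation.images.front": [480, 640, 3],
    "observation.images.wrist": [480, 640, 3],
}
problems = check_features(info, expected)
print(problems)  # ['missing feature: observation.images.wrist']
```

An empty list only means the names and shapes line up. It says nothing about joint order or units, which still need a manual look.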
Step 3: Start with the official W&B path if possible
If you can use W&B, the official path is probably the least surprising first test:
lerobot-train \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=<your-dataset-repo-id> \
  --batch_size=<batch-size> \
  --steps=<num-steps> \
  --wandb.enable=true
The exact command should follow the current LeRobot imitation learning docs and SmolVLA docs, because the CLI/config names can change across LeRobot versions.
Step 4: If you want local-first curves, try Trackio separately
For a custom training script, Trackio may be very simple:
import trackio as wandb

wandb.init(
    project="smolvla-so101",
    name="local-test",
    config={
        "policy": "smolvla_base",
        "robot": "so101",
    },
)
wandb.log(
    {
        "train/loss": 0.123,
        "train/lr": 1e-4,
        "train/grad_norm": 0.5,
    },
    step=100,
)
wandb.finish()
For lerobot-train, I would expect this to require a small logger integration unless LeRobot has added official Trackio support in your version.
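What such a logger integration could look like, as a rough sketch: a thin wrapper that forwards only scalar metrics to any backend exposing wandb-style init, log, and finish calls. MetricsLogger and RecordingBackend are hypothetical names, not part of LeRobot or Trackio, and where to hook this into the training loop depends on your LeRobot version.

```python
from typing import Any, Dict, Optional


class MetricsLogger:
    """Thin wrapper around any object exposing init/log/finish (wandb-style)."""

    def __init__(self, backend: Any, project: str, name: str,
                 config: Optional[Dict[str, Any]] = None) -> None:
        self._backend = backend
        self._backend.init(project=project, name=name, config=config or {})

    def log_scalars(self, metrics: Dict[str, Any], step: int) -> None:
        # Keep only plain numbers so media/artifact objects never reach the backend.
        scalars = {k: float(v) for k, v in metrics.items()
                   if isinstance(v, (int, float)) and not isinstance(v, bool)}
        if scalars:
            self._backend.log(scalars, step=step)

    def close(self) -> None:
        self._backend.finish()


class RecordingBackend:
    """Stand-in backend for a dry run; it just records the calls it receives."""

    def __init__(self) -> None:
        self.calls = []

    def init(self, **kwargs: Any) -> None:
        self.calls.append(("init", kwargs))

    def log(self, metrics: Dict[str, float], step: int) -> None:
        self.calls.append(("log", metrics, step))

    def finish(self) -> None:
        self.calls.append(("finish",))


backend = RecordingBackend()
logger = MetricsLogger(backend, project="smolvla-so101", name="local-test")
logger.log_scalars({"train/loss": 0.123, "train/video": object()}, step=100)
logger.close()
print(backend.calls[1])  # ('log', {'train/loss': 0.123}, 100)
```

For real use you would pass the imported trackio module as the backend, on the assumption that its init, log, and finish accept the same arguments as in the snippet above. Tensor values would need converting with .item() first, since the filter only accepts Python numbers.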
Step 5: If you want the most robust local fallback, log CSV/JSONL
A very boring but reliable fallback is:
{"step": 100, "train/loss": 0.123, "train/lr": 0.0001, "train/grad_norm": 0.5}
{"step": 200, "train/loss": 0.098, "train/lr": 0.0001, "train/grad_norm": 0.47}
Then plot it locally with Python.
This is not fancy, but it avoids account setup, dashboard assumptions, and integration drift.
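A minimal sketch of that fallback, using only the standard library for writing and reading. The helper names and the train_log.jsonl file name are hypothetical, and the plotting function is optional and assumes matplotlib is installed.

```python
import json
import os
import tempfile
from typing import Dict


def append_metrics(path: str, step: int, metrics: Dict[str, float]) -> None:
    """Append one JSON object per line so a crash loses at most one record."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"step": step, **metrics}) + "\n")


def load_series(path: str, key: str) -> Dict[str, list]:
    """Return steps and values for one metric, skipping lines without it."""
    steps, values = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            if key in record:
                steps.append(record["step"])
                values.append(record[key])
    return {"steps": steps, "values": values}


def plot_series(series: Dict[str, list], label: str, out_png: str) -> None:
    """Optional: save a curve to a PNG file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt
    plt.plot(series["steps"], series["values"])
    plt.xlabel("step")
    plt.ylabel(label)
    plt.savefig(out_png)


# Demo with the two records shown above, written to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "train_log.jsonl")
append_metrics(log_path, 100, {"train/loss": 0.123, "train/lr": 1e-4})
append_metrics(log_path, 200, {"train/loss": 0.098, "train/lr": 1e-4})
loss = load_series(log_path, "train/loss")
print(loss)  # {'steps': [100, 200], 'values': [0.123, 0.098]}
```

Calling plot_series(loss, "train/loss", "loss.png") afterwards writes the curve to an image file that you can open locally.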
5. What to ask in LeRobot Discord
For SO101/SmolVLA, I would bring a compact but complete report to the LeRobot Discord. That will probably get better answers than only asking “how do I visualize the curve?”
Useful information to include:
| Category | Include |
|---|---|
| LeRobot version | pip show lerobot, git commit, or install method |
| Command | Exact lerobot-train command |
| Policy | lerobot/smolvla_base or other checkpoint |
| Robot | SO101 / SO100 / other, follower/leader setup |
| Dataset | Hub repo id or local path |
| Dataset format | LeRobotDataset version if known |
| Cameras | Number, names, views, order |
| State/action | Shapes from metadata |
| Metadata | Relevant parts of meta/info.json |
| Stats | Relevant parts of meta/stats.json |
| Training curves | loss, lr, grad_norm, eval metrics if any |
| Visualization | screenshots or notes from lerobot-dataset-viz |
| Evaluation | open-loop eval, real rollout behavior, success/failure examples |
| Requirement | whether you need fully local/offline visualization |
A good short Discord/forum report might look like:
I am fine-tuning SmolVLA on SO101 with LeRobot.
Goal:
- I want to visualize training curves locally if possible.
- I also want to confirm whether my dataset/camera/action setup is correct.
Setup:
- LeRobot version: <version-or-commit>
- Install method: <pip/source/docker/etc>
- Policy: <policy-path>
- Dataset: <dataset-repo-or-local-path>
- Robot: SO101
- Cameras: <camera-names-and-count>
- Training command: <exact-command>
What I checked:
- lerobot-dataset-viz: <works/does-not-work>
- meta/info.json: <relevant-shapes>
- meta/stats.json: <normalization-stats>
- W&B/Trackio/TensorBoard/logs: <what-you-tried>
Observed behavior:
- Training loss: <summary>
- Eval/rollout: <summary>
- Failure mode: <what-the-robot-does>
That gives the LeRobot community enough context to answer the robotics-specific part.
6. My current recommendation
If your immediate goal is just “I want to see the learning curve locally,” I would rank the options like this:
| Rank | Option | Why |
|---|---|---|
| 1 | Parse local logs / CSV / JSONL | Most robust, fully local, no integration risk |
| 2 | Trackio | Best HF-native local/W&B-like dashboard candidate |
| 3 | W&B offline | Good if you already want W&B-style tracking |
| 4 | TensorBoard | Solid generic local ML tool |
| 5 | Full W&B online | Easiest if you accept W&B account/cloud workflow |
But for SmolVLA/SO101 specifically, I would not stop at the learning curve. I would also inspect:
- dataset episodes
- camera names/order/count
- meta/info.json
- meta/stats.json
- state/action shapes
- normalization
- open-loop evaluation
- real rollout behavior
In other words:
Trackio may help you see the curve, but lerobot-dataset-viz and dataset metadata may help you understand whether the curve is meaningful.
7. Links worth checking
LeRobot / SmolVLA
- LeRobot org page
- LeRobot docs
- LeRobot GitHub
- Imitation Learning on Real-World Robots
- SmolVLA docs
- SmolVLA blog post
- LeRobotDataset v3.0 docs
- LeRobotDataset v3.0 blog post
Trackio
- Trackio docs
- Trackio migration guide
- Trackio blog post
- Trackio GitHub repo
Related GitHub issues
- Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial
- Clarifications on fine-tuning on different envs and embodiments
- SO101 / SmolVLA camera setup discussion
- SmolVLA / SO101 pretrained setup discussion
- SmolVLA inference problem
Practical physical AI examples
- Training ACT on SO-101
- Fine-tuning NVIDIA GR00T N1.5 for SO-101
- NVIDIA Isaac GR00T in LeRobot
- Generalist robot policy evaluation with Isaac Lab Arena and LeRobot
- Physical AI terminology overview