{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiczeydusltjp33uqyzoifcis4vkbavgufwttt5riqv5ngmtb4gohy",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnizb4aaum22"
},
"path": "/t/visualization-of-imitation-learning-by-smolvla-so101/176534#post_2",
"publishedAt": "2026-06-05T00:03:44.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"LeRobot",
"Trackio",
"LeRobot dataset tools",
"LeRobotDataset v3.0 docs",
"LeRobotDataset v3.0 blog post",
"LeRobot Dataset Visualizer repo",
"Imitation Learning on Real-World Robots",
"SmolVLA docs",
"Trackio docs",
"Trackio migration guide",
"Trackio blog post",
"Trackio GitHub repo",
"Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial",
"Clarifications on fine-tuning on different envs and embodiments",
"SO101/SmolVLA camera/setup discussion",
"SmolVLA poor performance / inference issue",
"LeRobot Dataset Visualizer",
"LeRobot imitation learning docs",
"LeRobot org page",
"LeRobot docs",
"LeRobot GitHub",
"SmolVLA blog post",
"SO101 / SmolVLA camera setup discussion",
"SmolVLA / SO101 pretrained setup discussion",
"SmolVLA inference problem",
"Training ACT on SO-101",
"Fine-tuning NVIDIA GR00T N1.5 for SO-101",
"NVIDIA Isaac GR00T in LeRobot",
"Generalist robot policy evaluation with Isaac Lab Arena and LeRobot",
"Physical AI terminology overview"
],
"textContent": "Since this is in the physical AI / real-world robotics area, I think the LeRobot Discord is probably the best place to get the most useful follow-up. But here is how I would separate the pieces first, based on what I can check from the public docs, examples, and related issues:\n\n* * *\n\n## Short version\n\nI think there are three different visualization/debugging questions mixed together here:\n\nQuestion | Tool family | What it helps with\n---|---|---\n“Is my recorded dataset sane?” | `lerobot-dataset-viz`, LeRobot dataset visualizers, Rerun-style episode inspection | Camera streams, robot states, actions, episode structure\n“How is training progressing?” | W&B, Trackio, TensorBoard, CSV/JSONL logs | Loss, learning rate, grad norm, eval metrics\n“Will the policy actually work on the robot?” | Evaluation rollouts, open-loop evaluation, real robot testing, dataset/action sanity checks | Success rate, action correctness, camera/state/action mismatch\n\nSo I would not treat `lerobot-dataset-viz` as a replacement for a learning-curve dashboard. It is more of a dataset/episode inspection tool. For learning curves, W&B is the documented path in current LeRobot examples, and Trackio looks like the most relevant Hugging Face-native local/W&B-like alternative, but probably not a confirmed one-flag replacement for `lerobot-train` yet.\n\nFor SmolVLA/SO101 specifically, I would also be careful not to over-trust the loss curve. There are related LeRobot issues where the loss converged and/or W&B plots looked fine, but evaluation success was still 0%. That suggests that for VLA/robotics, the dataset schema, camera setup, state/action definition, normalization/statistics, and rollout evaluation can matter as much as the scalar training curve.\n\n* * *\n\n## 1. First split: dataset visualization vs training metrics\n\n### Dataset/episode visualization\n\nLeRobot has dataset visualization tools for looking at recorded episodes. 
This is useful for checking things like:\n\n * camera frames\n * camera names/views\n * robot state streams\n * action streams\n * episode timing\n * whether the recorded behavior looks physically plausible\n\n\n\nRelevant docs/pages:\n\n * LeRobot dataset tools\n * LeRobotDataset v3.0 docs\n * LeRobotDataset v3.0 blog post\n * LeRobot Dataset Visualizer repo\n\n\n\nThis kind of tool answers questions like:\n\n> Did I record the right cameras, states, actions, and episodes?\n\nIt does **not** directly answer:\n\n> Is my loss decreasing over training steps?\n\nThose are different layers.\n\n### Training metric visualization\n\nFor training curves, the current LeRobot docs show W&B as the normal documented example. In the real-world imitation learning tutorial, `wandb.enable=true` is described as optional and used for visualizing training plots:\n\n * Imitation Learning on Real-World Robots\n * SmolVLA docs\n\n\n\nSo for training metrics, I would think in terms of:\n\nOption | Local? | Good for | Caveat\n---|---|---|---\nW&B | Not local-first by default | Mature experiment tracking, training plots, media, artifacts | Requires W&B setup/login unless using offline mode\nW&B offline | Local logging first | Keeping W&B-style logs without immediate cloud sync | Still W&B-oriented; dashboard workflow may not be what you want\nTrackio | Yes, local-first | Local scalar curves and lightweight dashboards | Promising, but not necessarily a full W&B replacement for LeRobot\nTensorBoard | Yes | Classic local scalar curves | May require adding a writer if not already supported\nCSV/JSONL logs | Yes | Simple, robust, reproducible | No rich dashboard unless you build/plot one\n\n* * *\n\n## 2. 
Trackio may be the HF-native option you were thinking of\n\nIf you were remembering a Gradio-based Hugging Face alternative to W&B, I think you may be thinking of **Trackio** :\n\n * Trackio docs\n * Trackio migration guide\n * Trackio blog post\n * Trackio GitHub repo\n\n\n\nTrackio is very relevant here because it is:\n\n * Hugging Face-native\n * local-first\n * W&B-like\n * built around a Gradio dashboard\n * designed to log experiment metrics\n * able to sync/share through Hugging Face Spaces\n\n\n\nThe Trackio migration docs say that migrating from W&B is usually simple because Trackio uses W&B-like API syntax. In simple scripts, the idea can be as small as:\n\n\n import trackio as wandb\n\n wandb.init(project=\"my-project\", name=\"my-run\")\n wandb.log({\"train/loss\": 0.123, \"train/lr\": 1e-4}, step=100)\n wandb.finish()\n\n\nThat said, I would be careful with wording here.\n\nI would say:\n\n> Trackio looks like the closest Hugging Face-native local/W&B-like option for scalar training curves.\n\nI would **not** say:\n\n> Trackio is a guaranteed drop-in replacement for LeRobot’s current `--wandb.enable=true` path.\n\nWhy not? Because LeRobot appears to have its own W&B-specific logger wrapper rather than only calling plain `wandb.log()` everywhere. 
So Trackio may work well with a small custom logger/wrapper, but I would not assume that `lerobot-train` already exposes something like:\n\n\n lerobot-train \\\n --trackio.enable=true\n\n\nunless that has been added in the specific LeRobot version you are using.\n\nA safer expectation is:\n\nLeRobot logging feature | Trackio likelihood | Notes\n---|---|---\nScalar metrics: loss, lr, grad norm | High | This is the easiest case\nEval metrics | High | If logged as scalars\nTables/images | Likely | Trackio has W&B-like media APIs, but exact behavior should be checked\nVideos | Maybe | Needs checking for the exact current API and dashboard behavior\nCheckpoint/artifact tracking | Be careful | W&B Artifacts and Trackio storage are not necessarily equivalent\nResume/run-id behavior | Be careful | W&B-specific run resume logic may not map 1:1\nFull W&B feature parity | No | Trackio is lightweight, not a full W&B clone\n\nSo my practical recommendation would be:\n\n 1. Use the standard documented W&B path first if you are okay with W&B.\n 2. If you want local-first scalar curves, investigate Trackio.\n 3. If using `lerobot-train`, assume Trackio may need a small logger wrapper or code patch.\n 4. If you only need a quick local curve, parse stdout/logs or write CSV/JSONL first.\n\n\n\n* * *\n\n## 3. Why the loss curve is not enough in SmolVLA/SO101\n\nThis is the most important robotics-specific point.\n\nIn ordinary ML, a learning curve can often tell you a lot. 
In real-world robotics and VLA training, it is only one signal.\n\nThere are related LeRobot issues where training loss or W&B plots looked good, but evaluation did not work:\n\n * Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial\n * Clarifications on fine-tuning on different envs and embodiments\n * SO101/SmolVLA camera/setup discussion\n * SmolVLA poor performance / inference issue\n\n\n\nThe main lesson I would take from those is:\n\n> A clean loss curve does not guarantee a working rollout.\n\nFor SmolVLA/SO101, I would inspect at least these layers:\n\nLayer | What to check | Why it matters\n---|---|---\nCamera setup | Number of cameras, camera names, view order, resolution | VLA policies are sensitive to visual input schema\nState schema | Shape, order, meaning of `observation.state` | A converged loss can still learn the wrong mapping if state semantics differ\nAction schema | Shape, order, joint vs end-effector meaning, gripper representation | Action mismatch can make rollout fail even if training looks fine\nDataset metadata | `meta/info.json`, feature names, fps, codebase version | Confirms what the dataset actually contains\nDataset statistics | `meta/stats.json`, normalization values | Wrong normalization can break policy behavior\nEpisode visualization | Camera/state/action streams | Helps detect recording/config mistakes\nEvaluation | Open-loop eval, sim eval if available, real rollout | The final check is behavior, not just loss\nVersioning | LeRobot version, model checkpoint, dataset format version | LeRobot/SmolVLA are moving quickly\n\nThe SmolVLA docs describe SmolVLA as taking multiple camera views, the current sensorimotor state, and a natural language instruction, then generating an action chunk. That means the model is not just learning from a text prompt or a single tensor. The camera/state/action contract matters.\n\n* * *\n\n## 4. 
What I would try locally\n\nIf I wanted the simplest local path before going deeper, I would try this order.\n\n### Step 1: Confirm the dataset visually\n\nUse the LeRobot dataset visualization path first.\n\nThings to look for:\n\n * Are all expected camera views present?\n * Do the camera names match what the policy/config expects?\n * Are the wrist/front/top/side views in the expected places?\n * Does the robot state change smoothly?\n * Do actions look non-zero and physically meaningful?\n * Are gripper actions represented correctly?\n * Is fps consistent with what the training config expects?\n * Are there broken/missing videos or episodes?\n\n\n\nRelevant links:\n\n * LeRobot dataset tools\n * LeRobotDataset v3.0 docs\n * LeRobot Dataset Visualizer\n\n\n\n### Step 2: Inspect metadata and stats\n\nOpen the dataset metadata files if available.\n\nFor LeRobotDataset v3, I would look at the files below. In v3 the per-episode metadata is stored as chunked Parquet under `meta/episodes/`, and task metadata is typically Parquet as well; older v2.x datasets use `meta/tasks.jsonl` and `meta/episodes.jsonl` instead, so check what your dataset actually contains:\n\n\n meta/info.json\n meta/stats.json\n meta/tasks.parquet\n meta/episodes/\n\n\nIn particular:\n\n\n observation.state\n action\n observation.images.<camera_name>\n fps\n features\n shape\n dtype\n codebase_version\n\n\nThis is boring but important. 
If the dataset schema and policy expectation disagree, the loss curve may not tell you the real problem.\n\n### Step 3: Start with the official W&B path if possible\n\nIf you can use W&B, the official path is probably the least surprising first test:\n\n\n lerobot-train \\\n --policy.path=lerobot/smolvla_base \\\n --dataset.repo_id=<your-dataset-repo-id> \\\n --batch_size=<batch-size> \\\n --steps=<num-steps> \\\n --wandb.enable=true\n\n\nThe exact command should follow the current LeRobot imitation learning docs and SmolVLA docs, because the CLI/config names can change across LeRobot versions.\n\n### Step 4: If you want local-first curves, try Trackio separately\n\nFor a custom training script, Trackio may be very simple:\n\n\n import trackio as wandb\n\n wandb.init(\n project=\"smolvla-so101\",\n name=\"local-test\",\n config={\n \"policy\": \"smolvla_base\",\n \"robot\": \"so101\",\n },\n )\n\n wandb.log(\n {\n \"train/loss\": 0.123,\n \"train/lr\": 1e-4,\n \"train/grad_norm\": 0.5,\n },\n step=100,\n )\n\n wandb.finish()\n\n\nFor `lerobot-train`, I would expect this to require a small logger integration unless LeRobot has added official Trackio support in your version.\n\n### Step 5: If you want the most robust local fallback, log CSV/JSONL\n\nA very boring but reliable fallback is:\n\n\n {\"step\": 100, \"train/loss\": 0.123, \"train/lr\": 0.0001, \"train/grad_norm\": 0.5}\n {\"step\": 200, \"train/loss\": 0.098, \"train/lr\": 0.0001, \"train/grad_norm\": 0.47}\n\n\nThen plot it locally with Python.\n\nThis is not fancy, but it avoids account setup, dashboard assumptions, and integration drift.\n\n* * *\n\n## 5. What to ask in LeRobot Discord\n\nFor SO101/SmolVLA, I would bring a compact but complete report to the LeRobot Discord. 
That will probably get better answers than only asking “how do I visualize the curve?”\n\nUseful information to include:\n\nCategory | Include\n---|---\nLeRobot version | `pip show lerobot`, git commit, or install method\nCommand | Exact `lerobot-train` command\nPolicy | `lerobot/smolvla_base` or other checkpoint\nRobot | SO101 / SO100 / other, follower/leader setup\nDataset | Hub repo id or local path\nDataset format | LeRobotDataset version if known\nCameras | Number, names, views, order\nState/action | Shapes from metadata\nMetadata | Relevant parts of `meta/info.json`\nStats | Relevant parts of `meta/stats.json`\nTraining curves | loss, lr, grad_norm, eval metrics if any\nVisualization | screenshots or notes from `lerobot-dataset-viz`\nEvaluation | open-loop eval, real rollout behavior, success/failure examples\nRequirement | whether you need fully local/offline visualization\n\nA good short Discord/forum report might look like:\n\n\n I am fine-tuning SmolVLA on SO101 with LeRobot.\n\n Goal:\n - I want to visualize training curves locally if possible.\n - I also want to confirm whether my dataset/camera/action setup is correct.\n\n Setup:\n - LeRobot version: <version-or-commit>\n - Install method: <pip/source/docker/etc>\n - Policy: <policy-path>\n - Dataset: <dataset-repo-or-local-path>\n - Robot: SO101\n - Cameras: <camera-names-and-count>\n - Training command: <exact-command>\n\n What I checked:\n - lerobot-dataset-viz: <works/does-not-work>\n - meta/info.json: <relevant-shapes>\n - meta/stats.json: <normalization-stats>\n - W&B/Trackio/TensorBoard/logs: <what-you-tried>\n\n Observed behavior:\n - Training loss: <summary>\n - Eval/rollout: <summary>\n - Failure mode: <what-the-robot-does>\n\n\nThat gives the LeRobot community enough context to answer the robotics-specific part.\n\n* * *\n\n## 6. 
My current recommendation\n\nIf your immediate goal is just “I want to see the learning curve locally,” I would rank the options like this:\n\nRank | Option | Why\n---|---|---\n1 | Parse local logs / CSV / JSONL | Most robust, fully local, no integration risk\n2 | Trackio | Best HF-native local/W&B-like dashboard candidate\n3 | W&B offline | Good if you already want W&B-style tracking\n4 | TensorBoard | Solid generic local ML tool\n5 | Full W&B online | Easiest if you accept W&B account/cloud workflow\n\nBut for SmolVLA/SO101 specifically, I would not stop at the learning curve. I would also inspect:\n\n * dataset episodes\n * camera names/order/count\n * `meta/info.json`\n * `meta/stats.json`\n * state/action shapes\n * normalization\n * open-loop evaluation\n * real rollout behavior\n\n\n\nIn other words:\n\n> Trackio may help you see the curve, but `lerobot-dataset-viz` and dataset metadata may help you understand whether the curve is meaningful.\n\n* * *\n\n## 7. Links worth checking\n\n### LeRobot / SmolVLA\n\n * LeRobot org page\n * LeRobot docs\n * LeRobot GitHub\n * Imitation Learning on Real-World Robots\n * SmolVLA docs\n * SmolVLA blog post\n * LeRobotDataset v3.0 docs\n * LeRobotDataset v3.0 blog post\n\n\n\n### Trackio\n\n * Trackio docs\n * Trackio migration guide\n * Trackio blog post\n * Trackio GitHub repo\n\n\n\n### Related GitHub issues\n\n * Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial\n * Clarifications on fine-tuning on different envs and embodiments\n * SO101 / SmolVLA camera setup discussion\n * SmolVLA / SO101 pretrained setup discussion\n * SmolVLA inference problem\n\n\n\n### Practical physical AI examples\n\n * Training ACT on SO-101\n * Fine-tuning NVIDIA GR00T N1.5 for SO-101\n * NVIDIA Isaac GR00T in LeRobot\n * Generalist robot policy evaluation with Isaac Lab Arena and LeRobot\n * Physical AI terminology overview\n\n",
"title": "Visualization of Imitation Learning by smolvla(SO101)"
}