Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiczeydusltjp33uqyzoifcis4vkbavgufwttt5riqv5ngmtb4gohy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnizb4aaum22"
  },
  "path": "/t/visualization-of-imitation-learning-by-smolvla-so101/176534#post_2",
  "publishedAt": "2026-06-05T00:03:44.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "LeRobot",
    "Trackio",
    "LeRobot dataset tools",
    "LeRobotDataset v3.0 docs",
    "LeRobotDataset v3.0 blog post",
    "LeRobot Dataset Visualizer repo",
    "Imitation Learning on Real-World Robots",
    "SmolVLA docs",
    "Trackio docs",
    "Trackio migration guide",
    "Trackio blog post",
    "Trackio GitHub repo",
    "Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial",
    "Clarifications on fine-tuning on different envs and embodiments",
    "SO101/SmolVLA camera/setup discussion",
    "SmolVLA poor performance / inference issue",
    "LeRobot Dataset Visualizer",
    "LeRobot imitation learning docs",
    "LeRobot org page",
    "LeRobot docs",
    "LeRobot GitHub",
    "SmolVLA blog post",
    "SO101 / SmolVLA camera setup discussion",
    "SmolVLA / SO101 pretrained setup discussion",
    "SmolVLA inference problem",
    "Training ACT on SO-101",
    "Fine-tuning NVIDIA GR00T N1.5 for SO-101",
    "NVIDIA Isaac GR00T in LeRobot",
    "Generalist robot policy evaluation with Isaac Lab Arena and LeRobot",
    "Physical AI terminology overview"
  ],
  "textContent": "Since this is in the physical AI / real-world robotics area, I think the LeRobot Discord is probably the best place to get the most useful follow-up. But here is how I would separate the pieces first, based on what I can check from the public docs, examples, and related issues:\n\n* * *\n\n## Short version\n\nI think there are three different visualization/debugging questions mixed together here:\n\nQuestion | Tool family | What it helps with\n---|---|---\n“Is my recorded dataset sane?” | `lerobot-dataset-viz`, LeRobot dataset visualizers, Rerun-style episode inspection | Camera streams, robot states, actions, episode structure\n“How is training progressing?” | W&B, Trackio, TensorBoard, CSV/JSONL logs | Loss, learning rate, grad norm, eval metrics\n“Will the policy actually work on the robot?” | Evaluation rollouts, open-loop evaluation, real robot testing, dataset/action sanity checks | Success rate, action correctness, camera/state/action mismatch\n\nSo I would not treat `lerobot-dataset-viz` as a replacement for a learning-curve dashboard. It is more of a dataset/episode inspection tool. For learning curves, W&B is the documented path in current LeRobot examples, and Trackio looks like the most relevant Hugging Face-native local/W&B-like alternative, but probably not a confirmed one-flag replacement for `lerobot-train` yet.\n\nFor SmolVLA/SO101 specifically, I would also be careful not to over-trust the loss curve. There are related LeRobot issues where the loss converged and/or W&B plots looked fine, but evaluation success was still 0%. That suggests that for VLA/robotics, the dataset schema, camera setup, state/action definition, normalization/statistics, and rollout evaluation can matter as much as the scalar training curve.\n\n* * *\n\n## 1. First split: dataset visualization vs training metrics\n\n### Dataset/episode visualization\n\nLeRobot has dataset visualization tools for looking at recorded episodes. 
This is useful for checking things like:\n\n  * camera frames\n  * camera names/views\n  * robot state streams\n  * action streams\n  * episode timing\n  * whether the recorded behavior looks physically plausible\n\n\n\nRelevant docs/pages:\n\n  * LeRobot dataset tools\n  * LeRobotDataset v3.0 docs\n  * LeRobotDataset v3.0 blog post\n  * LeRobot Dataset Visualizer repo\n\n\n\nThis kind of tool answers questions like:\n\n> Did I record the right cameras, states, actions, and episodes?\n\nIt does **not** directly answer:\n\n> Is my loss decreasing over training steps?\n\nThose are different layers.\n\n### Training metric visualization\n\nFor training curves, the current LeRobot docs use W&B as the standard documented option. In the real-world imitation learning tutorial, `--wandb.enable=true` is described as an optional flag for visualizing training plots:\n\n  * Imitation Learning on Real-World Robots\n  * SmolVLA docs\n\n\n\nSo for training metrics, I would think in terms of:\n\nOption | Local? | Good for | Caveat\n---|---|---|---\nW&B | Not local-first by default | Mature experiment tracking, training plots, media, artifacts | Requires W&B setup/login unless using offline mode\nW&B offline | Local logging first | Keeping W&B-style logs without immediate cloud sync | Still W&B-oriented; dashboard workflow may not be what you want\nTrackio | Yes, local-first | Local scalar curves and lightweight dashboards | Promising, but not necessarily a full W&B replacement for LeRobot\nTensorBoard | Yes | Classic local scalar curves | May require adding a writer if not already supported\nCSV/JSONL logs | Yes | Simple, robust, reproducible | No rich dashboard unless you build/plot one\n\n* * *\n\n## 2. 
Trackio may be the HF-native option you were thinking of\n\nIf you were remembering a Gradio-based Hugging Face alternative to W&B, I think you may be thinking of **Trackio**:\n\n  * Trackio docs\n  * Trackio migration guide\n  * Trackio blog post\n  * Trackio GitHub repo\n\n\n\nTrackio is very relevant here because it is:\n\n  * Hugging Face-native\n  * local-first\n  * W&B-like\n  * built around a Gradio dashboard\n  * designed to log experiment metrics\n  * able to sync/share through Hugging Face Spaces\n\n\n\nThe Trackio migration docs say that migrating from W&B is usually simple because Trackio uses W&B-like API syntax. In simple scripts, the idea can be as small as:\n\n\n    import trackio as wandb\n\n    wandb.init(project=\"my-project\", name=\"my-run\")\n    wandb.log({\"train/loss\": 0.123, \"train/lr\": 1e-4}, step=100)\n    wandb.finish()\n\n\nThat said, I would be careful with wording here.\n\nI would say:\n\n> Trackio looks like the closest Hugging Face-native local/W&B-like option for scalar training curves.\n\nI would **not** say:\n\n> Trackio is a guaranteed drop-in replacement for LeRobot’s current `--wandb.enable=true` path.\n\nWhy not? Because LeRobot appears to have its own W&B-specific logger wrapper rather than only calling plain `wandb.log()` everywhere. 
So Trackio may work well with a small custom logger/wrapper, but I would not assume that `lerobot-train` already exposes something like:\n\n\n    lerobot-train \\\n      --trackio.enable=true\n\n\nunless that has been added in the specific LeRobot version you are using.\n\nA safer expectation is:\n\nLeRobot logging feature | Trackio likelihood | Notes\n---|---|---\nScalar metrics: loss, lr, grad norm | High | This is the easiest case\nEval metrics | High | If logged as scalars\nTables/images | Likely | Trackio has W&B-like media APIs, but exact behavior should be checked\nVideos | Maybe | Needs checking for the exact current API and dashboard behavior\nCheckpoint/artifact tracking | Be careful | W&B Artifacts and Trackio storage are not necessarily equivalent\nResume/run-id behavior | Be careful | W&B-specific run resume logic may not map 1:1\nFull W&B feature parity | No | Trackio is lightweight, not a full W&B clone\n\nSo my practical recommendation would be:\n\n  1. Use the standard documented W&B path first if you are okay with W&B.\n  2. If you want local-first scalar curves, investigate Trackio.\n  3. If using `lerobot-train`, assume Trackio may need a small logger wrapper or code patch.\n  4. If you only need a quick local curve, parse stdout/logs or write CSV/JSONL first.\n\n\n\n* * *\n\n## 3. Why the loss curve is not enough in SmolVLA/SO101\n\nThis is the most important robotics-specific point.\n\nIn ordinary ML, a learning curve can often tell you a lot. 
In real-world robotics and VLA training, it is only one signal.\n\nThere are related LeRobot issues where training loss or W&B plots looked good, but evaluation did not work:\n\n  * Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial\n  * Clarifications on fine-tuning on different envs and embodiments\n  * SO101/SmolVLA camera/setup discussion\n  * SmolVLA poor performance / inference issue\n\n\n\nThe main lesson I would take from those is:\n\n> A clean loss curve does not guarantee a working rollout.\n\nFor SmolVLA/SO101, I would inspect at least these layers:\n\nLayer | What to check | Why it matters\n---|---|---\nCamera setup | Number of cameras, camera names, view order, resolution | VLA policies are sensitive to visual input schema\nState schema | Shape, order, meaning of `observation.state` | Training can converge on the wrong mapping if state semantics differ\nAction schema | Shape, order, joint vs end-effector meaning, gripper representation | Action mismatch can make rollout fail even if training looks fine\nDataset metadata | `meta/info.json`, feature names, fps, codebase version | Confirms what the dataset actually contains\nDataset statistics | `meta/stats.json`, normalization values | Wrong normalization can break policy behavior\nEpisode visualization | Camera/state/action streams | Helps detect recording/config mistakes\nEvaluation | Open-loop eval, sim eval if available, real rollout | The final check is behavior, not just loss\nVersioning | LeRobot version, model checkpoint, dataset format version | LeRobot/SmolVLA are moving quickly\n\nThe SmolVLA docs describe SmolVLA as taking multiple camera views, the current sensorimotor state, and a natural language instruction, then generating an action chunk. That means the model is not just learning from a text prompt or a single tensor. The camera/state/action contract matters.\n\n* * *\n\n## 4. 
What I would try locally\n\nIf I wanted the simplest local path before going deeper, I would try this order.\n\n### Step 1: Confirm the dataset visually\n\nUse the LeRobot dataset visualization path first.\n\nThings to look for:\n\n  * Are all expected camera views present?\n  * Do the camera names match what the policy/config expects?\n  * Are the wrist/front/top/side views in the expected places?\n  * Does the robot state change smoothly?\n  * Do actions look non-zero and physically meaningful?\n  * Are gripper actions represented correctly?\n  * Is fps consistent with what the training config expects?\n  * Are there broken/missing videos or episodes?\n\n\n\nRelevant links:\n\n  * LeRobot dataset tools\n  * LeRobotDataset v3.0 docs\n  * LeRobot Dataset Visualizer\n\n\n\n### Step 2: Inspect metadata and stats\n\nOpen the dataset metadata files if available.\n\nI would look at files like:\n\n\n    meta/info.json\n    meta/stats.json\n    meta/tasks.jsonl\n    meta/episodes.jsonl\n\n\nThe exact task/episode metadata files depend on the dataset format version (LeRobotDataset v3.0 stores per-episode metadata as Parquet files under `meta/episodes/` rather than in a single `episodes.jsonl`), so check what your dataset actually contains.\n\nIn particular:\n\n\n    observation.state\n    action\n    observation.images.<camera_name>\n    fps\n    features\n    shape\n    dtype\n    codebase_version\n\n\nThis is boring but important. 
If the dataset schema and policy expectation disagree, the loss curve may not tell you the real problem.\n\n### Step 3: Start with the official W&B path if possible\n\nIf you can use W&B, the official path is probably the least surprising first test:\n\n\n    lerobot-train \\\n      --policy.path=lerobot/smolvla_base \\\n      --dataset.repo_id=<your-dataset-repo-id> \\\n      --batch_size=<batch-size> \\\n      --steps=<num-steps> \\\n      --wandb.enable=true\n\n\nThe exact command should follow the current LeRobot imitation learning docs and SmolVLA docs, because the CLI/config names can change across LeRobot versions.\n\n### Step 4: If you want local-first curves, try Trackio separately\n\nFor a custom training script, Trackio may be very simple:\n\n\n    import trackio as wandb\n\n    wandb.init(\n        project=\"smolvla-so101\",\n        name=\"local-test\",\n        config={\n            \"policy\": \"smolvla_base\",\n            \"robot\": \"so101\",\n        },\n    )\n\n    wandb.log(\n        {\n            \"train/loss\": 0.123,\n            \"train/lr\": 1e-4,\n            \"train/grad_norm\": 0.5,\n        },\n        step=100,\n    )\n\n    wandb.finish()\n\n\nFor `lerobot-train`, I would expect this to require a small logger integration unless LeRobot has added official Trackio support in your version.\n\n### Step 5: If you want the most robust local fallback, log CSV/JSONL\n\nA very boring but reliable fallback is:\n\n\n    {\"step\": 100, \"train/loss\": 0.123, \"train/lr\": 0.0001, \"train/grad_norm\": 0.5}\n    {\"step\": 200, \"train/loss\": 0.098, \"train/lr\": 0.0001, \"train/grad_norm\": 0.47}\n\n\nThen plot it locally with Python.\n\nThis is not fancy, but it avoids account setup, dashboard assumptions, and integration drift.\n\n* * *\n\n## 5. What to ask in LeRobot Discord\n\nFor SO101/SmolVLA, I would bring a compact but complete report to the LeRobot Discord. 
That will probably get better answers than only asking “how do I visualize the curve?”\n\nUseful information to include:\n\nCategory | Include\n---|---\nLeRobot version | `pip show lerobot`, git commit, or install method\nCommand | Exact `lerobot-train` command\nPolicy | `lerobot/smolvla_base` or other checkpoint\nRobot | SO101 / SO100 / other, follower/leader setup\nDataset | Hub repo id or local path\nDataset format | LeRobotDataset version if known\nCameras | Number, names, views, order\nState/action | Shapes from metadata\nMetadata | Relevant parts of `meta/info.json`\nStats | Relevant parts of `meta/stats.json`\nTraining curves | loss, lr, grad_norm, eval metrics if any\nVisualization | screenshots or notes from `lerobot-dataset-viz`\nEvaluation | open-loop eval, real rollout behavior, success/failure examples\nRequirement | whether you need fully local/offline visualization\n\nA good short Discord/forum report might look like:\n\n\n    I am fine-tuning SmolVLA on SO101 with LeRobot.\n\n    Goal:\n    - I want to visualize training curves locally if possible.\n    - I also want to confirm whether my dataset/camera/action setup is correct.\n\n    Setup:\n    - LeRobot version: <version-or-commit>\n    - Install method: <pip/source/docker/etc>\n    - Policy: <policy-path>\n    - Dataset: <dataset-repo-or-local-path>\n    - Robot: SO101\n    - Cameras: <camera-names-and-count>\n    - Training command: <exact-command>\n\n    What I checked:\n    - lerobot-dataset-viz: <works/does-not-work>\n    - meta/info.json: <relevant-shapes>\n    - meta/stats.json: <normalization-stats>\n    - W&B/Trackio/TensorBoard/logs: <what-you-tried>\n\n    Observed behavior:\n    - Training loss: <summary>\n    - Eval/rollout: <summary>\n    - Failure mode: <what-the-robot-does>\n\n\nThat gives the LeRobot community enough context to answer the robotics-specific part.\n\n* * *\n\n## 6. 
My current recommendation\n\nIf your immediate goal is just “I want to see the learning curve locally,” I would rank the options like this:\n\nRank | Option | Why\n---|---|---\n1 | Parse local logs / CSV / JSONL | Most robust, fully local, no integration risk\n2 | Trackio | Best HF-native local/W&B-like dashboard candidate\n3 | W&B offline | Good if you already want W&B-style tracking\n4 | TensorBoard | Solid generic local ML tool\n5 | Full W&B online | Easiest if you accept W&B account/cloud workflow\n\nBut for SmolVLA/SO101 specifically, I would not stop at the learning curve. I would also inspect:\n\n  * dataset episodes\n  * camera names/order/count\n  * `meta/info.json`\n  * `meta/stats.json`\n  * state/action shapes\n  * normalization\n  * open-loop evaluation\n  * real rollout behavior\n\n\n\nIn other words:\n\n> Trackio may help you see the curve, but `lerobot-dataset-viz` and dataset metadata may help you understand whether the curve is meaningful.\n\n* * *\n\n## 7. Links worth checking\n\n### LeRobot / SmolVLA\n\n  * LeRobot org page\n  * LeRobot docs\n  * LeRobot GitHub\n  * Imitation Learning on Real-World Robots\n  * SmolVLA docs\n  * SmolVLA blog post\n  * LeRobotDataset v3.0 docs\n  * LeRobotDataset v3.0 blog post\n\n\n\n### Trackio\n\n  * Trackio docs\n  * Trackio migration guide\n  * Trackio blog post\n  * Trackio GitHub repo\n\n\n\n### Related GitHub issues\n\n  * Training loss unmatched with test performance using SmolVLA with LIBERO-Spatial\n  * Clarifications on fine-tuning on different envs and embodiments\n  * SO101 / SmolVLA camera setup discussion\n  * SmolVLA / SO101 pretrained setup discussion\n  * SmolVLA inference problem\n\n\n\n### Practical physical AI examples\n\n  * Training ACT on SO-101\n  * Fine-tuning NVIDIA GR00T N1.5 for SO-101\n  * NVIDIA Isaac GR00T in LeRobot\n  * Generalist robot policy evaluation with Isaac Lab Arena and LeRobot\n  * Physical AI terminology overview\n\n",
  "title": "Visualization of Imitation Learning by smolvla(SO101)"
}
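The post's Step 5 suggests logging scalar metrics as JSONL and plotting them locally with Python. Below is a minimal sketch of that fallback. It assumes you can call a logging hook from your own training loop. `JsonlMetricLogger`, `load_metrics`, and `plot_metrics` are hypothetical helpers, not LeRobot or Trackio APIs. matplotlib is treated as an optional dependency.

```python
import json
from pathlib import Path


class JsonlMetricLogger:
    """Append one JSON object per logging step to a local .jsonl file.

    Hypothetical helper for illustration; not part of LeRobot or Trackio.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, metrics, step):
        # Same call shape as wandb.log / trackio.log: a dict of scalars plus a step.
        record = {"step": int(step), **metrics}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


def load_metrics(path):
    """Read a metrics .jsonl file into {metric_name: [(step, value), ...]}."""
    series = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            step = record["step"]
            for key, value in record.items():
                # Keep only numeric scalars; skip the step key itself.
                if key != "step" and isinstance(value, (int, float)):
                    series.setdefault(key, []).append((step, value))
    return series


def plot_metrics(series, out_png):
    """Save one subplot per metric to a PNG; requires matplotlib."""
    if not series:
        return
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(
        len(series), 1, figsize=(6, 2.5 * len(series)), squeeze=False
    )
    for ax, (name, points) in zip(axes[:, 0], sorted(series.items())):
        steps, values = zip(*points)
        ax.plot(steps, values, marker="o")
        ax.set_title(name)
        ax.set_xlabel("step")
    fig.tight_layout()
    fig.savefig(out_png)
    plt.close(fig)


if __name__ == "__main__":
    log_path = Path("outputs/metrics.jsonl")
    log_path.unlink(missing_ok=True)  # start fresh for this demo
    logger = JsonlMetricLogger(log_path)
    # Example values copied from the post's JSONL sample.
    logger.log({"train/loss": 0.123, "train/lr": 0.0001, "train/grad_norm": 0.5}, step=100)
    logger.log({"train/loss": 0.098, "train/lr": 0.0001, "train/grad_norm": 0.47}, step=200)

    series = load_metrics(log_path)
    print(series["train/loss"])  # [(100, 0.123), (200, 0.098)]
    try:
        plot_metrics(series, "outputs/metrics.png")
    except ImportError:
        print("matplotlib not installed; skipping plot")
```

This sketch keeps logging append-only and dependency-free, and treats plotting as a separate, optional step. Because `log(metrics, step=...)` mirrors the W&B-style call shape, it should be straightforward to point the same call sites at Trackio later, if that turns out to work for your setup.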