Built a lane-based dataset bundle explorer for LLM training — would love feedback from the HF community

Hugging Face Forums [Unofficial] April 29, 2026

Hi everyone! I’ve been building DinoDS, a modular dataset system for LLM training built around lane-based dataset bundles.

The idea is simple: instead of treating training data like one giant premade dump, I’m organizing it into capability-focused bundles that map to specific assistant behaviors and failure types, such as:

  • retrieval grounding

  • workflow / tool routing

  • memory and continuity

  • structured outputs

  • identity and behavior shaping
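To make the lane idea a bit more concrete, here is a minimal sketch of how a flat pile of training records could be split into per-lane bundles. This is only an illustration, not the actual DinoDS format: the lane identifiers are my shorthand for the capabilities listed above, and the record fields (`lane`, `prompt`, `target`) are assumptions.

```python
from collections import defaultdict

# Hypothetical lane identifiers, loosely named after the capabilities above.
LANES = {
    "retrieval_grounding",
    "tool_routing",
    "memory_continuity",
    "structured_outputs",
    "identity_behavior",
}


def bundle_by_lane(records):
    """Group flat training records into per-lane bundles.

    Each record is assumed to be a dict carrying a "lane" key.
    Records with an unknown lane are rejected rather than silently dropped.
    """
    bundles = defaultdict(list)
    for record in records:
        lane = record.get("lane")
        if lane not in LANES:
            raise ValueError(f"unknown lane: {lane!r}")
        bundles[lane].append(record)
    return dict(bundles)


# Toy records, invented purely for illustration.
records = [
    {"lane": "tool_routing", "prompt": "Book a flight", "target": "call:flights.search"},
    {"lane": "structured_outputs", "prompt": "Return JSON", "target": '{"ok": true}'},
    {"lane": "tool_routing", "prompt": "Check the weather", "target": "call:weather.get"},
]
bundles = bundle_by_lane(records)
```

With bundles keyed by lane, someone could pull only the lanes that match the behavior they want to improve instead of training on everything at once.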

I’ve started publishing some of these dataset bundle previews on Hugging Face, and I also made a Space that helps people explore which dataset bundle might actually be useful for their use case.

So the current flow is:

  • explore the DinoDS concept

  • identify what kind of assistant behavior you want to improve

  • see which bundle / lane family fits

  • check out the related dataset previews
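The "identify the behavior, then find the lane" step in the flow above could look roughly like the sketch below. It is not how the Space is implemented; the failure-type names and the mapping itself are hypothetical and only meant to show the shape of the lookup.

```python
# Hypothetical mapping from observed failure types to lane families.
FAILURE_TO_LANES = {
    "hallucinated_citation": ["retrieval_grounding"],
    "wrong_tool_called": ["tool_routing"],
    "forgot_earlier_context": ["memory_continuity"],
    "invalid_json": ["structured_outputs"],
    "persona_drift": ["identity_behavior"],
}


def suggest_lanes(failures):
    """Rank lanes by how many of the reported failures they address.

    Unknown failure types are ignored. Ties are broken alphabetically
    so the result is deterministic.
    """
    counts = {}
    for failure in failures:
        for lane in FAILURE_TO_LANES.get(failure, []):
            counts[lane] = counts.get(lane, 0) + 1
    return sorted(counts, key=lambda lane: (-counts[lane], lane))


suggested = suggest_lanes(["invalid_json", "wrong_tool_called", "invalid_json"])
```

Ranking by failure count is just one possible design; exploring by capability or use case would swap the keys of the mapping but keep the same lookup.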

I’d really love feedback from the HF community on a few things:

  1. Does this bundle-first / lane-based way of presenting datasets make sense?

  2. Is the Space + dataset bundle flow intuitive?

  3. What would make these previews more useful for people evaluating training data?

  4. Would you rather explore by failure type, capability, or use case?

You can check out the bundles, the Space, and the website here:

  • Hugging Face Space: Dinodataset Failure Mapper (by DinoDS)

  • Dataset bundles: DinoDS (DinoDS Labs)

  • Website: www.dinodsai.com

Would love thoughts, criticism, and suggestions — especially from people building assistants, copilots, routing systems, or structured-output workflows.
