
Built a lane-based dataset bundle explorer for LLM training — would love feedback from the HF community

Hugging Face Forums [Unofficial] April 29, 2026

Hi everyone! I’ve been building DinoDS, a modular dataset system for LLM training built around lane-based dataset bundles.

The idea is simple: instead of treating training data like one giant premade dump, I’m organizing it into capability-focused bundles that map to specific assistant behaviors and failure types — things like:

retrieval grounding
workflow / tool routing
memory and continuity
structured outputs
identity and behavior shaping

I’ve started publishing some of these dataset bundle previews on Hugging Face, and I also made a Space that helps people explore which dataset bundle might actually be useful for their use case.

So the current flow is:

explore the DinoDS concept
identify what kind of assistant behavior you want to improve
see which bundle / lane family fits
check out the related dataset previews

I’d really love feedback from the HF community on a few things:

Does this bundle-first / lane-based way of presenting datasets make sense?
Is the Space + dataset bundle flow intuitive?
What would make these previews more useful for people evaluating training data?
Would you rather explore by failure type, capability, or use case?

You can check out the bundles, the Space, and the website here:

Hugging Face Space: Dinodataset Failure Mapper (by DinoDS)
Dataset bundles: DinoDS (DinoDS Labs)
Website: www.dinodsai.com

Would love thoughts, criticism, and suggestions — especially from people building assistants, copilots, routing systems, or structured-output workflows.
