Built a lane-based dataset bundle explorer for LLM training — would love feedback from the HF community
Hi everyone! I’ve been building DinoDS, a modular dataset system for LLM training built around lane-based dataset bundles.
The idea is simple: instead of treating training data like one giant premade dump, I’m organizing it into capability-focused bundles that map to specific assistant behaviors and failure types — things like:
retrieval grounding
workflow / tool routing
memory and continuity
structured outputs
identity and behavior shaping
I’ve started publishing some of these dataset bundle previews on Hugging Face, and I also made a Space that helps people explore which dataset bundle might actually be useful for their use case.
So the current flow is:
explore the DinoDS concept
identify what kind of assistant behavior you want to improve
see which bundle / lane family fits
check out the related dataset previews
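To make that flow a bit more concrete, here is a minimal Python sketch of matching a described failure to lane families. It is purely illustrative: the `Bundle` class, the keyword sets, and the `suggest_lanes` function are hypothetical names and do not reflect how the DinoDS Space or the bundles are actually implemented. Only the five lane names come from the list earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical record tying a lane family to symptom keywords."""
    lane: str
    keywords: frozenset


# Assumed catalog: lane names are from the post, keywords are made up.
BUNDLES = [
    Bundle("retrieval grounding", frozenset({"retrieval", "citations", "grounding"})),
    Bundle("workflow / tool routing", frozenset({"tool", "routing", "workflow"})),
    Bundle("memory and continuity", frozenset({"memory", "forgets", "continuity"})),
    Bundle("structured outputs", frozenset({"json", "schema", "structured"})),
    Bundle("identity and behavior shaping", frozenset({"persona", "tone", "identity"})),
]


def suggest_lanes(symptom: str) -> list:
    """Return lane names whose keywords appear in the symptom text.

    Lanes with more keyword hits come first; ties keep catalog order.
    """
    words = set(symptom.lower().split())
    scored = []
    for bundle in BUNDLES:
        hits = len(bundle.keywords & words)
        if hits:
            scored.append((hits, bundle.lane))
    # sorted() is stable, so equal scores stay in catalog order.
    scored = sorted(scored, key=lambda pair: -pair[0])
    return [lane for _, lane in scored]


if __name__ == "__main__":
    print(suggest_lanes("model returns invalid json"))
    print(suggest_lanes("picks the wrong tool and forgets earlier turns"))
```

A real mapper would likely need something richer than keyword overlap, but the shape is the same: describe the failure, get back candidate lane families, then look at the matching dataset previews.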
I’d really love feedback from the HF community on a few things:
Does this bundle-first / lane-based way of presenting datasets make sense?
Is the Space + dataset bundle flow intuitive?
What would make these previews more useful for people evaluating training data?
Would you rather explore by failure type, capability, or use case?
You can check out the bundles, the Space, and the website here:
Hugging Face Space: Dinodataset Failure Mapper - a Hugging Face Space by DinoDS
Dataset bundles: DinoDS (DinoDS Labs)
Website: www.dinodsai.com
Would love thoughts, criticism, and suggestions — especially from people building assistants, copilots, routing systems, or structured-output workflows.