
Discussion about improving intent classification accuracy in low-data settings with overlapping semantic signals using lightweight, non-LLM techniques

Hugging Face Forums [Unofficial] April 13, 2026

Hi everyone,

I’m working on an intent classification system in a specialized domain with very limited labeled data (a few examples per intent) and running into issues with semantic overlap across categories.

Problem

Many intents share overlapping vocabulary, and standard semantic similarity approaches (sentence embeddings, cosine similarity, etc.) tend to:

- Overweight common/shared terms
- Miss more functional signals (actions, relationships, constraints)
- Result in misclassification when surface-level similarity dominates
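
A toy illustration of this failure mode, as a minimal sketch only: bag-of-words counts stand in for sentence embeddings, and the intent names, descriptions, and query are made up for the example rather than taken from the actual system.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words term counts (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical intents whose descriptions differ by a single action word.
intents = {
    "cancel_order": "cancel my order for the customer account",
    "update_order": "update my order for the customer account",
}
query = "please cancel the order on my customer account"

scores = {name: cosine(bow(query), bow(desc)) for name, desc in intents.items()}
margin = scores["cancel_order"] - scores["update_order"]
print(scores, margin)
```

Here the right intent still wins, but five of the six matching terms are shared by both intents, so the margin rests on a single word. With real embeddings the effect is less literal, but the pattern (shared vocabulary carrying most of the score) is the same one described above.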

Current Approach

I’ve experimented with:

- Sentence embedding models (for similarity-based routing)
- Breaking intent descriptions into smaller semantic units (anchor-based matching)
- Using NLI-style models as a secondary validation step
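
For readers unfamiliar with anchor-based matching, a rough sketch of the idea follows. It assumes each intent is represented by a few short anchor phrases and is scored by its best-matching anchor. The Jaccard token overlap is only a placeholder for an embedding similarity, and the anchors and the `route` helper are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap; a placeholder for an embedding similarity function."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def anchor_score(query, anchors, sim=jaccard):
    """Score an intent by the best match between the query and any one anchor."""
    return max(sim(query, anchor) for anchor in anchors)

# Hypothetical anchors: each intent description broken into smaller units.
intent_anchors = {
    "cancel_order": ["cancel order", "stop shipment"],
    "update_order": ["change order", "edit shipping address"],
}

def route(query):
    """Return the top-scoring intent and the per-intent scores."""
    scores = {name: anchor_score(query, anchors)
              for name, anchors in intent_anchors.items()}
    return max(scores, key=scores.get), scores

label, anchor_scores = route("stop shipment now")
print(label, anchor_scores)
```

Taking the maximum over anchors means one strong, specific match can decide the route, rather than being averaged away by the generic parts of a long description. Other aggregations (mean, top-k) are equally plausible.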

While these help, I still see:

- High-recall but low-precision terms dominating scoring
- Difficulty encoding negative intent boundaries (i.e., signals that should exclude a class)

Looking For Suggestions On

- Techniques to weight or prioritize discriminative signals over generic ones
- Better ways to structure intent representations beyond plain embeddings
- Approaches to incorporate negative constraints without relying on brittle rules
- Any lightweight or hybrid pipelines (embedding + symbolic / statistical methods)
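
To make the kind of answer being asked for concrete, here is one shape such a hybrid could take. It is a sketch under stated assumptions, not a tested recommendation: smoothed IDF computed across intent descriptions down-weights shared terms, and negative cue words scale a score down softly instead of vetoing the class. The intent definitions, the `neg` cue sets, and the `neg_penalty` value are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical intents: a positive description plus soft negative cue words.
INTENTS = {
    "cancel_order": {"desc": "cancel my order for the customer account",
                     "neg": {"update", "change"}},
    "update_order": {"desc": "update my order for the customer account",
                     "neg": {"cancel", "refund"}},
}

def tokens(text):
    return text.lower().split()

def idf_weights(intents):
    """Smoothed IDF over intent descriptions; terms in every intent get weight 1.0."""
    n = len(intents)
    df = Counter(t for spec in intents.values() for t in set(tokens(spec["desc"])))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def score(query, spec, idf, neg_penalty=0.5):
    """IDF-weighted coverage of the description, damped once per negative cue hit."""
    q = set(tokens(query))
    d = set(tokens(spec["desc"]))
    total = sum(idf[t] for t in d)
    matched = sum(idf[t] for t in d & q)
    base = matched / total if total else 0.0
    hits = len(q & spec["neg"])
    return base * (1 - neg_penalty) ** hits

idf = idf_weights(INTENTS)
query = "please cancel the order on my customer account"
scores = {name: score(query, spec, idf) for name, spec in INTENTS.items()}
margin = scores["cancel_order"] - scores["update_order"]
print(scores, margin)
```

With only two intents the IDF effect is small and most of the separation comes from the negative cue. Because every contribution is a named term weight or a counted cue hit, the score stays inspectable, which matters given the interpretability constraint.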

I’m trying to avoid full LLM-based solutions for latency and interpretability reasons.

Would really appreciate any insights, patterns, or references from folks who’ve tackled similar problems.

Thanks!
