Improving Intent Classification with Overlapping Semantics in Low-Data Settings (Non-LLM Approaches)

Hugging Face Forums [Unofficial] April 13, 2026
Hi everyone,

I’m working on an intent classification system in a specialized domain with very limited labeled data (a few examples per intent) and running into issues with semantic overlap across categories.

Problem

Many intents share overlapping vocabulary, and standard semantic similarity approaches (sentence embeddings, cosine similarity, etc.) tend to:

  • Overweight common/shared terms

  • Miss more functional signals (actions, relationships, constraints)

  • Result in misclassification when surface-level similarity dominates
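
To make the failure mode concrete, here is a toy sketch. It uses raw bag-of-words cosine as a crude stand-in for sentence embeddings, and the intent names and utterances are hypothetical, so treat it as an illustration of the pattern rather than a reproduction of the actual system.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between raw bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical intents whose examples differ by a single action word.
examples = {
    "cancel_order": "cancel my order for the device",
    "track_order": "track my order for the device",
}
query = "please cancel the order for my device"

scores = {intent: bow_cosine(query, text) for intent, text in examples.items()}
margin = scores["cancel_order"] - scores["track_order"]
print(scores, margin)
```

The wrong intent still scores above 0.7 here because five of its six tokens are shared with the query. The only functional signal is the single action word, and it contributes very little to the margin.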

Current Approach

I’ve experimented with:

  • Sentence embedding models (for similarity-based routing)

  • Breaking intent descriptions into smaller semantic units (anchor-based matching)

  • Using NLI-style models as a secondary validation step
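
For readers unfamiliar with the anchor idea, a minimal sketch follows. It assumes each intent is split into a few short anchor phrases and scored by its best-matching anchor. The anchors are hypothetical, and token-set Jaccard overlap stands in for whatever embedding similarity is actually used.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a stand-in for an embedding similarity function."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# Hypothetical intents, each split into short anchor phrases.
ANCHORS = {
    "cancel_order": ["cancel order", "stop shipment", "do not send"],
    "track_order": ["track order", "where is my package", "delivery status"],
}


def route(query: str) -> tuple[str, dict[str, float]]:
    """Score each intent by its best-matching anchor and return the top intent."""
    scores = {
        intent: max(jaccard(query, anchor) for anchor in anchors)
        for intent, anchors in ANCHORS.items()
    }
    return max(scores, key=scores.get), scores


best, scores = route("where is my package right now")
print(best, scores)
```

Taking the maximum over anchors means one strong match decides the intent. Averaging over anchors is the obvious alternative, at the cost of diluting a single precise match.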

While these help, I still see:

  • High-recall but low-precision terms dominating scoring

  • Difficulty encoding negative intent boundaries (i.e., signals that should exclude a class)
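
To clarify what a negative boundary means here, one naive formulation is to subtract a penalty for the best match against phrases that should exclude a class. The sketch below is an assumption-laden toy, not a recommended solution: the intents, the anchors, the `penalty` parameter, and the token-overlap similarity are all hypothetical.

```python
def overlap(query: str, phrase: str) -> float:
    """Fraction of the phrase's tokens present in the query (embedding stand-in)."""
    q, p = set(query.lower().split()), set(phrase.lower().split())
    return len(q & p) / len(p) if p else 0.0


# Hypothetical intent definitions with positive and negative anchors.
INTENTS = {
    "cancel_order": {"pos": ["cancel order"], "neg": ["cancel subscription"]},
    "cancel_subscription": {"pos": ["cancel subscription"], "neg": ["cancel order"]},
}


def score(query: str, intent: str, penalty: float = 1.0) -> float:
    """Best positive match minus a penalty for the best negative match."""
    spec = INTENTS[intent]
    pos = max(overlap(query, a) for a in spec["pos"])
    neg = max(overlap(query, a) for a in spec["neg"])
    return pos - penalty * neg


query = "i want to cancel my subscription"
scores = {name: score(query, name) for name in INTENTS}
print(scores)
```

The shared word "cancel" gives both intents a partial positive match, and the negative anchor is what pushes `cancel_order` below zero. The difficulty is that a hand-tuned penalty like this is itself a fairly brittle rule.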

Looking For Suggestions On

  • Techniques to weight or prioritize discriminative signals over generic ones

  • Better ways to structure intent representations beyond plain embeddings

  • Approaches to incorporate negative constraints without relying on brittle rules

  • Any lightweight or hybrid pipelines (embedding + symbolic / statistical methods)

I’m trying to avoid full LLM-based solutions for latency and interpretability reasons.

Would really appreciate any insights, patterns, or references from folks who’ve tackled similar problems.

Thanks!
