{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiaiaoeijxgxcfcehdvvtszwyhmdlab7tiri32xvnobzdk36byvbfa",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjedhqqagaj2"
},
"path": "/t/discussion-about-improving-intent-classification-accuracy-in-low-data-settings-with-overlapping-semantic-signals-using-lightweight-non-llm-techniques/175202#post_1",
"publishedAt": "2026-04-13T05:57:04.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "Hi everyone,\n\nI’m working on an intent classification system in a specialized domain with **very limited labeled data** (a few examples per intent), and I’m running into issues with **semantic overlap across categories**.\n\n### Problem\n\nMany intents share overlapping vocabulary, and standard semantic similarity approaches (sentence embeddings, cosine similarity, etc.) tend to:\n\n* Overweight common/shared terms\n* Miss more **functional signals** (actions, relationships, constraints)\n* Cause misclassifications when surface-level similarity dominates\n\n### Current Approach\n\nI’ve experimented with:\n\n* Sentence embedding models (for similarity-based routing)\n* Breaking intent descriptions into smaller semantic units (anchor-based matching)\n* Using NLI-style models as a secondary validation step\n\nWhile these help, I still see:\n\n* High-recall, low-precision terms dominating the scoring\n* Difficulty encoding **negative intent boundaries** (i.e., signals that should exclude a class)\n\n### Looking For Suggestions On\n\n* Techniques to **weight or prioritize discriminative signals** over generic ones\n* Better ways to structure **intent representations** beyond plain embeddings\n* Approaches to incorporate **negative constraints** without relying on brittle rules\n* Any lightweight or hybrid pipelines (embedding + symbolic / statistical methods)\n\nI’m trying to avoid full LLM-based solutions for latency and interpretability reasons.\n\nI’d really appreciate any insights, patterns, or references from folks who’ve tackled similar problems.\n\nThanks!",
"title": "Discussion about improving intent classification accuracy in low-data settings with overlapping semantic signals using lightweight, non-LLM techniques"
}