External Publication
Visit Post

Onisin OS: chat with your own data, runs locally with any OpenAI-compatible model

Hugging Face Forums [Unofficial] May 27, 2026
Source

Hi everyone — I’ve just launched Onisin OS, a local-first desktop app that lets domain experts chat with their own operational data. I’d love feedback from the HF community on the model and embedding side specifically, because that’s where you’ll spot weaknesses faster than anyone else.

What it does You describe your data domain in a small text DSL (persons, claims, tickets, whatever fits your shop). From that one source of truth, the system derives a live GraphQL schema, the user-facing forms, and a semantic search index. A built-in ReAct agent translates natural-language questions into actual GraphQL queries and renders results as tables, detail forms, or pipeline reports.

The HF-relevant parts

  • Embedding model slot is OpenAI-API-compatible — runs against any inference server that speaks that protocol. Default in my dev setup is Ollama with granite-embedding, but I’ve also tested with sentence-transformers via a small adapter and with hosted endpoints.

  • Vector store is pgvector inside the same Postgres that holds the domain data, so semantic-match + relational-join is one SQL query, no separate vector service.

  • Chunking strategy is field-level: every domain field, relation and comment becomes its own chunk with a short alias list — so „employees" can find the person table via alias, without retraining anything.

  • The chat-side LLM is the same OpenAI-API slot — Ollama, vLLM, OpenAI, Anthropic, all interchangeable. The ReAct loop is hand-written (~80 lines), not LangChain.

What I’d love input on

  1. Embedding model choice for schema chunks (short, dense, mostly nouns + types) — granite-embedding works well, but is there something explicitly trained for this shape of text? Most benchmarks target longer prose.

  2. Has anyone hit a wall with pgvector at scale where a dedicated store was actually justified, or is „just use Postgres" still the right default in 2026?

  3. For the chat side: smaller local models (7B–14B class) handle the agent loop fine for simple intents but stumble on multi-step pipelines. Any recent open-weight models you’d recommend for tool-calling at that size?

Repo: https://github.com/frankvschrenk/onisin (BSL-1.1) Site: https://onisin.com

Happy to dig into any of the embedding/indexing details in this thread.

Discussion in the ATmosphere

Loading comments...