Need generative model, high-quality description generation

Hugging Face Forums [Unofficial] May 27, 2026

Thanks for the detailed guidance — this is very helpful. I agree with your main point: for our use case, the hard part is not finding one “best” model, but designing a reliable pipeline around structured operator data.

Our setup

Stack: PostgreSQL + Java (Spring Boot) + React/Node.js, hosted on AWS
We’re in production, and the app/web is running successfully. This is a stage-2 upgrade where we’re adding AI for content generation.
For cost reasons, we’ll likely start with an API-based approach (free tier or OpenRouter), not self-hosted models.

What we’re generating

We have operator pages. The UI is fixed and shared across all operators. The problem is the content: bio, service areas, location, services offered, FAQ, etc. These must be:

Unique
Factually accurate (locked to operator inputs)
SEO-friendly
Scalable to 1,000+ existing operators and all future registrations

Our workflow (with REST API)

User clicks Create Profile in the frontend.
Frontend sends operator data to Spring Boot.
Spring Boot saves the raw operator record in PostgreSQL with status PENDING.
Spring Boot pushes a generation job into a queue/worker system.
Worker reads the operator data and calls OpenRouter.
OpenRouter returns structured content JSON.
Worker validates the JSON and stores the generated content in PostgreSQL.
Status changes to READY or PUBLISHED.
Frontend fetches the content and renders it in the fixed UI sections.
If something fails, status becomes FAILED and only that case goes to retry or manual review.
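
The status lifecycle in the steps above can be sketched as a small state machine. This is only a minimal sketch, written in Python for brevity rather than in the Spring Boot worker itself. The names `Status`, `ALLOWED`, `transition`, and `run_job` are hypothetical, and the FAILED to PENDING retry edge is an assumption about how "retry" would be modeled; `generate` stands in for the OpenRouter call and `validate` for the JSON check.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Status(Enum):
    PENDING = "PENDING"
    READY = "READY"
    PUBLISHED = "PUBLISHED"
    FAILED = "FAILED"


# Allowed status changes. FAILED -> PENDING models a retry (an assumption).
ALLOWED = {
    Status.PENDING: {Status.READY, Status.FAILED},
    Status.READY: {Status.PUBLISHED},
    Status.FAILED: {Status.PENDING},
    Status.PUBLISHED: set(),
}


def transition(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def run_job(
    operator: dict,
    generate: Callable[[dict], dict],
    validate: Callable[[dict, dict], None],
) -> Tuple[Status, Optional[dict]]:
    """Process one PENDING job: generate content, validate it, pick a status.

    `generate` is a placeholder for the model call; `validate` should raise
    on bad output. Any exception marks the job FAILED instead of crashing
    the worker, so only that record goes to retry or manual review.
    """
    try:
        content = generate(operator)
        validate(operator, content)
    except Exception:
        return transition(Status.PENDING, Status.FAILED), None
    return transition(Status.PENDING, Status.READY), content
```

Keeping the allowed transitions in one table means a bug elsewhere cannot, for example, move a PUBLISHED profile back to PENDING without raising an error.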

End to end, this flow takes about 1–2 minutes, which is acceptable for the operator experience.

On duplicates and bots

We’re aware that bots and duplicate/near-duplicate content are real risks at scale. We’re considering either:

Human-in-the-loop (review before publication), or
Human-on-the-loop (post-generation review),

depending on feasibility. If we go with human-in-the-loop for profile creation, it could take 1–2 days per profile, which may frustrate operators and lead to negative word of mouth about the platform.
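
Whichever review mode we pick, an automated near-duplicate check could reduce how many profiles a human has to look at. Below is a rough sketch using word shingles and Jaccard similarity, in Python for brevity. The function names, the shingle size `k=3`, and the `0.6` threshold are all assumptions that would need tuning on real profiles.

```python
from typing import Iterable, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Split text into overlapping k-word tuples (lowercased).

    Punctuation is not stripped here; a real pipeline would normalize first.
    """
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(
    candidate: str, existing: Iterable[str], threshold: float = 0.6
) -> bool:
    """True if the candidate is too similar to any already stored text."""
    return any(jaccard(candidate, other) >= threshold for other in existing)
```

Comparing every new bio against all existing ones is linear per profile, which should be fine for around 1,000 operators; a much larger catalog would likely need something like MinHash instead.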

Our compromise is:

Generate the profile in 1–2 minutes automatically.
Later, after verification (e.g., within a week), add a verified tag to the profile.

This balances customer trust and operator experience.

On the generation layer

Your concerns about the generation pipeline are completely valid: hallucinations, unsupported claims, and generic SEO filler are real risks. Your point that “one API endpoint: input row → prompt → final paragraph → publish” is too fragile for scale is exactly what we’re trying to avoid.

We’re already considering:

Prompt optimization so the model performs accurately and stays within allowed facts.
A pipeline that owns normalization, fact restrictions, validation, duplicate checks, and publishing rules in the application layer, while the model mainly handles wording.
Structured outputs from OpenRouter, with validation before publishing.
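
To make "validation before publishing" concrete, here is a minimal sketch of a fact-lock check in the application layer, in Python for brevity. The field names (`bio`, `service_areas`, `services`, `faq`) and the function `validate_content` are hypothetical; our actual schema may differ. The idea is that list-type facts in the model output must be a subset of what the operator entered.

```python
from typing import List

# Hypothetical schema: field name -> expected JSON type.
REQUIRED_FIELDS = {"bio": str, "service_areas": list, "services": list, "faq": list}


def _norm(value: str) -> str:
    """Lowercase and collapse whitespace so 'Airport  Transfer' == 'airport transfer'."""
    return " ".join(value.lower().split())


def validate_content(operator: dict, content: dict) -> List[str]:
    """Return a list of problems; an empty list means the content may be stored."""
    errors = []
    # 1. Shape check: every required field is present with the right type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(content.get(field), expected_type):
            errors.append(f"{field}: missing or not a {expected_type.__name__}")
    if errors:
        return errors
    # 2. Fact lock: the model may only repeat values the operator supplied.
    for field in ("service_areas", "services"):
        allowed = {_norm(v) for v in operator.get(field, [])}
        for item in content[field]:
            if not isinstance(item, str) or _norm(item) not in allowed:
                errors.append(f"{field}: unsupported value {item!r}")
    # 3. Basic sanity check on the free-text part.
    if not content["bio"].strip():
        errors.append("bio: empty")
    return errors
```

This only catches invented list values. Unsupported claims inside the free-text bio (for example, an award the operator never mentioned) would still need a separate check or human review.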

Honestly, I’m really grateful for your help and guidance on the generation layer. One last question: how would you personally approach building this system from scratch? Where would you start, and how would you structure the generation flow for scalability and SEO quality?
