Need generative model, high-quality description generation
Thanks for the detailed guidance — this is very helpful. I agree with your main point: for our use case, the hard part is not finding one “best” model, but designing a reliable pipeline around structured operator data.
Our setup
Stack: PostgreSQL + Java (Spring Boot) + React/Node.js, hosted on AWS
We’re in production, and the app/web is running successfully. This is a stage-2 upgrade where we’re adding AI for content generation.
For cost reasons, we’ll likely start with an API-based approach (free tier or OpenRouter), not self-hosted models.
What we’re generating
We have operator pages. The UI is fixed and shared across all operators. The problem is the content: bio, service areas, location, services offered, FAQ, etc. These must be:
Unique
Factually accurate (locked to operator inputs)
SEO-friendly
Scalable to 1,000+ existing operators and all future registrations
Our workflow (with REST API)
User clicks Create Profile in the frontend.
Frontend sends operator data to Spring Boot.
Spring Boot saves the raw operator record in PostgreSQL with status PENDING.
Spring Boot pushes a generation job into a queue/worker system.
Worker reads the operator data and calls OpenRouter.
OpenRouter returns structured content JSON.
Worker validates the JSON and stores the generated content in PostgreSQL.
Status changes to READY or PUBLISHED.
Frontend fetches the content and renders it in the fixed UI sections.
If something fails, status becomes FAILED and only that case goes to retry or manual review.
This takes about 1–2 minutes, which is acceptable for the operator experience.
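A minimal sketch of the validate-and-set-status steps above. Python is used only for brevity; the same logic would port to a Spring Boot worker. The section names in `REQUIRED_SECTIONS` and the `generate` callable are placeholders for illustration, not our real schema or the actual OpenRouter client.

```python
import json
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "PENDING"
    READY = "READY"
    FAILED = "FAILED"


# Hypothetical section names, based on the content types listed in this post.
REQUIRED_SECTIONS = ("bio", "service_areas", "location", "services", "faq")


def validate_content(raw: str) -> dict:
    """Parse the model response and check every section is present and non-empty."""
    content = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(content, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = [name for name in REQUIRED_SECTIONS if not content.get(name)]
    if missing:
        raise ValueError(f"missing or empty sections: {missing}")
    return content


def process_job(operator: dict, generate: Callable[[dict], str]) -> tuple[Status, Optional[dict]]:
    """Run one generation job. `generate` stands in for the OpenRouter call."""
    try:
        content = validate_content(generate(operator))
    except Exception:
        # Any failure (bad JSON, missing section, API error) marks the job FAILED
        # so that only this record goes to retry or manual review.
        return Status.FAILED, None
    return Status.READY, content
```

The point of the design is that nothing reaches READY unless it parsed and passed the structural check, so the frontend can render stored content without defensive handling.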
On duplicates and bots
We’re aware that bots and duplicate/near-duplicate content are real risks at scale. We’re considering either:
Human-in-the-loop (review before publishing), or
Human-on-the-loop (post-generation review),
depending on feasibility. If we go with human-in-the-loop for profile creation, review could take 1–2 days per profile, which may frustrate operators and hurt the platform’s reputation.
Our compromise is:
Generate the profile in 1–2 minutes automatically.
Later, after verification (e.g., within a week), add a verified tag to the profile.
This balances customer trust and operator experience.
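To keep post-generation review manageable, only near-duplicate profiles could be flagged for a human. One common way to do that is word-shingle Jaccard similarity, sketched below in Python. The function names and the 0.6 threshold are assumptions for illustration and would need tuning on real profiles.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams (shingles)."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def needs_review(candidate: str, published: list[str], threshold: float = 0.6) -> bool:
    """Flag a new bio when it is too similar to any already published bio."""
    cand = shingles(candidate)
    return any(jaccard(cand, shingles(other)) >= threshold for other in published)
```

Comparing against every published bio is fine for around 1,000 operators. At a much larger scale an approximate method such as MinHash would probably be needed instead.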
On the generation layer
Your points about the generation pipeline are completely valid: hallucinations, unsupported claims, and generic SEO filler are real risks. The fragile design you described, “one API endpoint: input row → prompt → final paragraph → publish”, is exactly what we’re trying to avoid.
We’re already considering:
Prompt optimization so the model performs accurately and stays within allowed facts.
A pipeline that owns normalization, fact restrictions, validation, duplicate checks, and publishing rules in the application layer, while the model mainly handles wording.
Structured outputs from OpenRouter, with validation before publishing.
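As one example of a fact restriction living in the application layer, the structured list fields in the model output can be compared against the operator’s own inputs before publishing. This is a minimal Python sketch. The field names `services` and `service_areas` are assumed, and it covers only list fields, not free-text sections such as the bio.

```python
def unsupported_claims(operator: dict, generated: dict) -> dict[str, list[str]]:
    """Return generated list items that do not appear in the operator's inputs."""
    problems: dict[str, list[str]] = {}
    for field in ("services", "service_areas"):  # assumed field names
        # Compare case-insensitively and ignore surrounding whitespace.
        allowed = {item.strip().lower() for item in operator.get(field, [])}
        extra = [
            item for item in generated.get(field, [])
            if item.strip().lower() not in allowed
        ]
        if extra:
            problems[field] = extra
    return problems


# Usage: an empty result means every listed item is backed by operator data.
operator = {"services": ["Airport transfer", "City tour"], "service_areas": ["Downtown"]}
generated = {"services": ["airport transfer", "Helicopter rides"], "service_areas": ["Downtown"]}
print(unsupported_claims(operator, generated))
```

A non-empty result would send the record to FAILED or manual review rather than publishing it.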
Honestly, I’m really grateful for your help and guidance on the generation layer. One last question: how would you personally approach building this system from scratch? Where would you start, and how would you structure the generation flow for scalability and SEO quality?