External Publication

How I Built a Personal AI Knowledge Base with Amazon Aurora pgvector and Next.js — AWS H0 Hackathon

DEV Community [Unofficial] June 26, 2026

I built ChatScroll for the AWS H0 Hackathon — an app that lets you save AI answers as searchable "Scrolls" using Amazon Aurora PostgreSQL with pgvector for semantic search.

The Problem

Every day, people ask AI assistants valuable questions and get great answers, then lose them forever. Chat history is linear, hard to search, and ephemeral. I kept re-Googling the same questions, knowing I had already found the answer somewhere but unable to find it again.

The Solution

ChatScroll transforms AI conversations into a personal knowledge library. Save any AI answer as a "Scroll", organize it automatically, and find it later with semantic search.

The Core Technical Challenge

Making search understand MEANING, not just keywords. When you search "blood thinner medication", it should find your warfarin scroll even though "blood thinner" doesn't appear in the title.

How pgvector on Aurora Solves This

Amazon Aurora PostgreSQL with the pgvector extension stores 3072-dimensional vector embeddings for every saved Scroll.

When a user saves a Scroll:

The answer text is sent to Google's gemini-embedding-001 embedding model
The model returns a 3072-dimensional vector
The vector is stored in Aurora alongside the content
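The following is a minimal TypeScript sketch of the database side of the save step. It assumes node-postgres-style `{ text, values }` queries and a hypothetical `scrolls` table with `user_id`, `title`, `content`, and `embedding vector(3072)` columns. The Gemini API call is left out: `embedding` stands for whatever array of numbers gemini-embedding-001 returned. pgvector accepts vectors in the text form `[x1,x2,...]`, so the array is serialized to that literal and sent as an ordinary parameter.

```typescript
// Default output size of gemini-embedding-001, as used in the article.
const EMBEDDING_DIM = 3072;

// Serialize a number array into pgvector's text input format: "[x1,x2,...]".
function toPgVector(values: number[], dim: number = EMBEDDING_DIM): string {
  if (values.length !== dim) {
    throw new Error(`expected ${dim} dimensions, got ${values.length}`);
  }
  if (!values.every(Number.isFinite)) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return `[${values.join(",")}]`;
}

// Parameterized statement in the shape node-postgres accepts.
interface Query {
  text: string;
  values: unknown[];
}

// Hypothetical table and column names; adjust to the real schema.
function buildInsertScroll(
  userId: string, // e.g. the Cognito sub
  title: string,
  content: string,
  embedding: number[],
): Query {
  return {
    text:
      "INSERT INTO scrolls (user_id, title, content, embedding) " +
      "VALUES ($1, $2, $3, $4::vector) RETURNING id",
    values: [userId, title, content, toPgVector(embedding)],
  };
}
```

With node-postgres, you can pass the returned object straight to `pool.query(...)`. Checking the dimension in the application turns a model or configuration mismatch into a clear error message before the row ever reaches Aurora.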

When a user searches:

The search query is converted to a vector with the same embedding model
Aurora finds the most similar vectors using cosine distance
Results are ranked by semantic similarity

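-- Semantic search with a similarity threshold
-- (table and column names illustrative; $1 is the query embedding)
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM scrolls
WHERE 1 - (embedding <=> $1) > 0.5
ORDER BY embedding <=> $1
LIMIT 5;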
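To make the threshold concrete: pgvector's `<=>` operator returns cosine distance, so `1 - (embedding <=> query)` is cosine similarity. The query keeps rows whose similarity exceeds 0.5 and returns the five closest. The TypeScript below reproduces that filter, sort, and limit logic in memory, purely as an illustration. `StoredScroll` is a hypothetical shape, and in the real app Aurora does this work.

```typescript
interface StoredScroll {
  id: string;
  embedding: number[];
}

// Cosine distance, the quantity pgvector's <=> operator computes.
// A zero vector yields NaN, which the threshold filter below drops.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Mirrors: WHERE 1 - distance > threshold ORDER BY distance LIMIT k
function semanticSearch(
  query: number[],
  scrolls: StoredScroll[],
  threshold = 0.5,
  k = 5,
): { id: string; similarity: number }[] {
  return scrolls
    .map((s) => ({ id: s.id, distance: cosineDistance(s.embedding, query) }))
    .filter((r) => 1 - r.distance > threshold)
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map((r) => ({ id: r.id, similarity: 1 - r.distance }));
}
```

At larger scale, an approximate-nearest-neighbor index usually serves the ORDER BY. One constraint is worth knowing: pgvector's HNSW and IVFFlat indexes on the `vector` type support at most 2,000 dimensions. A 3072-dimensional column is therefore either scanned exactly or indexed through a `halfvec` expression.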

Three PostgreSQL Features Working Together

What makes Aurora PostgreSQL a good fit for this use case is three features working together: two extensions plus PostgreSQL's built-in full-text search.

pgvector (extension): stores 3072-dim embeddings and enables cosine similarity search between vectors

ltree (extension): stores folder paths as dot-separated label paths (e.g. programming.containers) and enables subtree queries without recursive CTEs

tsvector (built-in type, no extension needed): powers full-text search with ranking via ts_rank, combined with pgvector for hybrid search
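The article doesn't say how the `ts_rank` and pgvector results are merged. One common technique is reciprocal rank fusion (RRF), shown here as an assumption and not as ChatScroll's actual method. You run the full-text query and the vector query separately, then score each Scroll by summing `1 / (k + rank)` over the lists it appears in. Because RRF works on ranks alone, it sidesteps the fact that `ts_rank` scores and cosine similarities live on different scales. `k = 60` is a commonly used default.

```typescript
// Reciprocal rank fusion: merge several best-first ranked ID lists.
// An item's fused score is the sum of 1 / (k + rank) over every list
// that contains it, with ranks starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by ID for a stable order.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}
```

For example, call `reciprocalRankFusion([fullTextIds, vectorIds])`, where each array holds Scroll IDs in the order its own query returned them.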

The Dual Database Architecture

I made a deliberate choice to use TWO AWS databases:

Amazon Aurora PostgreSQL for structured data:

Scrolls with embeddings
Folder hierarchy (ltree)
User accounts (identified by Cognito sub)
Conversation metadata

Amazon DynamoDB for chat messages:

PK: conversationId
SK: timestamp#messageId
TTL: 90-day auto-expiry
PAY_PER_REQUEST billing
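Here is a sketch of how a chat message item could be shaped under that key design. The attribute names `sk`, `text`, and `expiresAt` are hypothetical; the article only names the `conversationId` partition key and the `timestamp#messageId` sort-key format. Two details matter. First, DynamoDB TTL reads a Number attribute holding Unix epoch time in seconds. Second, because the sort key is a string, the timestamp needs a fixed-width format such as ISO 8601 UTC so that lexicographic order matches chronological order.

```typescript
const TTL_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Hypothetical item shape; only conversationId and the sort-key format
// come from the article.
interface ChatMessageItem {
  conversationId: string; // partition key
  sk: string; // sort key: "<ISO-8601 timestamp>#<messageId>"
  text: string;
  expiresAt: number; // TTL attribute: Unix epoch seconds
}

function buildMessageItem(
  conversationId: string,
  messageId: string,
  text: string,
  now: Date = new Date(),
): ChatMessageItem {
  // toISOString() is UTC and fixed-width, so string order matches time order.
  const timestamp = now.toISOString();
  const nowSeconds = Math.floor(now.getTime() / 1000);
  return {
    conversationId,
    sk: `${timestamp}#${messageId}`,
    text,
    expiresAt: nowSeconds + TTL_DAYS * SECONDS_PER_DAY,
  };
}
```

You can write the object with `PutCommand` from `@aws-sdk/lib-dynamodb`. A query with `KeyConditionExpression: "conversationId = :c"` then returns the conversation's messages in time order.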

This separation keeps Aurora lean for complex queries while DynamoDB handles the high-volume chat stream.

The Result

Searching "containerization technology" correctly surfaces the Docker scroll. Searching "blood thinner medication" finds warfarin, with no programming results mixed in.

Scoping semantic search to a single folder category keeps unrelated topics out of the results.
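Here is a sketch of how folder scoping could be expressed. It assumes a hypothetical `scrolls` table with `user_id`, `path ltree`, and `embedding` columns. ltree's `<@` operator tests whether a path is a descendant of (or equal to) another, so `path <@ 'programming'` matches both `programming` and `programming.containers`. Folder names are reduced to lowercase ASCII letters, digits, and underscores, which ltree accepts as label characters. The naming helpers are illustrative, not ChatScroll's code.

```typescript
interface Query {
  text: string;
  values: unknown[];
}

// Reduce a folder name to a conservative ltree label. Characters other than
// lowercase ASCII letters, digits, and "_" collapse to "_";
// e.g. "Blood Thinners" -> "blood_thinners".
function toLtreeLabel(folder: string): string {
  const label = folder
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9_]+/g, "_")
    .replace(/^_+|_+$/g, "");
  if (label.length === 0) {
    throw new Error(`cannot derive an ltree label from "${folder}"`);
  }
  return label;
}

// ["Programming", "Containers"] -> "programming.containers"
function toLtreePath(folders: string[]): string {
  return folders.map(toLtreeLabel).join(".");
}

// Semantic search limited to one user's folder subtree (hypothetical schema).
// $1 = query embedding as pgvector text, $2 = folder path, $3 = Cognito sub.
function buildScopedSearch(queryVector: string, folders: string[], userId: string): Query {
  return {
    text:
      "SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity " +
      "FROM scrolls " +
      "WHERE user_id = $3 AND path <@ $2::ltree " +
      "AND 1 - (embedding <=> $1::vector) > 0.5 " +
      "ORDER BY embedding <=> $1::vector " +
      "LIMIT 5",
    values: [queryVector, toLtreePath(folders), userId],
  };
}
```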

Try It

Live app: https://chatscroll.vercel.app
AWS Architecture: https://chatscroll.vercel.app/aws-showcase

I created this content for the purposes of entering the AWS H0 Hackathon.

#H0Hackathon #AWS #Aurora #pgvector #Vercel #NextJS