How I Built a Personal AI Knowledge Base with Amazon Aurora pgvector and Next.js — AWS H0 Hackathon
I built ChatScroll for the AWS H0 Hackathon — an app that lets you save AI answers as searchable "Scrolls" using Amazon Aurora PostgreSQL with pgvector for semantic search.
The Problem
Every day, people ask AI assistants valuable questions, get great answers, and then lose them forever. Chat history is linear, unsearchable, and ephemeral. I kept re-Googling the same questions, knowing I had already found the answer somewhere but unable to track it down again.
The Solution
ChatScroll transforms AI conversations into a personal knowledge library. Save any AI answer as a "Scroll", organize it automatically, and find it later with semantic search.
The Core Technical Challenge
The hard part was making search understand meaning, not just keywords. A search for "blood thinner medication" should find your warfarin Scroll even though "blood thinner" doesn't appear in the title.
How pgvector on Aurora Solves This
Amazon Aurora PostgreSQL with the pgvector extension stores 3072-dimensional vector embeddings for every saved Scroll.
When a user saves a Scroll:
- The answer text is sent to Google's gemini-embedding-001
- The model returns a 3072-dimensional vector
- The vector is stored in Aurora alongside the content
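The three save steps above can be sketched roughly as follows. This is a minimal illustration, not ChatScroll's actual code: the `scrolls` table, its column names, and the injected `Embedder` and `QueryFn` function types are all assumptions made for the example, so that no particular SDK call has to be guessed at.

```typescript
// Sketch of the save path: embed the answer text, validate it, build an INSERT.
// Assumptions (not from the article): the `scrolls` table, its columns,
// and the injected Embedder / QueryFn function types.

const EMBEDDING_DIMS = 3072; // dimension stated in the article

type Embedder = (text: string) => Promise<number[]>;
type QueryFn = (text: string, values: unknown[]) => Promise<void>;

interface NewScroll {
  userId: string;
  title: string;
  content: string;
}

// Build a parameterized INSERT. pgvector accepts a text literal such as
// "[0.1,0.2,0.3]", which is also what JSON.stringify yields for a number[].
function buildScrollInsert(scroll: NewScroll, embedding: number[]) {
  if (embedding.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dimensions, got ${embedding.length}`);
  }
  // JSON.stringify would turn NaN or Infinity into null, so reject them first.
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return {
    text:
      "INSERT INTO scrolls (user_id, title, content, embedding) " +
      "VALUES ($1, $2, $3, $4::vector)",
    values: [scroll.userId, scroll.title, scroll.content, JSON.stringify(embedding)],
  };
}

async function saveScroll(embed: Embedder, query: QueryFn, scroll: NewScroll): Promise<void> {
  const embedding = await embed(scroll.content); // send text, receive vector
  const { text, values } = buildScrollInsert(scroll, embedding);
  await query(text, values); // store vector alongside the content
}
```

Passing the embedder and the query function in as arguments keeps the sketch testable without network access; in a real app they would wrap the embedding API client and the database driver.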
When a user searches:
- The search query is converted to a vector
- Aurora finds the most similar vectors using cosine distance
- Results are ranked by semantic similarity
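To make "most similar by cosine distance" concrete, here is a small in-memory sketch of the same ranking logic. Aurora does this work inside the database; the TypeScript below only illustrates the math. The toy 3-dimensional vectors and Scroll titles are made up for the example and stand in for real 3072-dimensional embeddings.

```typescript
interface StoredScroll {
  title: string;
  embedding: number[];
}

// Cosine distance = 1 - cosine similarity (the quantity pgvector's <=> returns).
// A zero-length vector yields NaN here, which the threshold filter below drops.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep Scrolls whose similarity exceeds the threshold, most similar first.
function semanticSearch(
  queryVec: number[],
  scrolls: StoredScroll[],
  minSimilarity = 0.5,
  limit = 5
) {
  return scrolls
    .map((s) => ({ title: s.title, similarity: 1 - cosineDistance(s.embedding, queryVec) }))
    .filter((r) => r.similarity > minSimilarity)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

// Made-up toy data: two medication-like vectors and one unrelated vector.
const library: StoredScroll[] = [
  { title: "Warfarin dosing", embedding: [0.9, 0.1, 0.0] },
  { title: "Docker basics", embedding: [0.0, 0.2, 0.9] },
  { title: "Aspirin and clotting", embedding: [0.7, 0.6, 0.1] },
];

const hits = semanticSearch([1, 0, 0], library);
console.log(hits.map((h) => h.title)); // ["Warfarin dosing", "Aspirin and clotting"]
```

The unrelated vector has a similarity of 0 to the query, so the 0.5 threshold removes it instead of letting it fill out the result list.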
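-- Semantic search with a similarity threshold.
-- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
-- The table and column names (scrolls, id, title) are illustrative.
SELECT id, title, 1 - (embedding <=> $queryVec) AS similarity
FROM scrolls
WHERE 1 - (embedding <=> $queryVec) > 0.5
ORDER BY embedding <=> $queryVec
LIMIT 5;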
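From the Next.js side, a query like the one above would normally be sent with positional parameters rather than string interpolation. The sketch below builds such a query in the `{ text, values }` shape that drivers like node-postgres accept. The `scrolls` table and its columns are hypothetical names, and `toVectorLiteral` and `buildSemanticSearch` are helpers invented for this example.

```typescript
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// pgvector parses a text literal of the form "[0.1,0.2,0.3]".
function toVectorLiteral(vec: number[]): string {
  if (vec.length === 0 || !vec.every((x) => Number.isFinite(x))) {
    throw new Error("invalid embedding");
  }
  return `[${vec.join(",")}]`;
}

// $1 = query embedding, $2 = minimum cosine similarity, $3 = max rows.
function buildSemanticSearch(queryVec: number[], minSimilarity = 0.5, limit = 5): SqlQuery {
  return {
    text: [
      "SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity",
      "FROM scrolls",
      "WHERE 1 - (embedding <=> $1::vector) > $2",
      "ORDER BY embedding <=> $1::vector",
      "LIMIT $3",
    ].join("\n"),
    values: [toVectorLiteral(queryVec), minSimilarity, limit],
  };
}

const exampleQuery = buildSemanticSearch([0.25, -0.5, 1]);
console.log(exampleQuery.values[0]); // "[0.25,-0.5,1]"
```

Keeping the embedding as a bound parameter avoids pasting thousands of floats into the SQL string and keeps the threshold and limit tunable per request.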
Three PostgreSQL Features Working Together
What makes Aurora PostgreSQL a good fit for this use case is three features working together: two extensions (pgvector and ltree) plus PostgreSQL's built-in full-text search.
pgvector: stores 3072-dim embeddings and enables cosine similarity search between vectors
ltree: stores folder paths as dot-separated label trees (programming.containers) and enables subtree queries without recursive CTEs
tsvector: PostgreSQL's built-in full-text search type, which powers ranked keyword search via ts_rank and is combined with pgvector for hybrid search
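The article does not say how the keyword ranking and the vector ranking are merged into one hybrid result list. One common option, shown here purely as an assumption, is reciprocal rank fusion (RRF): each result earns 1 / (k + rank) from every list it appears in, which sidesteps the fact that ts_rank scores and cosine similarities are on different scales. The function name, the constant k = 60, and the sample IDs are illustrative.

```typescript
// Reciprocal rank fusion over any number of ranked ID lists (best first).
// This is a swapped-in technique for illustration, not ChatScroll's confirmed method.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Hypothetical result IDs from a ts_rank query and a pgvector query.
const keywordHits = ["a", "b", "c"];
const vectorHits = ["b", "c", "d"];

console.log(reciprocalRankFusion([keywordHits, vectorHits])); // ["b", "c", "a", "d"]
```

Results that both searches agree on ("b" and "c") rise to the top, while results found by only one search are kept but ranked lower.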
The Dual Database Architecture
I made a deliberate choice to use TWO AWS databases:
Amazon Aurora PostgreSQL for structured data:
- Scrolls with embeddings
- Folder hierarchy (ltree)
- User accounts (Cognito sub)
- Conversation metadata
Amazon DynamoDB for chat messages:
- PK: conversationId
- SK: timestamp#messageId
- TTL: 90-day auto-expiry
- PAY_PER_REQUEST billing
This separation keeps Aurora lean for complex queries while DynamoDB handles the high-volume chat stream.
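The DynamoDB key design above can be sketched as a small item builder. The attribute names (`sortKey`, `expiresAt`), the `role` field, and the choice of an ISO 8601 timestamp are assumptions for illustration. What is not an assumption is that DynamoDB's TTL feature expects a Number attribute holding Unix epoch time in seconds.

```typescript
const NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60;

interface ChatMessageItem {
  conversationId: string; // partition key
  sortKey: string; // "<timestamp>#<messageId>"
  role: "user" | "assistant"; // hypothetical attribute
  content: string;
  expiresAt: number; // TTL attribute, Unix epoch seconds
}

function buildChatMessageItem(
  conversationId: string,
  messageId: string,
  role: "user" | "assistant",
  content: string,
  now: Date = new Date()
): ChatMessageItem {
  // ISO 8601 UTC timestamps sort lexicographically in chronological order,
  // so a range query on the sort key returns messages oldest to newest.
  const timestamp = now.toISOString();
  return {
    conversationId,
    sortKey: `${timestamp}#${messageId}`,
    role,
    content,
    expiresAt: Math.floor(now.getTime() / 1000) + NINETY_DAYS_SECONDS,
  };
}

const sampleItem = buildChatMessageItem(
  "conv-1",
  "msg-1",
  "user",
  "What is warfarin?",
  new Date("2025-01-01T00:00:00.000Z")
);
console.log(sampleItem.sortKey); // "2025-01-01T00:00:00.000Z#msg-1"
console.log(sampleItem.expiresAt); // 1743465600
```

Appending the message ID after the timestamp keeps sort keys unique even when two messages in the same conversation share a millisecond.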
The Result
Searching "containerization technology" correctly surfaces the Docker scroll. Searching "blood thinner medication" finds warfarin — no programming results contaminating it.
Scoping semantic search to the same folder category keeps results from unrelated topics out of the list.
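Folder scoping is where ltree and pgvector meet: filter by subtree first, then rank by vector distance. The sketch below shows one way to derive an ltree path from folder names and build such a query. The `scrolls` table, the `folder_path` column, and all helper names are hypothetical. The `<@` operator is ltree's "is a descendant of (or equal to)" test.

```typescript
// ltree labels are limited to letters, digits, and underscores (newer
// PostgreSQL versions also allow hyphens), so normalize folder names first.
function toLtreeLabel(name: string): string {
  const label = name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
  if (label === "") throw new Error(`cannot derive an ltree label from "${name}"`);
  return label;
}

function toLtreePath(folders: string[]): string {
  if (folders.length === 0) throw new Error("at least one folder is required");
  return folders.map((f) => toLtreeLabel(f)).join(".");
}

// In-memory equivalent of `path <@ ancestor`, useful for tests and UI code.
function isInSubtree(path: string, ancestor: string): boolean {
  return path === ancestor || path.startsWith(ancestor + ".");
}

// $1 = query embedding literal, $2 = folder path, $3 = max rows.
function buildFolderScopedSearch(vectorLiteral: string, folders: string[], limit = 5) {
  return {
    text: [
      "SELECT id, title",
      "FROM scrolls",
      "WHERE folder_path <@ $2::ltree",
      "ORDER BY embedding <=> $1::vector",
      "LIMIT $3",
    ].join("\n"),
    values: [vectorLiteral, toLtreePath(folders), limit],
  };
}

console.log(toLtreePath(["Programming", "Containers"])); // "programming.containers"
```

With this filter in place, a query scoped to a health folder never compares against embeddings stored under `programming`, regardless of how close those vectors happen to be.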
Try It
Live app: https://chatscroll.vercel.app
AWS Architecture: https://chatscroll.vercel.app/aws-showcase
I created this content for the purposes of entering the AWS H0 Hackathon.