{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreid5m6qbf65fwlqgd2h2qzgrsl2ej47pxhxcdnopcukuveh4bug2oi",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3movgqkvybgg2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreief6yw4xmxe6nwqqv4qxpe3y72ihgpupcczdtz6swguirpkhzyhsu"
},
"mimeType": "image/webp",
"size": 408272
},
"path": "/hasan_dev/how-do-you-know-you-need-a-database-index-4c1h",
"publishedAt": "2026-06-22T17:04:48.000Z",
"site": "https://dev.to",
"tags": [
"rails",
"performance",
"interview",
"database"
],
"textContent": "I got asked this in an interview years ago, and I've asked it from the other side of the table since. I like it because the lazy answer (\"index everything\") is wrong, and the real skill is knowing where to look before you touch anything. So here's the whole loop: how I spot a column that needs an index, how I prove the index actually helped, and what it costs me to add one.\n\n## Which columns actually need one\n\nThe candidates are columns you filter, sort, or join on. In SQL terms, anything in a WHERE, ORDER BY, JOIN, or GROUP BY on a table that's big or growing.\n\nA few I always check first:\n\nForeign keys, like `user_id` and `order_id`. In Rails these don't get an index automatically when you add the reference unless you ask for one, and they get hammered by association lookups and joins. This is the missing index I find most often.\n\nLookup columns like `email`, `slug`, and `token`. These usually want a unique index anyway.\n\nFilter and sort columns like `status` and `created_at`, the stuff your dashboards page over.\n\nNone of it matters at small scale. A sequential scan over 500 rows is instant. The pain shows up at a few million rows, when an endpoint that felt fine in development starts timing out in production.\n\n## How I measure whether it helped\n\nMostly EXPLAIN ANALYZE.\n\n\n\n EXPLAIN ANALYZE\n SELECT * FROM orders WHERE user_id = 42 AND status = 'paid';\n\n\nHere's what I read in the output. A `Seq Scan` on a big table is the red flag: Postgres is reading every row to answer the query. After I add the index, I want to see that become an `Index Scan` or a `Bitmap Index Scan`.\n\nI compare the actual time before and after, not just the plan shape. I also check estimated rows against actual rows. When those are far apart, the planner is running on stale statistics, and a quick `ANALYZE` sometimes fixes the query with no index at all.\n\nThe local plan only goes so far. To find what's worth fixing, I look at production: `pg_stat_statements` for the genuinely expensive queries, plus whatever APM is running (New Relic, Skylight, Datadog) and the query timings in the Rails log. It's easy to lovingly optimize a query that runs twice a day while ignoring the one that runs ten thousand times an hour.\n\n## Indexes aren't free\n\nThis is the part people skip. Every index costs disk space, and it slows down writes, because every INSERT, UPDATE, and DELETE has to keep the index current. On a write-heavy table that adds up fast.\n\nSo I don't index on a hunch. I index columns I can show are being queried, and I drop indexes nobody uses. An unused index is pure cost. It slows your writes and gives you nothing back.\n\n## The practical one: `Order.where(user_id: id, status: \"paid\")`\n\nI'd add a composite index on `[:user_id, :status]`:\n\n\n\n add_index :orders, [:user_id, :status]\n\n\nColumn order is the whole game here. `user_id` goes first because it's the selective, always-present equality filter. The B-tree narrows to one user's orders, then `status` filters within that small set.\n\nThere's a bonus: because `user_id` is the leftmost column, this same index also covers queries that filter on `user_id` alone, so you don't need a second index just for it.\n\nReversing it to `[:status, :user_id]` would be worse. `status` has only a handful of distinct values, so leading with it barely narrows the search before it gets to `user_id`.",
"title": "How do you know you need a database index?"
}