{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiax2dlrtjjxbxvbdveh4yfcbnvvaqwn2qle3sfnvc5nfdooq5z2wa",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moprmvx7vaw2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreig3zsf37nlblgzruo7gqa5pites5vqfjbabeveuhezhjwqjclkkyi"
},
"mimeType": "image/webp",
"size": 371000
},
"path": "/kanishga_subramani_49ad73/day-27-of-100-days-of-clickhouser-optimizing-clickhouser-queries-for-faster-execution-5356",
"publishedAt": "2026-06-20T11:19:10.000Z",
"site": "https://dev.to",
"tags": [
"clickhouse",
"devops",
"analytics",
"database"
],
"textContent": "## Introduction\n\nAs datasets grow larger and analytical workloads become more demanding, query performance becomes increasingly important. While ClickHouse® is built for lightning-fast analytical processing, poorly optimized schemas and inefficient queries can still waste CPU cycles, consume excessive memory, and increase execution times.\n\nThe good news is that ClickHouse® provides numerous features to help you optimize both your data model and query patterns. From choosing the right `ORDER BY` key to using skip indexes, projections, materialized views, and efficient filtering techniques, a few design decisions can dramatically improve performance.\n\nIn this article, we'll explore practical techniques for optimizing ClickHouse® queries so your analytical workloads remain fast and scalable—even when working with billions of rows.\n\n# Why Query Optimization Matters\n\nClickHouse® is a column-oriented analytical database that processes data differently from traditional row-based databases.\n\nIt achieves exceptional performance through:\n\n * Columnar storage\n * Vectorized query execution\n * Parallel processing\n * Data compression\n * Sparse primary indexes\n * Intelligent data skipping\n\n\n\nHowever, these optimizations are only effective when tables are designed correctly and queries take advantage of them.\n\nPoorly optimized queries typically result in:\n\n * Increased query latency\n * Higher CPU utilization\n * Excessive memory consumption\n * Longer dashboard loading times\n * Reduced throughput\n * Higher infrastructure costs\n\n\n\nUnderstanding how ClickHouse® reads and skips data is the foundation of writing efficient queries.\n\n# 1. Choose a Good ORDER BY Key\n\nThe `ORDER BY` clause in MergeTree tables defines the physical sort order of data on disk.\n\nUnlike many databases, this is far more than just a sorting preference—it determines how efficiently ClickHouse® can locate data using its sparse primary index.\n\nA well-designed ORDER BY key allows ClickHouse® to skip large portions of data during query execution.\n\n\n\n CREATE TABLE events\n (\n user_id UInt32,\n event_type LowCardinality(String),\n timestamp DateTime,\n value Float64\n )\n ENGINE = MergeTree()\n ORDER BY (user_id, event_type, timestamp);\n\n\nNow queries filtering on these columns become extremely efficient.\n\n\n\n SELECT avg(value)\n FROM events\n WHERE user_id = 1001\n AND event_type = 'purchase'\n AND timestamp >= now() - INTERVAL 7 DAY;\n\n\n### Best Practices\n\n * Put frequently filtered columns first.\n * Place high-selectivity columns earlier.\n * Avoid random or highly unique first columns unless appropriate.\n * Design ORDER BY around your most common queries.\n\n\n\n# 2. Partition Large Tables\n\nPartitioning divides data into separate physical parts.\n\nWhen a query filters on the partition key, ClickHouse® skips entire partitions without reading them.\n\n\n\n CREATE TABLE events\n (\n user_id UInt32,\n event_type LowCardinality(String),\n timestamp DateTime,\n value Float64\n )\n ENGINE = MergeTree()\n PARTITION BY toYYYYMM(timestamp)\n ORDER BY (user_id, event_type, timestamp);\n\n\nQuery:\n\n\n\n SELECT count()\n FROM events\n WHERE timestamp >= now() - INTERVAL 30 DAY;\n\n\nInstead of scanning years of data, ClickHouse® only reads recent monthly partitions.\n\n### Common Partition Keys\n\n * `toYYYYMM(timestamp)`\n * `toYYYYMMDD(timestamp)`\n * Region\n * Tenant ID (for multi-tenant systems)\n\n\n\nAvoid over-partitioning, as too many small partitions can negatively impact performance.\n\n# 3. Avoid SELECT *\n\nClickHouse® only reads the columns referenced in your query.\n\nUsing `SELECT *` forces the database to read and decompress every column, increasing I/O and CPU usage.\n\nSlow:\n\n\n\n SELECT *\n FROM events\n WHERE user_id = 1001;\n\n\nBetter:\n\n\n\n SELECT\n user_id,\n event_type,\n value\n FROM events\n WHERE user_id = 1001;\n\n\nAlways retrieve only the columns your application actually needs.\n\n# 4. Use LowCardinality for String Columns\n\nColumns with relatively few unique values should use `LowCardinality`.\n\nExamples include:\n\n * status\n * country\n * region\n * event_type\n * device_type\n\n\n\nInstead of:\n\n\n\n event_type String\n\n\nUse:\n\n\n\n event_type LowCardinality(String)\n\n\nBenefits include:\n\n * Smaller storage\n * Better compression\n * Faster filtering\n * Faster GROUP BY\n * Reduced memory usage\n\n\n\nAs a rule of thumb, use `LowCardinality` for string columns with fewer than roughly 10,000 distinct values.\n\n# 5. Filter on ORDER BY Columns First\n\nClickHouse® can only use its sparse primary index efficiently when filtering on the leading columns of the ORDER BY key.\n\nGiven:\n\n\n\n ORDER BY (user_id, event_type, timestamp)\n\n\nFast:\n\n\n\n SELECT count()\n FROM events\n WHERE user_id = 1001\n AND event_type = 'purchase';\n\n\nSlow:\n\n\n\n SELECT count()\n FROM events\n WHERE value > 100;\n\n\nIf you frequently filter on non-indexed columns, consider adding a skip index.\n\n# 6. Use Skip Indexes\n\nSkip indexes help ClickHouse® skip granules when filtering on columns outside the primary key.\n\n\n\n ALTER TABLE events\n ADD INDEX idx_value value TYPE minmax GRANULARITY 4;\n\n\nMaterialize it:\n\n\n\n ALTER TABLE events\n MATERIALIZE INDEX idx_value;\n\n\nNow:\n\n\n\n SELECT count()\n FROM events\n WHERE value > 500;\n\n\nClickHouse® skips granules whose minimum and maximum values fall outside the requested range.\n\nCommon skip index types include:\n\n * minmax\n * set\n * bloom_filter\n * tokenbf_v1\n\n\n\n# 7. Avoid Functions on Filtered Columns\n\nApplying functions to indexed columns prevents ClickHouse® from using the primary index efficiently.\n\nAvoid:\n\n\n\n SELECT count()\n FROM events\n WHERE toYear(timestamp) = 2024;\n\n\nInstead use range filters:\n\n\n\n SELECT count()\n FROM events\n WHERE timestamp >= '2024-01-01'\n AND timestamp < '2025-01-01';\n\n\nRange conditions allow ClickHouse® to leverage index skipping effectively.\n\n# 8. Use PREWHERE\n\n`PREWHERE` is a ClickHouse® optimization that filters rows before loading all requested columns.\n\n\n\n SELECT\n user_id,\n value\n FROM events\n PREWHERE value > 100\n WHERE user_id = 1001;\n\n\nBenefits:\n\n * Reads fewer columns initially\n * Reduces disk I/O\n * Lowers memory consumption\n * Speeds up wide-table queries\n\n\n\nAlthough ClickHouse® automatically applies PREWHERE in many cases, specifying it manually gives you greater control.\n\n# 9. Use SAMPLE for Exploratory Queries\n\nFor approximate analytics, SAMPLE dramatically reduces execution time.\n\n\n\n SELECT avg(value)\n FROM events\n SAMPLE 0.1\n WHERE event_type = 'purchase';\n\n\nThis reads only 10% of the data while producing statistically representative results.\n\nRemember that SAMPLE requires a table created with a `SAMPLE BY` clause.\n\n# 10. Insert Data in Batches\n\nLarge batch inserts are far more efficient than numerous single-row inserts.\n\nSlow:\n\n\n\n INSERT INTO events VALUES\n (1001,'purchase',now(),99.99);\n\n INSERT INTO events VALUES\n (1002,'view',now(),0.0);\n\n\nBetter:\n\n\n\n INSERT INTO events VALUES\n (1001,'purchase',now(),99.99),\n (1002,'view',now(),0.0),\n (1003,'click',now(),15.20);\n\n\nBatch inserts reduce merge overhead and improve ingestion throughput.\n\n# 11. Avoid Expensive JOINs\n\nAlthough JOIN performance has improved significantly, joins remain expensive compared to scanning a single table.\n\nWhenever possible:\n\n * Denormalize frequently queried data.\n * Join only after filtering.\n * Join smaller tables to larger tables.\n * Consider Dictionaries for dimension lookups.\n\n\n\nInstead of:\n\n\n\n SELECT *\n FROM orders o\n JOIN customers c\n ON o.customer_id = c.id;\n\n\nFilter first:\n\n\n\n SELECT *\n FROM\n (\n SELECT *\n FROM orders\n WHERE order_date >= today() - 7\n ) o\n JOIN customers c\n ON o.customer_id = c.id;\n\n\nReducing rows before the JOIN often leads to substantial performance improvements.\n\n# 12. Choose the Right MergeTree Engine\n\nDifferent workloads benefit from different MergeTree engines.\n\nExamples include:\n\n * MergeTree → General-purpose analytics\n * ReplacingMergeTree → Deduplication\n * SummingMergeTree → Incremental aggregation\n * AggregatingMergeTree → Precomputed aggregates\n * CollapsingMergeTree → Event state transitions\n * VersionedCollapsingMergeTree → Slowly changing records\n\n\n\nChoosing the right engine can significantly reduce query complexity and execution time.\n\n# 13. Monitor Query Performance\n\nUse system tables to identify slow queries.\n\n\n\n SELECT\n query_duration_ms,\n read_rows,\n read_bytes,\n memory_usage\n FROM system.query_log\n ORDER BY event_time DESC\n LIMIT 20;\n\n\nMonitor:\n\n * Query duration\n * Rows read\n * Bytes read\n * Memory usage\n * Scan efficiency\n\n\n\nRegular monitoring helps detect regressions before they affect production workloads.\n\n# 14. Use LIMIT Efficiently\n\nIf users only require a subset of rows, don't scan the entire dataset.\n\n\n\n SELECT *\n FROM events\n ORDER BY timestamp DESC\n LIMIT 100;\n\n\nCombined with a suitable ORDER BY key, ClickHouse® can stop reading data earlier.\n\n# 15. Use Appropriate Data Types\n\nSmaller data types improve:\n\n * Compression\n * Cache efficiency\n * Memory utilization\n * Query speed\n\n\n\nInstead of:\n\n\n\n user_id UInt64\n\n\nUse:\n\n\n\n user_id UInt32\n\n\nif appropriate.\n\nOther recommendations:\n\n * UInt8 instead of UInt32 where applicable\n * Date instead of DateTime if time isn't needed\n * Enum or LowCardinality for categorical values\n\n\n\n# 16. Use Compression Codecs\n\nColumn-specific codecs can further reduce storage while improving read performance.\n\nExamples:\n\n\n\n temperature Float32 CODEC(Gorilla)\n\n\n\n timestamp DateTime CODEC(DoubleDelta)\n\n\n\n message String CODEC(ZSTD)\n\n\nChoosing the right codec depends on the data distribution and access patterns.\n\n# 17. Avoid Large OFFSET Pagination\n\nOFFSET pagination forces ClickHouse® to scan and discard rows.\n\nAvoid:\n\n\n\n SELECT *\n FROM events\n ORDER BY timestamp\n LIMIT 100 OFFSET 500000;\n\n\nPrefer keyset pagination:\n\n\n\n SELECT *\n FROM events\n WHERE timestamp > '2026-01-01 12:00:00'\n ORDER BY timestamp\n LIMIT 100;\n\n\nKeyset pagination scales much better for large datasets.\n\n# 18. Use Projections\n\nProjections store alternative sorted layouts inside the same table.\n\n\n\n ALTER TABLE events\n ADD PROJECTION by_event_type\n (\n SELECT *\n ORDER BY event_type\n );\n\n\nQueries filtering by `event_type` can automatically use the projection, reducing scan time without changing application queries.\n\n# 19. Use Materialized Views\n\nRepeatedly calculating the same aggregations is expensive.\n\nMaterialized Views compute aggregates during data insertion.\n\n\n\n CREATE MATERIALIZED VIEW events_mv\n TO events_hourly_summary\n AS\n SELECT\n toStartOfHour(timestamp) AS hour,\n event_type,\n count() AS total_count,\n sum(value) AS total_value\n FROM events\n GROUP BY hour, event_type;\n\n\nNow dashboards can query the summary table instead of billions of raw rows.\n\n\n\n SELECT\n hour,\n event_type,\n total_count,\n total_value\n FROM events_hourly_summary\n WHERE hour >= now() - INTERVAL 24 HOUR\n ORDER BY hour;\n\n\n# 20. Use EXPLAIN to Understand Query Execution\n\nThe `EXPLAIN` statement helps visualize how ClickHouse® executes queries.\n\nBasic example:\n\n\n\n EXPLAIN\n SELECT count()\n FROM events\n WHERE user_id = 1001;\n\n\nUseful options include:\n\n\n\n EXPLAIN PIPELINE\n SELECT count()\n FROM events;\n\n\n\n EXPLAIN indexes = 1\n SELECT count()\n FROM events\n WHERE user_id = 1001;\n\n\nThese commands reveal:\n\n * Query pipeline\n * Index usage\n * Estimated rows\n * Execution stages\n\n\n\nThey are invaluable when diagnosing slow queries.\n\n# Final Thoughts\n\nClickHouse® is already one of the fastest analytical databases available, but achieving consistent sub-second performance requires thoughtful schema design and efficient query patterns.\n\nStart with the fundamentals: choose an effective `ORDER BY` key, partition wisely, avoid unnecessary column reads, and write filters that leverage the primary index. As your datasets grow, take advantage of advanced features such as skip indexes, PREWHERE, projections, compression codecs, materialized views, and the EXPLAIN statement to fine-tune performance.\n\nBy combining these best practices, ClickHouse® can efficiently process billions of rows with minimal latency, making it an exceptional platform for modern analytical workloads.\n\nSmall optimizations compound over time. The more closely your table design aligns with your query patterns, the greater the performance gains you'll achieve",
"title": "Day 27 of 100 Days of ClickHouse® - Optimizing ClickHouse® Queries for Faster Execution"
}