{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreic6ovwvhqqyrq34uhbcfmud367zax7jxsegryu7xiwh26qhwmsfti",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mpdilhc6sfc2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiamyiexes2saquxmjmr7gy5ev7mt2id2a2qnhg33omvkfqzlifmni"
    },
    "mimeType": "image/webp",
    "size": 67948
  },
  "path": "/anshul_02/delta-tables-in-microsoft-fabric-what-they-are-and-how-theyre-structured-635",
  "publishedAt": "2026-06-28T07:12:16.000Z",
  "site": "https://dev.to",
  "tags": [
    "database",
    "dataengineering",
    "microsoft",
    "opensource"
  ],
  "textContent": "If you've worked with the Microsoft Fabric Lakehouse, you've probably noticed that all your managed tables are stored as Delta tables. But what exactly is a Delta table? What does it look like on disk? And why does Fabric use it as the default format?\n\nThis post answers all three questions.\n\n##  What Is a Delta Table?\n\nA Delta table is a table stored in the **Delta Lake** open-source format. It's built on top of regular Parquet files (a popular columnar file format) — but with a key addition: a **transaction log** that tracks every change made to the table.\n\nThat transaction log is what makes Delta tables different from plain files. It gives you:\n\n  * **ACID transactions** — Each write either commits completely or not at all, so readers never see partial writes or a half-updated table.\n  * **Time travel** — Query the table as it looked yesterday, last week, or at any past version that is still retained.\n  * **Schema enforcement** — Delta rejects writes whose columns or types don't match the table's schema, unless you explicitly enable schema evolution.\n  * **Efficient updates and deletes** — You can UPDATE, DELETE, and MERGE rows, which plain Parquet files don't support. Delta never edits a Parquet file in place; it writes new files and records the swap in the log.\n\n\n\n##  Where Do Delta Tables Live in the Fabric Lakehouse?\n\nIn Microsoft Fabric, your Lakehouse is connected to **OneLake** — Fabric's unified storage layer. OneLake is built on Azure Data Lake Storage (ADLS) Gen2 and exposes ADLS Gen2-compatible APIs, so tools that already work with ADLS Gen2 can read and write it.\n\nEvery Lakehouse has two sections:\n\nSection | What it is\n---|---\n**Tables** | Managed Delta tables. Schema tracked by Fabric. Appear in the SQL analytics endpoint automatically.\n**Files** | Raw files (CSV, JSON, Parquet, etc.) you manage yourself. Not automatically queryable as tables.\n\nWhen you create a Delta table in the Lakehouse (via Spark, a pipeline, or a Dataflow), it is stored inside the **Tables** folder.\n\n##  The Folder Structure of a Delta Table\n\nThis is the most important part. Let's say you create a table called `sales`. 
Here's what the folder structure looks like in OneLake:\n\n\n\n    Lakehouse/\n    └── Tables/\n        └── sales/\n            ├── _delta_log/\n            │   ├── 00000000000000000000.json\n            │   ├── 00000000000000000001.json\n            │   ├── 00000000000000000002.json\n            │   └── ... (one file per transaction)\n            ├── part-00000-<uuid>.snappy.parquet\n            ├── part-00001-<uuid>.snappy.parquet\n            └── part-00002-<uuid>.snappy.parquet\n\n\nLet's walk through each part.\n\n###  The Parquet Files — Your Actual Data\n\nThe `part-*.snappy.parquet` files are where your data lives. Each one is a **Parquet file** — a compressed, columnar binary format optimized for analytical queries.\n\nA few things to know:\n\n  * A table usually has many Parquet files. Every write adds new ones, and how many a single write produces depends on how many Spark partitions wrote the data.\n  * Each file is self-contained and can be read on its own. Reading files directly bypasses the transaction log, though, so you can pick up files that the current version of the table has already removed.\n  * They are compressed (usually Snappy or ZSTD), so they're typically much smaller than equivalent CSV files.\n  * They are **columnar** — meaning if you query only the `revenue` column, only that column's data is read from disk. This makes analytical queries that touch a few columns much faster.\n\n\n\nA large table can have hundreds or thousands of these Parquet files, and Spark reads them in parallel.\n\n###  The `_delta_log` Folder — The Transaction Log\n\nThis is the heart of Delta Lake. 
The `_delta_log` folder contains a series of JSON files, one per commit (transaction). Each file is named after the table version it creates, zero-padded to 20 digits.\n\nEvery time something changes in the table — an INSERT, a DELETE, an UPDATE, a schema change — Delta writes a new JSON file to `_delta_log` describing what happened. Each line of that file is one action, such as `add` or `remove`.\n\nHere's a simplified `add` action (real entries also carry fields such as `partitionValues`, `modificationTime`, and `dataChange`):\n\n\n\n    {\n      \"add\": {\n        \"path\": \"part-00000-abc123.snappy.parquet\",\n        \"size\": 1048576,\n        \"stats\": \"{\\\"numRecords\\\": 50000, \\\"minValues\\\": {\\\"date\\\": \\\"2024-01-01\\\"}, \\\"maxValues\\\": {\\\"date\\\": \\\"2024-03-31\\\"}}\"\n      }\n    }\n\n\nThe `stats` field lets engines skip files whose min/max values can't match a query's filter.\n\nWhen a file is removed, for example by an UPDATE or DELETE, the commit records a `remove` action:\n\n\n\n    {\n      \"remove\": {\n        \"path\": \"part-00000-abc123.snappy.parquet\",\n        \"deletionTimestamp\": 1710000000000\n      }\n    }\n\n\nA `remove` is logical: the Parquet file stays on disk so older versions can still read it.\n\nThe log is append-only; existing commits are never rewritten. This is how Delta supports time travel: it replays the log up to any version to reconstruct the table at that point in time. History isn't kept forever, though. `VACUUM` deletes data files that are no longer referenced and are older than its retention threshold (7 days by default), and log entries older than the log retention period (30 days by default) are cleaned up automatically when new checkpoints are written.\n\n###  Checkpoints: Keeping the Log Fast\n\nAs the log grows (thousands of transactions), reading all those JSON files to figure out the current state of the table gets slow. Delta solves this with **checkpoints**.\n\nEvery 10 commits (by default), Delta writes a checkpoint file in Parquet format that summarizes the full state of the table at that point. Readers then only need the latest checkpoint plus the JSON commits written after it; a small `_last_checkpoint` file in `_delta_log` points them to that checkpoint.\n\n\n\n    _delta_log/\n    ├── 00000000000000000000.json\n    ├── ...\n    ├── 00000000000000000010.json\n    ├── 00000000000000000010.checkpoint.parquet   ← checkpoint\n    ├── 00000000000000000011.json\n    ├── 00000000000000000012.json\n    └── ...\n\n\nYou'll see these checkpoint files appear automatically in your Lakehouse as tables get updated over time.\n\n##  Partitioned Delta Tables\n\nFor large tables, you'll typically partition your data — split the files into subfolders based on a column value. 
For example, partitioning a sales table by year and month looks like this:\n\n\n\n    Tables/\n    └── sales/\n        ├── _delta_log/\n        ├── year=2023/\n        │   ├── month=01/\n        │   │   └── part-00000-<uuid>.snappy.parquet\n        │   └── month=02/\n        │       └── part-00000-<uuid>.snappy.parquet\n        └── year=2024/\n            ├── month=01/\n            │   └── part-00000-<uuid>.snappy.parquet\n            └── month=02/\n                └── part-00000-<uuid>.snappy.parquet\n\n\nPartitioning is a performance optimization. If you query `WHERE year = 2024 AND month = '01'`, Spark reads only the files under `year=2024/month=01/` and skips everything else (this is called partition pruning). For tables with years of data, this can cut the amount of data scanned dramatically. The trade-off: partitioning a small table, or partitioning on a column with many distinct values, produces lots of small files and can make queries slower.\n\n##  How Fabric Uses This Structure\n\nIn Microsoft Fabric:\n\n  * **The Lakehouse UI** reads the `_delta_log` to show you table metadata, column names, row counts, and table history.\n  * **The SQL Analytics Endpoint** is automatically built on top of your Delta tables. Fabric reads the Delta log to register the tables and their schemas, making them queryable with read-only T-SQL; new changes show up after a short metadata sync.\n  * **Power BI Direct Lake mode** loads column data straight from the tables' Parquet files into memory when a query needs it, so there is no import copy to schedule and refresh. V-Order, a write-time optimization that Fabric applies to the Parquet files it writes (the files remain standard Parquet), makes those loads faster. The result is query performance that can approach Import mode without Import's data copy.\n  * **Time travel** works out of the box. 
You can run `SELECT * FROM sales VERSION AS OF 5` in Spark SQL to see the table as it was at version 5, or use `TIMESTAMP AS OF` to pick a point in time instead.\n\n\n\n##  A Quick Example: Inspecting Your Delta Table\n\nIn a Fabric notebook, you can inspect the table's history, details, and files:\n\n\n\n    # View the history of all changes made to the table (one row per version)\n    display(spark.sql(\"DESCRIBE HISTORY sales\"))\n\n    # View table-level details: location, number of files, total size, partition columns\n    display(spark.sql(\"DESCRIBE DETAIL sales\"))\n\n    # List the Parquet files in the current version (a best-effort snapshot)\n    print(spark.table(\"sales\").inputFiles())\n\n    # Time travel: query the table as it was at version 2\n    display(spark.sql(\"SELECT * FROM sales VERSION AS OF 2\"))\n\n\nYou can also read the `_delta_log` files directly if you're curious (replace `<workspace>` and `<lakehouse>` with your own names):\n\n\n\n    # Each commit file holds one JSON action per line (commitInfo, metaData, protocol, add, remove, and so on)\n    log = spark.read.json(\"abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/sales/_delta_log/*.json\")\n    display(log)\n\n\n##  Summary: What a Delta Table Really Is\n\nComponent | What it does\n---|---\n**Parquet files** | Store the actual data, compressed and columnar\n**`_delta_log/` JSON files** | Record every transaction — adds, removes, schema changes\n**Checkpoint files** | Summarize table state every 10 commits (by default) for fast reads\n**Partition folders** | Optional subfolders by column value for query performance\n\nA Delta table is not magic — it's Parquet files you can already read, plus a log folder that makes those files transactional, versioned, and reliable.\n\nMicrosoft Fabric builds everything on top of this structure: the SQL analytics endpoint, Power BI Direct Lake, time travel, and ACID-safe pipelines. Understanding the folder structure helps you reason about how your data is stored, why queries perform the way they do, and how to troubleshoot when something looks off.",
  "title": "Delta Tables in Microsoft Fabric: What They Are and How They're Structured"
}