{
  "$type": "site.standard.document",
  "content": {
    "$type": "site.standard.content.markdown",
    "text": "Thanks to `fsspec`, you can query arbitrary filesystems with DuckDB quite easily.\n\nTo do so, you need to register an `fsspec` filesystem with DuckDB. Since IPFS has a supported `fsspec` plugin, [`ipfsspec`](https://github.com/fsspec/ipfsspec), we can register it and start querying IPFS directly with SQL.\n\nIf you want to follow along, you'll need to install `ipfsspec`, `duckdb` and `fsspec`. You can do so with:\n\n```bash\npip install git+https://github.com/fsspec/ipfsspec duckdb fsspec\n```\n\nNow, let's register the IPFS filesystem with DuckDB:\n\n```python\nimport duckdb\nfrom ipfsspec import AsyncIPFSFileSystem\n\n# Create the IPFS filesystem provided by ipfsspec\nipfs_fs = AsyncIPFSFileSystem()\n\n# Make it available to DuckDB so queries can read ipfs:// paths\nduckdb.register_filesystem(ipfs_fs)\n```\n\nOnce the filesystem is registered, you can pass `ipfs://<CID>` URIs to `read_csv_auto` or `read_parquet`!\n\nThe [`bafybeif5reawvqtsoybj5fhdl4ghaq3oc7kzepuws26zawkjm4johlv3uq` CID](https://bafybeif5reawvqtsoybj5fhdl4ghaq3oc7kzepuws26zawkjm4johlv3uq.ipfs.w3s.link/) points to a CSV file. Querying it is as simple as:\n\n```python\n>>> cid = 'bafybeif5reawvqtsoybj5fhdl4ghaq3oc7kzepuws26zawkjm4johlv3uq'\n>>> duckdb.sql(f\"select * from read_csv_auto('ipfs://{cid}')\")\n┌────────┐\n│   c    │\n│ int64  │\n├────────┤\n│ 143732 │\n└────────┘\n```\n\nFor Parquet files, you can do the same with `read_parquet`:\n\n```python\n>>> cid = 'bafkreibnx5q6qwxobozkdm6xt7ktvwciyfvtkgy7fud67w5oyxnf5tch4e'\n>>> duckdb.sql(f\"select * from read_parquet('ipfs://{cid}')\")\n┌─────────────────────┬───────┬───────────────┐\n│       entity        │ year  │ literacy_rate │\n│       varchar       │ int32 │    double     │\n├─────────────────────┼───────┼───────────────┤\n│ Afghanistan         │  2000 │          28.1 │\n│ Albania             │  2011 │          96.8 │\n│ Algeria             │  2006 │          72.6 │\n│ American Samoa      │  1980 │          97.0 │\n│ Andorra             │  2011 │         100.0 │\n│ Angola              │  2011 │          70.4 │\n│ Anguilla            │  1984 │          95.0 │\n│ Antigua and Barbuda │  2011 │          99.0 │\n│ Argentina           │  2011 │          97.9 │\n│ Armenia             │  2011 │          99.6 │\n│    ·                │    ·  │            ·  │\n│    ·                │    ·  │            ·  │\n│    ·                │    ·  │            ·  │\n│ Uruguay             │  2010 │          98.1 │\n│ Uzbekistan          │  2011 │          99.4 │\n│ Vanuatu             │  2011 │          83.2 │\n│ Vatican             │  2011 │         100.0 │\n│ Venezuela           │  2009 │          95.5 │\n│ Vietnam             │  2011 │          93.4 │\n│ Wallis and Futuna   │  1969 │          50.0 │\n│ Yemen               │  2011 │          65.3 │\n│ Zambia              │  2007 │          61.4 │\n│ Zimbabwe            │  2011 │          83.6 │\n├─────────────────────┴───────┴───────────────┤\n│ 215 rows (20 shown)               3 columns │\n└─────────────────────────────────────────────┘\n```\n\nVoilà!",
    "version": "1.0"
  },
  "description": "Thanks to fsspec, you can query arbitrary filesystems with DuckDB quite easily. To do so, you need to register an fsspec filesystem with DuckDB. Since IPFS has a supported fsspec plugin, ipfsspec, we can register it and start querying IPFS directly with SQL. If you want to follow a...",
  "path": "/duckdb-ipfs",
  "publishedAt": "2023-02-22T00:00:00.000Z",
  "site": "at://did:plc:4z5i7njrld66ew36htufcwry/site.standard.publication/3mo43d2tmt2ov",
  "textContent": "Thanks to fsspec, you can query arbitrary filesystems with DuckDB quite easily.\n\nTo do so, you need to register an fsspec filesystem with DuckDB. Since IPFS has a supported fsspec plugin, ipfsspec, we can register it and start querying IPFS directly with SQL.\n\nIf you want to follow along, you'll need to install ipfsspec, duckdb and fsspec. You can do so with:\n\nNow, let's register the IPFS filesystem with DuckDB:\n\nOnce the filesystem is registered, you can pass ipfs://<CID> URIs to read_csv_auto or read_parquet!\n\nThe bafybeif5reawvqtsoybj5fhdl4ghaq3oc7kzepuws26zawkjm4johlv3uq CID points to a CSV file. Querying it is as simple as:\n\nFor Parquet files, you can do the same with read_parquet:\n\nVoilà!",
  "title": "DuckDB with IPFS CIDs"
}