{
"$type": "site.standard.document",
"content": {
"$type": "site.standard.content.markdown",
"text": "BigQuery has a not so well known API, [Storage API](https://cloud.google.com/bigquery/pricing#storage), that let's you grab a result set or table as Arrow datasets. It is cheaper than the standard query costs and integrates with all the rest of the Arrow ecosystem, like DuckDB.\n\nYou can test it out with this code:\n\n```python\nimport duckdb\nfrom google.cloud import bigquery\n\nbqclient = bigquery.Client()\n\ntable = bigquery.TableReference.from_string(\n \"bigquery-public-data.samples.shakespeare\"\n)\n\nrows = bqclient.list_rows(table)\n\nshakespeare = rows.to_arrow(create_bqstorage_client=True)\nconn = duckdb.connect(\":memory:\")\n\nconn.sql(\"\"\"\n select\n word,\n sum(word_count)\n from shakespeare\n group by 1\n order by 2 desc\n limit 10\n\"\"\")\n\n```\n\nGives you a result like this:\n\n```markdown\n┌─────────┬─────────────────┐\n│ word │ sum(word_count) │\n│ varchar │ int128 │\n├─────────┼─────────────────┤\n│ the │ 25568 │\n│ I │ 21028 │\n│ and │ 19649 │\n│ to │ 17361 │\n│ of │ 16438 │\n│ a │ 13409 │\n│ you │ 12527 │\n│ my │ 11291 │\n│ in │ 10589 │\n│ is │ 8735 │\n├─────────┴─────────────────┤\n│ 10 rows 2 columns │\n└───────────────────────────┘\n```\n\nThe code is also available as a [Google Colab Notebook](https://colab.research.google.com/drive/1tEHP3Gdyfu8DVftyoACkLp7cIA6heEHr#scrollTo=FdYNGoTFuKIL)!",
"version": "1.0"
},
"description": "BigQuery has a not so well known API, Storage API, that let's you grab a result set or table as Arrow datasets. It is cheaper than the standard query costs and integrates with all the rest of the Arrow ecosystem, like DuckDB. You can test it out with this code: Gives you a res...",
"path": "/duckdb-bq-storage",
"publishedAt": "2024-04-05T00:00:00.000Z",
"site": "at://did:plc:4z5i7njrld66ew36htufcwry/site.standard.publication/3mo43d2tmt2ov",
"textContent": "BigQuery has a not so well known API, Storage API, that let's you grab a result set or table as Arrow datasets. It is cheaper than the standard query costs and integrates with all the rest of the Arrow ecosystem, like DuckDB.\n\nYou can test it out with this code:\n\nGives you a result like this:\n\nThe code is also available as a Google Colab Notebook!",
"title": "DuckDB and BigQuery Storage API"
}