David Gasquez

Data @ Protocol Labs. Open Data, Open Source, Open Protocols. Walks taker. Progressive Metal enjoyer. davidgasquez.com

5290 followers · 1283 following · 53 stories

Longform Stories

Keep Your Slop to Yourself

You've probably heard this already, but I keep coming across this pattern and I wanted to add another post to the cause. Generating a wall of text is now free while reading, verifying, and distilling …

May 23·2 min read·396 words

AT Protocol for Agents

My atmosphere data lives at at://davidgasquez.com. You can explore it all without API keys, auth, or arbitrary HTML to parse. Your agent can browse it, query it, and link to it. Getting someone latest…

May 15·6 min read·1186 words

Growing My Own Data Platform

It has been a couple of months since I wrote about Barefoot Data Platforms. Since then, I've been applying those ideas to a real data platform I've been rebuilding from scratch. I've learned a lot abo…

Apr 22·4 min read·745 words

Indexing and Sharing Organizational Context with qmd

Data engineering is mostly context gathering. Tons of moments of going through code, docs, specs, issues, conversations to figure out how stakeholders want active users to be counted. qmd turns that pi…

Apr 17·3 min read·461 words

Atmospheric Data Portals

The missing open data social layer

Mar 26·2 min read·336 words

Atmospheric Data Portals

I wrote this quickly more as a sketch for people already familiar with atproto and the open data discovery challenges than a fully self-contained post. This post can be seen as a continuation of my Op…

Mar 26·4 min read·739 words

Reliable unreliability

Agentic engineering is mostly about building reliable systems around unreliable components (like your friendly coding agent). A good analogy I like is how early computers were powerful, but not trustw…

Mar 24·3 min read·514 words

Public Goods Funding Needs Evals

There are many interesting funding experiments happening these days: RetroPGF, ProPGF, quadratic rounds, expert juries, ML competitions, prediction markets, and anything in between. Experimentation is…

Mar 20·4 min read·609 words

PGF Arena

I've been building something to test the idea of a "Public Goods Funding Arena", a tiny app for communities to compare funding allocations with simple pairwise choices. The core approach I took is bor…

Mar 17·2 min read·367 words

Collaborative Dependency Graphs

TLDR: We mostly do not need better funding allocations for public goods funding but better mechanisms for discussing and coordinating what to propose and how we get to consensus in a reliable way. …

Mar 12·7 min read·1225 words

Specializing Pi

I've written about specializing Codex and Claude Code in the past. Here is how to do something similar for Pi! I made a small profile extension so I can launch somewhat contained agents with pi --prof…

Mar 11·2 min read·213 words

Eliciting useful datasets

Open datasets are everywhere. Maintained datasets are rare. I keep seeing the same pattern in open data ecosystems. A few folks do expensive curation work, the rest of us free-ride, and eventually the…

Feb 11·4 min read·628 words

Barefoot Data Platforms

I've been maintaining an open source and local-first data platform for the Filecoin ecosystem while keeping some fun constraints and principles. I previously wrote about the pattern, related ideas, an…

Feb 2·4 min read·730 words

Useful Agentic Workflows

2025 Edition. Last year, I shared some LLM workflows I was finding useful. Since then my workflows have changed and I now build many things I would not have attempted before. Here are some agentic work…

Dec 13·7 min read·1326 words

Specializing Codex

I've been using Claude Code to take care of scrappy data cleaning tasks for a while. These days though, I'm using Codex as my coding agent. Similar to what I did with Claude, I've been "fine-tuning" C…

Oct 22·3 min read·599 words

Weight Allocation Mechanism Evals

There are many folks working on mechanisms that can be boiled down to "assign weights to items (projects, grant proposals, movies, etc) indicating their relative importance in a credibly neutral manne…

Oct 15·7 min read·1368 words

Ranking with Agents

TL;DR: This post explores using multi-agent systems for ranking tasks. Check out Arbitron if you want to see a working implementation of the pattern/ideas. One of the latest Kaggle style competition…

Aug 6·10 min read·1877 words

Scrappy Data Cleaning

While working on a small analysis over highly unstructured data at work, I came up with a very hacky but effective and cheap way to clean and process messy data. There is almost no code involved. The ma…

Aug 5·2 min read·333 words

Exporting AEMET Datasets

In the same vein as my previous little battles exporting INE's datasets, I've spent some time this weekend exploring and trying to export some AEMET (Spanish State Meteorological Agency) datasets. I w…

Jul 18·5 min read·922 words

Credible Neutral AI Competitions

After writing Steering AIs with Transparent and Effective ML Competitions, I've been thinking about potential ways to make running similar competitions more credibly neutral and community friendly. Si…

Jun 3·7 min read·1322 words

LLM Friendly Projects

Everyone who is using LLMs and "Agents" (a.k.a. LLMs using tools in a loop) to code is trying to figure out what works and what doesn't. This is far from trivial given the stochastic nature of these c…

May 24·6 min read·1072 words

Transparent and Effective ML Competitions

I've been spending some time thinking about ideas that Vitalik shared about having "AI as the engine, humans as the steering wheel" and participating in some of the experiments they are doing around t…

May 10·6 min read·1173 words

Apache ECharts in Astro

Similar to Observable Plot in Astro, I wanted to explore integrating Apache ECharts with Astro. ECharts is a powerful and loved charting library that offers a wide range of visualization options with …

Apr 9·2 min read·207 words

Exporting INE Datasets - Part 2

A few months ago, I found myself downloading and converting all Spain's National Institute of Statistics datasets to Parquet with the hope of making them easier to access and faster to use. Basically,…

Mar 19·7 min read·1209 words

Trying Zed Editor

I used to spend afternoons testing new distros and configuring them. I jumped to the next one as soon as I got bored, until I settled on Arch Linux, btw. These days, I'm doing the same thing with LLMs…

Feb 15·3 min read·571 words

Teaching Cursor Agent Tricks

I've been using Cursor during the last few months for both coding and note-taking. They recently introduced a feature that makes their Composer model able to use an agent. That means it can write code…

Jan 8·2 min read·372 words

DuckDB in Astro

Wanted to play with the new DuckDB Node Neo library and thought it would be interesting to see if I could make it work in Astro. Astro supports MDX, so I thought it would be a good fit. The query will…

Jan 3·2 min read·288 words

Debugging Dagster in VSCode

I haven't used a debugger probably since my first year of college working with C++. I've been coding in Notebook heavy environments since I switched to Python. To be honest, I'm not sure how to even d…

Dec 27·3 min read·532 words

LLMing for free

You can use the awesome llm tool for free with one of the best models available today! In this small post, I'll show you how to do it in a few easy steps. We'll set up llm with the new Gemini models fr…

Dec 20·2 min read·347 words

Useful LLM workflows (2024 Edition)

Since ChatGPT came out, I've been relying on LLMs for a lot of things. They've made me more productive and saved me a lot of time. More importantly, they've reduced the friction for doing interesting …

Dec 13·3 min read·457 words

Taking notes with IDEs

Back in 2016, I started building a "Personal Knowledge Base" / Digital Garden as a way to capture my learnings. At the time, I was using Atom and kept my notes in Markdown files in a GitHub repository…

Nov 28·3 min read·465 words

Exploring AT Protocol with Python

In the last few weeks, there has been a lot of activity on Bluesky. Bluesky is a social network built on open standards. Specifically, it is built on top of the AT Protocol. Most (if not all) data is …

Nov 14·3 min read·532 words

Async Batch Requests in Python

As a data engineer, one of the most common tasks I perform is getting data from an API. For a long time, I've been using the requests library to make these requests. However, I recently discovered the…

Nov 12·2 min read·392 words

Exporting INE Datasets

A couple of weeks ago, my partner wanted to get some data from the official Spain statistics institute, INE (Instituto Nacional de Estadística de España). The process was not straightforward and I tho…

Oct 20·4 min read·622 words

Community Level Open Data Infrastructure

Inspired by Rufus Pollock, I decided to write a post about learnings and what I think will be the future of open data infrastructure, more specifically open data infrastructure at the community level.…

Aug 10·5 min read·913 words

Observable Plot in Astro

I've been playing with Observable Framework and Evidence for a while. One thing that I keep wishing is for those data focused static site generators to have a way to render everything at build time an…

Jul 6·2 min read·201 words

DuckDB and BigQuery Storage API

BigQuery has a not so well known API, Storage API, that lets you grab a result set or table as Arrow datasets. It is cheaper than the standard query costs and integrates with all the rest of the Arro…

Apr 5·1 min read·125 words

Working with LLMs on AMDGPUs

This might only work for a few months (or even days), but after spending a few hours trying to get open source LLMs to work on AMDGPUs inside Docker, I thought I'd share my findings. My GPU is an A…

Nov 2·2 min read·308 words

Downloading the Watch Later YouTube Playlist

I sort of knew about yt-dlp, a fork of youtube-dl with some extra features, but I never really used it until recently. The goal I was trying to achieve was to download all the videos I've been storin…

Oct 1·2 min read·254 words

Gitcoin Data Portal

Last week, I went on a rabbit hole after coming across RegenData.xyz, an initiative to collect and surface grants data. I wanted to explore the idea of a fully local and open data portal. If you don't…

Sep 11·5 min read·923 words

Downloading HTTP Folders Fast

I've been looking for ways to download a bunch of files recursively via HTTP. The first thing I tried, wget, worked reasonably well. You can do something similar to the following snippet to get a copy…

Sep 10·1 min read·187 words

Using Pyscript

Turns out you can run Python in the browser thanks to WASM and, specifically, PyScript. Let's try it out and integrate it with Hugo! So... this is the current date and time, computed by Python running…

Aug 8·2 min read·221 words

Static notebooks with Quarto

I've recently found Quarto, a tool that makes it easy to publish Jupyter notebooks as static websites. It is probably also the one that produces the most beautiful content without much effort. For Python f…

May 29·1 min read·170 words

Wikidata with SPARQL and ChatGPT

I've never enjoyed SPARQL syntax and found it confusing coming from SQL. That has made me not use or explore Wikidata as much as I wanted. This, however, seems to have changed after being able to use…

Apr 18·2 min read·243 words

On Blockchain Data Pipelines

I've spent the last few months working on indexing and building data pipelines for the Filecoin blockchain. While it's been a great and exciting learning experience, I've realized the space can learn …

Apr 7·3 min read·553 words

Thoughts on the Frictionless Standard

So far I've really enjoyed playing with Frictionless and learning more about the internal design and specs. Pros: The minimal implementation (simple JSON) is super lightweight and flexible enough! If t…

Mar 28·2 min read·400 words

DuckDB with IPFS CID's

Thanks to fsspec, you can query arbitrary filesystems with DuckDB quite easily. To do so, you need to register a fsspec filesystem on DuckDB. Since IPFS has a supported fsspec plugin, ipfsspec, we can…

Feb 22·1 min read·164 words

From Google Colab to GitHub Codespaces

For the last few years, I've been a pretty heavy user of Google Colab. Every time I wanted to play with a new package or idea, I created a new Jupyter Notebook. Recently though, I've become a big fan…

Jan 15·1 min read·195 words

My little and short experience with Nix

NixOS has somehow slowly gotten into all my feeds. So, after getting a new AMD GPU (Radeon 7900 XTX), I took the opportunity to install NixOS on my main desktop PC. This blog is to share how it played…

Dec 28·3 min read·582 words

Sharing your SSH keys

Today I learned that you can easily share your public SSH keys using GitHub by adding .keys after your user. You can check my keys at https://github.com/davidgasquez.keys! Cheers!

Sep 7·1 min read·60 words
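The `.keys` trick from the post above is easy to script. This is only a sketch: the helper names `github_keys_url` and `fetch_public_keys` are hypothetical, and the only thing assumed about GitHub is what the post states, that appending `.keys` to a profile URL returns that user's public SSH keys as plain text, one per line.

```python
from urllib.request import urlopen


def github_keys_url(username: str) -> str:
    """Build the URL where GitHub lists a user's public SSH keys."""
    return f"https://github.com/{username}.keys"


def fetch_public_keys(username: str, timeout: float = 10.0) -> list[str]:
    """Download the keys and return them as a list, one entry per key.

    Needs network access; blank lines in the response are skipped.
    """
    with urlopen(github_keys_url(username), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Only prints the URL, so running this file makes no network call.
    print(github_keys_url("davidgasquez"))
```

Calling `fetch_public_keys("davidgasquez")` should return the same lines you see when opening the URL in a browser.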

Kaplan-Meier Survival Curves in dbt

Inspired by Convoys, I've tried to model conversion rates in SQL. The following macro computes the Kaplan-Meier survival curves and conversion rates given two timestamps and the groups. You can then g…

Feb 14·1 min read·142 words
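For readers who have not met the estimator behind the post above, here is a rough Python sketch of the Kaplan-Meier calculation. It is not the author's dbt macro: the function name `kaplan_meier` and the `durations`/`observed` inputs are assumptions made for illustration, where each duration would be the gap between the two timestamps and `observed` says whether the conversion actually happened. At each distinct event time the survival is multiplied by `1 - events / at_risk`, and the conversion rate is `1 - survival`.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival)] for each distinct time with an observed event.

    durations: time until the event or until censoring, one per subject.
    observed:  True if the event happened, False if the subject was censored.
    """
    # Number of observed events at each distinct time.
    events = Counter(t for t, seen in zip(durations, observed) if seen)

    survival = 1.0
    curve = []
    for t in sorted(events):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1 - events[t] / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Five subjects; two of them are censored (event never observed).
    curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
    for time, survival in curve:
        print(time, round(survival, 3), "conversion:", round(1 - survival, 3))
```

To get one curve per group, as the post describes, you would run the same calculation separately on each group's rows.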

MongoDB Change Streams to BigQuery

Before jumping into the technical details, it’s good to review why we decided to build this pipeline. We had two main reasons to develop it: Querying MongoDB for analytics is not efficient at a certai…

Jan 3·5 min read·898 words

Building a Personal Knowledge Base

For about 2 years now, I've been maintaining a Personal Handbook. In this collection of documents I add everything that I find worth remembering. It contains multiple topics. Some of them share my per…

Nov 7·2 min read·355 words