Automatically Block AI Crawlers in Astro

~/.bnux January 29, 2026
After migrating to Astro, one of the first things I wanted to port over was my automated crawler-blocking setup. The Hugo version fetched a list of AI crawlers and generated robots.txt at build time. With Astro, I figured I could pull from multiple sources: the Known Agents API (formerly Dark Visitors), Cloudflare's public bot list, and ai.robots.txt.

As I mentioned in the Hugo version, robots.txt has no legal or technical authority. You're trusting bots to respect rules with no mechanism to enforce them. But it can't hurt to try, and casting a wider net with multiple sources feels like a reasonable upgrade. Let's see how it works.

The endpoint

Astro endpoints let you generate files like robots.txt dynamically instead of shipping a static file. On this site, robots.txt runs as a server endpoint under the Netlify adapter, so I add cache headers to avoid fetching the remote lists on every request.

The Known Agents API requires an authenticated request with an access token and a list of the agent types you want to block. If you have a token, add it to your environment file.

The token is optional in my version. If it is missing, the endpoint skips Known Agents instead of sending a request without valid credentials.

Then create the endpoint itself. The idea is simple: fetch the available lists, merge and deduplicate them, and return a formatted robots.txt. If one source fails, the others still work. If all remote sources fail, the endpoint falls back to a small built-in list so the response is still useful.

The array of agent types in the endpoint controls which Known Agents categories get blocked. I'm targeting AI Data Scrapers, AI Agents, AI Assistants, and Undocumented AI Agents while leaving SEO crawlers and search engines alone. Adjust those categories to match your own preferences.

Take it for a test drive

Start the dev server and open /robots.txt in your browser. You should see a plain-text robots.txt listing the merged user agents and their Disallow rules.

The exact list will change over time as the upstream sources change. The response is cacheable by browsers and CDNs, so you get fresh-enough data without making every request wait on remote APIs.
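Some upstream sources, such as ai.robots.txt, publish ready-made robots.txt text, so the endpoint has to reduce that text to a list of agent names before it can merge anything. The sketch below shows one way to do that, plus an optional Known Agents source that is simply skipped when no token is configured. The function names are hypothetical, and the request shape (a JSON POST with agent_types and disallow fields plus a Bearer token, answered with robots.txt text) is an assumption modeled on the older Dark Visitors robots.txt API. Check the Known Agents documentation for the current URL and fields.

```typescript
// Hypothetical helpers: reduce robots.txt text to agent names, and build an
// optional Known Agents fetcher that is skipped when no token is set.

type Source = () => Promise<string[]>;

// Pull every "User-agent:" value out of a robots.txt body, ignoring comments
// and the wildcard "*" (re-emitting "*" would block every crawler).
function extractUserAgents(robotsTxt: string): string[] {
  const agents: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop trailing comments
    const match = /^user-agent\s*:\s*(.+)$/i.exec(line);
    if (match && match[1].trim() !== "*") agents.push(match[1].trim());
  }
  return agents;
}

// ASSUMPTION: request/response shape borrowed from the older Dark Visitors
// robots.txt API. The endpoint URL is passed in rather than hard-coded.
function knownAgentsSource(
  token: string | undefined,
  endpoint: string,
  agentTypes: string[],
): Source | null {
  if (!token) return null; // no token: leave this source out entirely
  return async () => {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ agent_types: agentTypes, disallow: "/" }),
    });
    if (!res.ok) throw new Error(`Known Agents responded with ${res.status}`);
    return extractUserAgents(await res.text());
  };
}
```

In an Astro endpoint you might read the token through import.meta.env (KNOWN_AGENTS_TOKEN would be a hypothetical variable name) and leave the source out of the list whenever knownAgentsSource returns null.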
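The merge, deduplicate, and fall-back flow described under The endpoint could look roughly like the following. It is a minimal sketch, assuming each upstream source is wrapped as an async function that returns agent names. collectAgents, formatRobotsTxt, handleRobotsRequest, the fallback agent list, and the cache lifetimes are all illustrative choices, not the post's actual code.

```typescript
type Source = () => Promise<string[]>;

// Hypothetical built-in list, used only when every remote source fails.
const FALLBACK_AGENTS: readonly string[] = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"];

async function collectAgents(sources: Source[]): Promise<string[]> {
  // A failing source (sync throw or rejected promise) becomes an empty list,
  // so one broken upstream never takes down the whole response.
  const lists = await Promise.all(
    sources.map((source) =>
      Promise.resolve()
        .then(() => source())
        .catch((err: unknown): string[] => {
          console.warn("robots.txt source failed:", err);
          return [];
        }),
    ),
  );
  // robots.txt user-agent matching is case-insensitive, so dedupe on the
  // lowercase name while keeping the first spelling seen.
  const seen = new Map<string, string>();
  for (const list of lists) {
    for (const raw of list) {
      const name = raw.trim();
      const key = name.toLowerCase();
      if (name !== "" && !seen.has(key)) seen.set(key, name);
    }
  }
  return seen.size > 0 ? Array.from(seen.values()) : FALLBACK_AGENTS.slice();
}

// One group: every User-agent line shares a single "Disallow: /" rule.
function formatRobotsTxt(agents: string[]): string {
  const lines = agents.map((agent) => `User-agent: ${agent}`);
  lines.push("Disallow: /");
  return lines.join("\n") + "\n";
}

async function handleRobotsRequest(sources: Source[]): Promise<Response> {
  const body = formatRobotsTxt(await collectAgents(sources));
  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      // Illustrative cache policy: browsers 1 hour, shared caches/CDN 1 day.
      "Cache-Control": "public, max-age=3600, s-maxage=86400",
    },
  });
}
```

In an Astro project this could back the route with something like export const GET: APIRoute = () => handleRobotsRequest(sources). With a static or hybrid output you would likely also need export const prerender = false so the route renders on demand. Real fetchers would probably also want a timeout so one slow upstream can't stall the response.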
Tracking what's actually hitting your site

Known Agents also offers a JavaScript analytics tag that tracks AI agent visits. It won't catch most crawlers and scrapers (they generally don't run JavaScript), but it will show you visits from AI assistants and LLM-referred traffic. I added it to my base head component.

Between the robots.txt blocking and the analytics tag, you get a decent picture of what AI traffic looks like on your site.

Going further

For stronger enforcement, you could block crawlers at the server level too. Netlify lets you set custom response headers through a headers file.

I still don't trust that crawlers will respect any of this, but combining robots.txt with server-level headers at least makes the intent clear.

Further reading

Known Agents Documentation
Astro Endpoints Guide
ai.robots.txt
The text file that runs the internet
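The server-level idea from Going further can also be approached by checking the User-Agent header directly and refusing matching requests. This is a swapped-in technique (user-agent matching), not the Netlify header setup described in the post, and it is a minimal sketch assuming you already have a list of agent tokens, for example the merged list behind robots.txt. isBlockedAgent and withCrawlerBlock are hypothetical names.

```typescript
// Hypothetical check: does the User-Agent contain any blocked token?
// Matching is a case-insensitive substring test; empty tokens are ignored
// so they can't accidentally match every request.
function isBlockedAgent(userAgent: string | null, blocked: string[]): boolean {
  if (!userAgent) return false; // no header: let the request through
  const ua = userAgent.toLowerCase();
  return blocked.some((token) => token !== "" && ua.includes(token.toLowerCase()));
}

// Wrap a request handler so blocked agents get a 403 instead of content.
function withCrawlerBlock(
  blocked: string[],
  handler: (req: Request) => Promise<Response> | Response,
): (req: Request) => Promise<Response> {
  return async (req: Request): Promise<Response> => {
    if (isBlockedAgent(req.headers.get("user-agent"), blocked)) {
      return new Response("Forbidden", { status: 403 });
    }
    return handler(req);
  };
}
```

Substring matching is deliberately loose, so very short tokens can catch unrelated browsers, and any bot can spoof its User-Agent, which is consistent with the skepticism above. In Astro, the same check could live in middleware (an onRequest function in src/middleware.ts) rather than wrapping individual handlers.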
