Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibnavzwksiqtcxkn4zkvmzolf5hse3ughou2d2frqr74a6slsdv3a",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mohzd4ulq4y2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreihc7p26wiaqqvdvvphhz3f4irxn3v4sz6x4rwxu3tvxbajn5nm37q"
    },
    "mimeType": "image/webp",
    "size": 67202
  },
  "path": "/kathan555/ai-bots-are-reading-your-site-heres-how-to-make-them-sell-you-2ode",
  "publishedAt": "2026-06-17T09:02:04.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "seo",
    "webdev",
    "career",
    "hire page →",
    "@context",
    "@type"
  ],
  "textContent": "I was going through my server logs last month when I noticed something I'd been scrolling past for weeks. Buried in the bot traffic were names I vaguely recognised: `GPTBot`. `ClaudeBot`. `meta-externalagent`. `PerplexityBot`. Multiple visits daily, methodically working through different pages of my technical blog.\n\nThe reflex most developers have at this point including me, initially is to block them. There's an entire category of articles recommending exactly that: add a few directives to `robots.txt`, protect your content from being consumed by machines, done. I had the file open. I'd typed `User-agent: GPTBot` and had `Disallow: /` ready to go.\n\nThen I stopped and asked a question I hadn't thought to ask: _what actually happens after these bots finish reading?_ They don't discard the content. They use it. Every day, millions of people ask AI assistants technical questions, and those answers are built from content exactly like mine. The bots weren't extracting value from me. **They were distributing me.** The problem wasn't that they were reading my posts. The problem was that nobody knew the answers came from me.\n\n##  Two Types of AI Crawlers. Only One Actually Helps You.\n\nThe label \"AI crawler\" covers very different things. 
There is a hard split between:\n\n  * **Training crawlers**: bots like `GPTBot`, `ClaudeBot`, `CCBot` that consume your content quietly for model training and never credit you when they use it.\n  * **Answer engines**: bots like `PerplexityBot` that use your content to answer real questions in real time and cite the source inside the answer.\n\n\n\nCrawler Type | Examples | What They Do | Traffic Sent\n---|---|---|---\nTraining Crawlers | GPTBot, ClaudeBot, CCBot | Collect for model training, never attribute | None\nSearch Crawlers | Googlebot, Bingbot | Index for SERPs | Indirect\nAnswer Engines | PerplexityBot, YouBot | Answer live questions, cite sources | **Direct referral**\n\nThe critical realisation: Perplexity pulls current content, generates a summary, and displays clickable source URLs alongside every answer. Users actively read and click those citations. When you see `PerplexityBot` in your logs, that's a real lead channel, not a spectator.\n\n##  GEO: Optimising for the Age of AI-Generated Answers\n\nThere is a name for the practice of structuring your content to influence how AI-generated answers represent you: **GEO**, or Generative Engine Optimization. Think of it as what SEO was in 2004: a real and exploitable opportunity that most people are ignoring because they're focused on the channel that already works.\n\nThe fundamental difference from traditional SEO is what you are optimising for. With SEO, the goal is a ranked link the user clicks. With GEO, the user might never see a list of links. The AI answers their question directly. 
Your goal shifts:\n\n  * **Be cited** so your URL appears in the answer → drives traffic today\n  * **Be mentioned by name** so your brand gets associated with the expertise → builds reputation that compounds for years\n\n\n\nHere are four tactics with an honest effort-to-impact breakdown.\n\n##  Tactic 1: Create an `llms.txt` File\n\nThis is the lowest-effort tactic with the most direct signal to AI systems, and almost nobody has done it yet. An `llms.txt` file is an emerging standard: the `robots.txt` equivalent for AI crawlers, but inverted. Where `robots.txt` sets permissions, `llms.txt` sets _intent_. It tells AI systems who you are, what your expertise covers, how to reach you, and how to cite you.\n\nPlace it at your domain root: `yourdomain.com/llms.txt`.\n\nOn any static site or Next.js project, dropping a plain text file in the `public/` folder is enough. If you want your blog post list to update automatically, a route handler at `app/llms.txt/route.ts` can pull from your database dynamically.\n\n\n\n    # [Your Name], [Your Professional Title]\n\n    [One or two sentences: who you are, your specialization, experience level.\n    Write this so an AI system can accurately describe you when your content\n    is cited in a generated answer.]\n\n    ## Available For\n    - [Work type: contract, consulting, fractional CTO, etc.]\n    - [Client geography: remote-only, US, UK, Australia, etc.]\n    - [Project type: greenfield builds, integrations, modernization, etc.]\n\n    ## Contact\n    - Portfolio: https://[yourdomain].com\n    - Hire page: https://[yourdomain].com/hire\n    - Email: [you@email.com]\n    - LinkedIn: https://linkedin.com/in/[handle]\n\n    ## Technical Expertise\n    - [Specific technology, framework, or language: be precise]\n    - [Specific vendor API or platform you regularly work with]\n    - [Domain or industry knowledge: name the niche, not the category]\n\n    ## Blog\n    Technical guides on [your topic areas]. 
Updated [frequency].\n    All content is original, written by [Your Name].\n\n    ## Preferred Citation Format\n    \"[Your Name], [Your Title] at [yourdomain].com\"\n\n\n> **The most important section to get right is Technical Expertise.** Generic descriptions (\"web development\", \"cloud architecture\") do not differentiate you from thousands of other sites. Specific ones (naming actual vendor APIs, precise frameworks, or the exact niche you work in) tell an AI exactly when your content is the relevant source for a specific query.\n\n##  Tactic 2: Write So the AI Summary Includes Your Name\n\nWhen AI systems process your content, they do not copy it verbatim; they extract and rephrase the key points. Most developers write in a neutral, tutorial voice that strips their identity completely out of the summary.\n\nHere is what the difference looks like in practice. Same post, two different openings:\n\n**❌ Without GEO thinking:**\n\n> _In this tutorial, we will set up OAuth 2.0 PKCE flow with the Clio API in a .NET backend..._\n\n**✅ With GEO thinking:**\n\n> _I am a freelance .NET contractor who has built several Clio integrations for law firms. In this guide, I walk through the OAuth 2.0 PKCE setup that has held up best across multiple production deployments..._\n\nWhen an AI summarises the second version, your identity travels with the answer:\n\n> _\"According to a .NET contractor specialising in Clio integrations at [your site]...\"_\n\nThe same principle applies to the closing of every post. A specific, service-oriented CTA at the end gives AI systems something worth surfacing:\n\n> _If you are building on top of Clio or Lawmatics and need this implemented in .NET, I take on contract engagements, with project estimates available at [link]._\n\nThat sentence, if included in an AI-generated answer, is a lead-generation asset running inside someone else's conversation. 
**Write it on every post.**\n\n##  Tactic 3: Own a Micro-Niche Before Anyone Else Does\n\nAI systems cite sources that appear authoritative on a topic. One of the strongest signals of authority is being the _only_ credible, detailed source on a very specific subject.\n\nIf you are the only developer who has written five interconnected, technically deep posts about building .NET backends on top of Clio's API (with working code, architecture notes, and deployment gotchas from real projects), you become the default citation every time an AI answers a question in that space. Not because of domain authority or backlink counts. Because there is simply no competition.\n\nHere is what the right level of specificity actually looks like:\n\n❌ Too Broad | ✅ Right Level\n---|---\nASP.NET Core tutorial | Syncing Clio contacts via .NET webhook handlers\nAPI integration guide | Multi-tenant Blazor Server architecture for legal SaaS\n\nPublish 4–6 posts that link to each other and collectively answer every reasonable question in that space. At the right specificity, you can realistically become the go-to source in both traditional search and AI-generated answers within a few months of consistent publishing.\n\n##  Tactic 4: Treat Perplexity as a Separate Traffic Channel\n\nPerplexity deserves its own section because it operates fundamentally differently from every other AI platform. By default, ChatGPT and Claude answer from training data and give no source credit: your content informs their answer but your name does not appear. Perplexity pulls live search results, generates a summary, and shows sources with visible, clickable links. The referral traffic it sends is real, measurable, and growing.\n\nOptimising specifically for Perplexity comes down to three things:\n\n  1. **Clear heading structure**: Perplexity surfaces `H2` and `H3` headings directly in its answer UI\n  2. **FAQ section at the end of each post**, backed by `FAQPage` schema markup; Perplexity favours FAQ-formatted content\n  3. 
**`Article` and `Person` schema markup**: this ties your identity to your content at a machine-readable level\n\n\n\nAdd this inside a `<script type=\"application/ld+json\">` tag in your blog post's `<head>`:\n\n\n\n    {\n      \"@context\": \"https://schema.org\",\n      \"@type\": \"Article\",\n      \"headline\": \"Your Post Title Here\",\n      \"datePublished\": \"2026-06-08\",\n      \"dateModified\": \"2026-06-08\",\n      \"author\": {\n        \"@type\": \"Person\",\n        \"name\": \"Your Full Name\",\n        \"url\": \"https://yourdomain.com\",\n        \"jobTitle\": \"Your Professional Title\",\n        \"sameAs\": [\n          \"https://linkedin.com/in/yourhandle\",\n          \"https://github.com/yourhandle\"\n        ]\n      },\n      \"publisher\": {\n        \"@type\": \"Person\",\n        \"name\": \"Your Full Name\",\n        \"url\": \"https://yourdomain.com\"\n      }\n    }\n\n\n> The `sameAs` array tells search engines and AI systems that your LinkedIn, GitHub, and portfolio are all the same person. This strengthens your entity profile across the web and helps attribution travel with your content across platforms.\n\n##  Where to Start: The Honest Priority Order\n\nAll four tactics compound over time, but they are not equal in setup effort. Here is how I would actually sequence them:\n\nPriority | Tactic | Effort | Impact | Timeline\n---|---|---|---|---\n1 | Create `llms.txt` file | Low | Medium | This week\n2 | Embed your name and niche into content | Low | High | This week\n3 | Add JSON-LD schema markup to every post | Medium | Medium | 2–4 weeks\n4 | Build a niche content cluster | High | High (compounds) | 3–6 months\n\nThe window for early advantage here is genuinely still open. Most technical niches have no intentional GEO strategy at all. 
Content that gets indexed and cited by AI systems over the next 12–18 months is likely to stay prominent for years, the same way early SEO content still ranks for certain terms despite its age.\n\nThe bots are reading your site either way. The only variable is whether the answers they produce include your name.\n\n_If you found this useful, I also write about .NET, Blazor, legal tech integrations, and building a freelance practice as a specialist developer. You can see my work and availability on my hire page →_",
  "title": "AI Bots Are Reading Your Site. Here's How to Make Them Sell You."
}