Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihs7yi3fiqbp64z5meoqglmoic7yowuou3so6vod7ppkgitcreynq",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mphviabzwmq2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreidx43hccywvuwufrgpbmtvmm7mfn3thmej5fd3s2jfjz6xlncsm2u"
    },
    "mimeType": "image/webp",
    "size": 82136
  },
  "path": "/emongmarcc/technical-seo-audit-checklist-for-modern-web-applications-what-crawlers-actually-see-4fbi",
  "publishedAt": "2026-06-30T01:00:07.000Z",
  "site": "https://dev.to",
  "tags": [
    "seo",
    "webdev",
    "laravel",
    "todayilearned",
    "Rich Results Test",
    "Schema Markup Validator",
    "HanzWeb.ae",
    "@push",
    "@context",
    "@type",
    "@endpush"
  ],
  "textContent": "You shipped a beautiful web application. Clean code, smooth UX, fast on your machine. Then you check Google Search Console and realize your pages are barely indexed, your structured data is throwing errors, and half your canonical tags are pointing to the wrong URLs. Sound familiar?\n\nTechnical SEO is the unsexy foundation that either unlocks or blocks all the content work you do on top of it. This audit checklist is built for developers, not marketers, so we'll go deep on the implementation details, not just the theory.\n\n##  1. Crawlability and Indexation\n\nBefore anything else, you need to verify that Googlebot can actually find and read your pages.\n\n###  robots.txt\n\nYour `robots.txt` lives at the root of your domain. A common mistake in Laravel apps is accidentally blocking crawlers in production because someone copied a staging config.\n\n\n\n    User-agent: *\n    Disallow: /admin/\n    Disallow: /api/\n    Allow: /\n\n    Sitemap: https://yourdomain.com/sitemap.xml\n\n\nVerify it at `https://yourdomain.com/robots.txt` and test specific URLs using Google Search Console's URL Inspection tool.\n\n###  XML Sitemap\n\nYour sitemap should include all canonical, indexable URLs: nothing behind auth walls, nothing with `noindex`. In Laravel, the `spatie/laravel-sitemap` package makes this straightforward:\n\n\n\n    use Spatie\\Sitemap\\Sitemap;\n    use Spatie\\Sitemap\\Tags\\Url;\n\n    Sitemap::create()\n        ->add(\n            Url::create('/blog')\n                ->setLastModificationDate(now())\n                ->setChangeFrequency(Url::CHANGE_FREQUENCY_DAILY)\n                ->setPriority(0.8)\n        )\n        ->writeToFile(public_path('sitemap.xml'));\n\n\nDon't just generate it once. Hook it into your deployment pipeline, or register it with Laravel's task scheduler, which cron triggers via `php artisan schedule:run`.\n\n##  2. 
Canonical Tags and Duplicate Content\n\nDuplicate content is one of the most common technical SEO issues, especially in e-commerce and CMS-driven apps. URL variations like `?ref=newsletter`, `?sort=price`, or trailing slash inconsistencies all create duplicate signals.\n\n###  Every page needs a self-referencing canonical\n\n\n    <link rel=\"canonical\" href=\"https://yourdomain.com/products/running-shoes\" />\n\n\nIn Laravel Blade, centralise this:\n\n\n\n    <link rel=\"canonical\" href=\"{{ $canonical ?? url()->current() }}\" />\n\n\nThen in your controllers or Livewire components, explicitly set the canonical when needed, especially for paginated pages, filtered product listings, or tag archives.\n\n###  HTTP vs HTTPS, WWW vs non-WWW\n\nPick one and redirect everything else to it with a 301. Check your `.htaccess` or Nginx config. This should be handled at the server level, not just in Laravel's middleware.\n\n##  3. Structured Data (Schema Markup)\n\nStructured data doesn't guarantee rich results, but it does help Google understand your content. For a web app, the relevant schemas are usually `Article`, `Product`, `FAQPage`, `BreadcrumbList`, and `LocalBusiness`.\n\n\n\n    @push('head')\n    {{-- Encode values as JSON (not HTML-escaped echo) so quotes and ampersands stay valid --}}\n    <script type=\"application/ld+json\">\n    {\n      \"@context\": \"https://schema.org\",\n      \"@type\": \"Article\",\n      \"headline\": @json($post->title),\n      \"datePublished\": \"{{ $post->published_at->toIso8601String() }}\",\n      \"author\": {\n        \"@type\": \"Person\",\n        \"name\": @json($post->author->name)\n      },\n      \"image\": @json($post->og_image_url)\n    }\n    </script>\n    @endpush\n\n\nValidate everything using Google's Rich Results Test and the Schema Markup Validator.\n\n##  4. Core Web Vitals and Page Experience Signals\n\nGoogle's Page Experience signals include LCP, INP (replacing FID), and CLS. 
These are measurable, fixable, and used by Google's ranking systems.\n\n  * **LCP (Largest Contentful Paint):** Should be under 2.5s. Preload your hero images with `<link rel=\"preload\" as=\"image\">`. Lazy-load everything below the fold.\n  * **INP (Interaction to Next Paint):** Should be under 200ms. Heavy JavaScript blocking the main thread is the usual culprit. Audit your JS bundle: Alpine.js stays lean, but watch for third-party scripts.\n  * **CLS (Cumulative Layout Shift):** Should be under 0.1. Always set explicit `width` and `height` on images and iframes. Reserve space for async-loaded UI elements.\n\n\n\nRun `npx lighthouse https://yourdomain.com --view` locally for a quick diagnostic.\n\n##  5. Metadata Completeness\n\nEvery page needs a unique, descriptive `<title>` and `<meta name=\"description\">`. Descriptions won't directly boost rankings, but together with titles they affect click-through rates, which do matter.\n\n\n\n    <title>{{ $page->seo_title ?? $page->title . ' | ' . config('app.name') }}</title>\n    <meta name=\"description\" content=\"{{ $page->meta_description ?? $page->excerpt }}\" />\n\n\nAlso audit your Open Graph and Twitter Card tags, which control how your pages look when shared:\n\n\n\n    <meta property=\"og:title\" content=\"{{ $page->og_title ?? $page->title }}\" />\n    <meta property=\"og:image\" content=\"{{ $page->og_image ?? asset('images/default-og.jpg') }}\" />\n    <meta property=\"og:type\" content=\"website\" />\n\n\nKeep titles under 60 characters and descriptions under 155. Use a spreadsheet to audit them at scale: export your URLs and titles via a crawler like Screaming Frog.\n\n##  6. Mobile and Internationalisation\n\n###  Mobile-First Indexing\n\nGoogle now uses the mobile version of your site for indexing and ranking. 
Test with Chrome DevTools in mobile emulation and verify your responsive breakpoints aren't hiding critical content behind JavaScript toggles.\n\n###  hreflang for Multi-Language Apps\n\nIf you're running a multi-language Laravel app, hreflang tells Google which version to serve for which locale:\n\n\n\n    <link rel=\"alternate\" hreflang=\"en\" href=\"https://yourdomain.com/en/about\" />\n    <link rel=\"alternate\" hreflang=\"ar\" href=\"https://yourdomain.com/ar/about\" />\n    <link rel=\"alternate\" hreflang=\"x-default\" href=\"https://yourdomain.com/en/about\" />\n\n\nThis is particularly relevant for businesses operating in multilingual markets, something the team at HanzWeb.ae encounters regularly when building regional web applications for clients across the UAE and MENA.\n\n##  7. HTTPS, Security Headers, and URL Structure\n\n  * Ensure all internal links use HTTPS. Mixed content warnings can affect crawl behaviour.\n  * Use descriptive, hyphenated slugs: `/blog/technical-seo-audit` not `/blog?id=87`\n  * Avoid deep URL nesting beyond three levels\n  * Return proper HTTP status codes: 404 for missing pages, 410 for intentionally deleted content, 301 for permanent redirects\n\n\n\nCheck your redirect chains: a 301 that hits another 301 before reaching the destination wastes crawl budget and dilutes link equity.\n\n##  8. Log File Analysis (Underused but Powerful)\n\nServer logs tell you exactly what Googlebot is crawling and how often. Tools like Screaming Frog Log File Analyser or even a simple `grep` on your Nginx/Apache logs can reveal:\n\n  * Pages being crawled but not indexed\n  * Soft 404s (pages returning 200 but showing empty content)\n  * Crawl budget being wasted on paginated parameter URLs\n\n\n\n\n    grep 'Googlebot' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20\n\n\nThis gives you the top 20 URLs Googlebot is spending time on. 
If it's hitting `/api/` endpoints or admin routes, fix your robots.txt immediately.\n\n##  Putting It All Together\n\nTechnical SEO isn't a one-time task — it's an ongoing audit practice. The checklist above covers the highest-impact areas, but the real discipline is building these checks into your development workflow rather than treating them as an afterthought post-launch.\n\nSet up a quarterly crawl with Screaming Frog, monitor Search Console weekly for coverage errors, and make structured data and canonical logic part of your page templates from day one. The applications that rank consistently aren't the ones with the cleverest content strategy — they're the ones with a technically sound foundation that search engines can trust.",
  "title": "Technical SEO Audit Checklist for Modern Web Applications: What Crawlers Actually See"
}