External Publication

Why your YouTube transcript scraper started returning empty strings (and how to fix it in 2026)

DEV Community [Unofficial] June 17, 2026

If you have a script that pulled YouTube transcripts a year ago, there's a good chance it quietly broke. It still runs without errors; it just returns empty strings. Here's what changed, and how to actually get captions again.

The symptom

You hit YouTube's caption endpoint (timedtext), get back HTTP 200 … and an empty body. No exception, no 403, nothing to catch. Just nothing. So your pipeline happily writes empty transcripts and you don't notice until your RAG index is full of blanks.
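Because the failure is a 200 with nothing in it, the only place to catch it is your own code. A minimal sketch of such a guard, assuming you already have the status code and the body text in hand. The names `EmptyTranscriptError` and `requireCaptionBody` are hypothetical and not part of any library:

```javascript
// Hypothetical error type so callers can tell "empty 200" apart from other failures.
class EmptyTranscriptError extends Error {
  constructor(message = 'Caption endpoint returned HTTP 200 with an empty body') {
    super(message);
    this.name = 'EmptyTranscriptError';
  }
}

// Returns the body unchanged when it looks usable, throws otherwise.
function requireCaptionBody(status, body) {
  if (status !== 200) {
    throw new Error(`Unexpected HTTP status ${status}`);
  }
  if (typeof body !== 'string' || body.trim() === '') {
    throw new EmptyTranscriptError();
  }
  return body;
}
```

Calling it as `requireCaptionBody(res.status, await res.text())` before parsing means a blank response fails loudly instead of landing in your index.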

This is why a lot of the popular libraries went dark in 2025–2026, even ones that are still "maintained." The request shape that used to work now returns nothing.

What actually changed: PoToken

YouTube now requires a Proof-of-Origin Token (PoToken) — generated by its BotGuard system — on caption requests. Without a valid token bound to the specific video, timedtext returns that empty 200. Datacenter IPs (AWS/GCP/Azure) also get blocked or throttled hard, which is the second reason server-side scrapers silently fail.

So the modern recipe is three steps:

1. Fetch the watch page and parse ytInitialPlayerResponse for the caption tracks and visitorData.
2. Mint a PoToken bound to the video ID by solving the BotGuard challenge.
3. Request the caption track with &pot=<token>&c=WEB&fmt=json3; only then do you get real JSON back.
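The first step of that recipe can be sketched as a pure function over the watch-page HTML. This is a rough sketch under assumptions, not YouTube's documented format: it assumes the page contains the text `ytInitialPlayerResponse = ` followed by a JSON object, that tracks live under `captions.playerCaptionsTracklistRenderer.captionTracks`, and that visitorData sits at `responseContext.visitorData`. That last path in particular is a guess to verify against a real page. The helper names are hypothetical.

```javascript
// Finds `marker` in the HTML and parses the JSON object that follows it.
// Brace counting skips braces inside JSON strings so the match ends correctly.
function extractJsonAfter(html, marker) {
  const start = html.indexOf(marker);
  if (start === -1) return null;
  const open = html.indexOf('{', start + marker.length);
  if (open === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = open; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}' && --depth === 0) {
      return JSON.parse(html.slice(open, i + 1));
    }
  }
  return null; // object never closed
}

// Pulls out what the later steps need. Property paths are assumptions (see above).
function parseWatchPage(html) {
  const player = extractJsonAfter(html, 'ytInitialPlayerResponse = ');
  const captionTracks =
    player?.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? [];
  const visitorData = player?.responseContext?.visitorData ?? null;
  return { captionTracks, visitorData };
}
```

An empty `captionTracks` array here usually means the video has no captions at all, which is worth distinguishing from the empty-200 case later in the pipeline.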

The PoToken part is the bit everyone gets stuck on. You don't have to reverse-engineer BotGuard yourself — bgutils-js (paired with jsdom to give it a DOM to run in) handles the challenge. Here's the shape of it:

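import { BG, buildURL, GOOG_API_KEY } from 'bgutils-js';
import { JSDOM } from 'jsdom';

// Inputs this snippet expects to exist already (not defined here):
//   videoId, visitorData, captionTrack -> from the watch page (step 1 of the recipe)
//   requestKey                         -> the request key passed to the BotGuard challenge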

// 1. give BotGuard a DOM to run in
const dom = new JSDOM('<!DOCTYPE html><html><body></body></html>', {
  url: 'https://www.youtube.com/',
});
Object.assign(globalThis, { window: dom.window, document: dom.window.document });

// 2. solve the challenge -> integrity token -> a minter bound to your session
const challenge = await BG.Challenge.create({ fetch, globalObj: globalThis, requestKey, identifier: visitorData });
new Function(challenge.interpreterJavascript.privateDoNotAccessOrElseSafeScriptWrappedValue)();
const bg = await BG.BotGuardClient.create({ program: challenge.program, globalName: challenge.globalName, globalObj: globalThis });
const out = [];
const it = await fetch(buildURL('GenerateIT', false), {
  method: 'POST',
  headers: { 'Content-Type': 'application/json+protobuf', 'x-goog-api-key': GOOG_API_KEY },
  body: JSON.stringify([requestKey, await bg.snapshot({ webPoSignalOutput: out })]),
});
const minter = await BG.WebPoMinter.create({ integrityToken: (await it.json())[0] }, out);

// 3. mint a token bound to THIS video, attach to the caption URL
const pot = await minter.mintAsWebsafeString(videoId);
const url = new URL(captionTrack.baseUrl);
url.searchParams.set('fmt', 'json3');
url.searchParams.set('pot', pot);
url.searchParams.set('c', 'WEB');
const segments = (await (await fetch(url)).json()).events; // <- finally, real data
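Once `events` comes back, it still needs flattening into text. A small sketch, assuming the json3 shape as commonly observed: each event may carry `tStartMs`, `dDurationMs`, and a `segs` array of `{ utf8 }` pieces. Those field names are not from any official documentation, and the function names are hypothetical.

```javascript
// Converts json3-style events into timed segments, skipping events that
// carry no text (no segs, or only whitespace such as a bare newline).
function eventsToSegments(events) {
  const segments = [];
  for (const event of events ?? []) {
    if (!Array.isArray(event.segs)) continue;
    const text = event.segs
      .map((seg) => seg.utf8 ?? '')
      .join('')
      .replace(/\s+/g, ' ')
      .trim();
    if (!text) continue;
    segments.push({
      startMs: event.tStartMs ?? 0,
      durationMs: event.dDurationMs ?? 0,
      text,
    });
  }
  return segments;
}

// Joins segment text into one plain transcript string.
function segmentsToText(segments) {
  return segments.map((segment) => segment.text).join(' ');
}
```

Keeping the timed segments around, rather than only the joined string, makes it straightforward to emit SRT or to chunk by time for embeddings.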

A couple of things that bit me:

- Bind the token to the video ID (mintAsWebsafeString(videoId)), not a generic identifier. A session-only token still returns empty on timedtext.
- &c=WEB is required alongside &pot=. Miss it and you're back to the empty 200.
- The integrity token has a TTL (~12h), so for batches you bootstrap once and reuse the minter.
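The "bootstrap once and reuse" point can be sketched as a small cache around whatever function builds your minter. This is an illustrative pattern, not part of bgutils-js: `createMinterCache` is a hypothetical name, `bootstrap` stands in for the challenge-solving code above, and the 6-hour default is an arbitrary safety margin under the ~12h figure.

```javascript
// Wraps an expensive async bootstrap so it runs at most once per TTL window.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createMinterCache(bootstrap, ttlMs = 6 * 60 * 60 * 1000, now = Date.now) {
  let cached = null;
  let expiresAt = 0;

  return function getMinter() {
    if (!cached || now() >= expiresAt) {
      expiresAt = now() + ttlMs;
      // The async wrapper turns a synchronous throw into a rejected promise.
      const pending = (async () => bootstrap())();
      cached = pending;
      // Drop a failed bootstrap so the next call retries instead of
      // handing out the same rejection until the TTL expires.
      pending.catch(() => {
        if (cached === pending) cached = null;
      });
    }
    return cached; // a promise; concurrent callers share one bootstrap
  };
}
```

In a batch job you would call `const minter = await getMinter()` per video and then `minter.mintAsWebsafeString(videoId)`, so only the cheap per-video mint repeats.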

I packaged it as a tiny CLI

I got tired of re-deriving this, so I put it in a zero-config MIT package:

npx get-youtube-transcript https://www.youtube.com/watch?v=jNQXAC9IVRw

It does the whole watch-page → PoToken → captions dance and prints the transcript (text/JSON/SRT). Source: github.com/jamhimself/youtube-transcript-cli. It's single-video and runs from your IP — perfect for scripts and notebooks.
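If you are wiring the steps up yourself rather than using the CLI, note that the minting step needs the bare video ID while scripts usually start from a URL like the one above. A sketch of that normalization using the standard URL class. The list of accepted URL shapes and the 11-character ID pattern reflect common observation rather than a published spec, and `extractVideoId` is a hypothetical helper, not the CLI's code:

```javascript
// Returns the video ID from a bare ID or a common YouTube URL shape,
// or null when nothing that looks like an ID can be found.
function extractVideoId(input) {
  const idPattern = /^[A-Za-z0-9_-]{11}$/;
  if (idPattern.test(input)) return input;

  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a URL and not a bare ID
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate = null;
  if (host === 'youtu.be') {
    candidate = url.pathname.slice(1).split('/')[0];
  } else if (host === 'youtube.com' || host === 'music.youtube.com') {
    if (url.pathname === '/watch') {
      candidate = url.searchParams.get('v');
    } else {
      const match = url.pathname.match(/^\/(?:shorts|embed|live)\/([^/]+)/);
      candidate = match ? match[1] : null;
    }
  }
  return candidate && idPattern.test(candidate) ? candidate : null;
}
```

Rejecting anything that does not match the ID pattern up front avoids minting a token for a malformed identifier and then chasing yet another empty response.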

Where it gets hard: scale

The CLI works great until you need hundreds or thousands of videos. Then you hit the other wall: YouTube rate-limits and blocks datacenter IPs, so an unattended server job gets throttled fast. At that point you need rotating residential proxies + retries + uptime monitoring, which is a different project from "parse the captions."
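Of that list, the retry piece is the easiest to sketch; proxies and monitoring are infrastructure rather than code. Below is a generic exponential backoff with full jitter, assuming your fetch step is wrapped in a `task` function that throws on failure (including on an empty body). The names, retry count, and delay bounds are illustrative choices, not values YouTube publishes:

```javascript
// Delay before retry number `attempt` (0-based): a random value between 0 and
// min(capMs, baseMs * 2^attempt). `random` is injectable for deterministic tests.
function backoffDelay(attempt, baseMs = 1000, capMs = 60000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Runs `task` until it succeeds or the retries are exhausted,
// sleeping with jittered backoff between attempts.
async function withRetries(task, options = {}) {
  const {
    retries = 4,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Jitter matters here because a batch of workers retrying on the same schedule tends to hit the rate limiter in lockstep.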

That's the part I run as a hosted service on Apify: a transcript scraper and a whole-channel-to-RAG version that lists a channel and returns every video's transcript as chunked, embed-ready text. Same engine as the CLI, just with the proxy/uptime layer so you don't babysit it. (Mentioning it because "how do I do this at scale" is the inevitable next question, but the CLI above is genuinely all you need for low-volume use.)

TL;DR

- Empty transcripts in 2026 = missing PoToken + datacenter-IP blocks.
- Recipe: watch page → caption tracks → mint a video-bound PoToken (bgutils-js) → fetch with &pot=&c=WEB&fmt=json3.
- For one-off use, npx get-youtube-transcript <url>.
- For scale, you need residential proxies + retries on top. That's the real cost, not the parsing.

If your captions pipeline has been quietly returning blanks, now you know why. Go check your RAG index. 🙃
