This is a rough build log: the problem, the experiments, what failed, and the architecture that finally worked.
The Problem
I wanted my own small service for extracting transcripts from YouTube videos. Not another third-party website, not a manual download flow, and not a file that I have to pass to an agent by hand. I wanted a service.
There were two goals:
- a web interface for me: paste a YouTube URL and get the transcript;
- an API for agents like OpenClaw and Hermes, so they can fetch transcripts directly.
The final web entry point:
https://app.dima.cloud/transcribe/
The API endpoints:
GET https://app.dima.cloud/transcribe/api/transcript?url=YOUTUBE_URL
GET https://app.dima.cloud/transcribe/api/transcript.txt?url=YOUTUBE_URL
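For an agent, calling these endpoints is a one-liner. A minimal client sketch (stdlib only; the base URL is from this post, the helper names are mine):

```python
# Minimal client for the transcript API above. build_url is a hypothetical
# helper; the endpoints and query parameter come from the service itself.
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://app.dima.cloud/transcribe/api"

def build_url(video_url: str, plain_text: bool = False) -> str:
    """Build the request URL for the JSON or plain-text endpoint."""
    endpoint = "transcript.txt" if plain_text else "transcript"
    return f"{BASE}/{endpoint}?url={quote(video_url, safe='')}"

def fetch_transcript(video_url: str) -> str:
    """Fetch the plain-text transcript (usually what agents want)."""
    with urlopen(build_url(video_url, plain_text=True), timeout=60) as resp:
        return resp.read().decode("utf-8")
```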
The First Prototype
The initial version had two files: a Python agent and a React UI. The Python script already tried two transcript sources:
- youtube-transcript-api
- yt-dlp
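The dual-source idea is simple: try one source, fall back to the other, and report which one succeeded (the JSON response includes a "method" field for exactly this reason). A sketch of that fallback, with stand-in fetcher functions:

```python
# Sketch of the dual-source fallback. The fetcher functions are stand-ins
# for youtube-transcript-api and yt-dlp wrappers.

def fetch_with_fallback(video_id, fetchers):
    """Try each (name, fetcher) pair in order; return (text, method) from
    the first source that succeeds, or raise if all of them fail."""
    errors = []
    for name, fetcher in fetchers:
        try:
            return fetcher(video_id), name
        except Exception as exc:  # each source can fail independently
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```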
The UI, however, was not a good self-hosted interface. It was trying to call an external API directly from the browser instead of calling our own backend.
So we replaced that with a simpler architecture:
FastAPI backend
plain HTML page
systemd
nginx route
What We Deployed
The service lives on the server at:
/opt/youtube-transcribe-service
It runs through systemd:
youtube-transcribe.service
Internally it listens on:
127.0.0.1:8092
Nginx exposes it publicly:
/transcribe/ -> 127.0.0.1:8092
The JSON endpoint returns a response like this:
{
"success": true,
"transcript": "...",
"language": "en",
"word_count": 3215,
"char_count": 17150,
"video_id": "z02Y-1OvWSM",
"method": "ytdlp"
}
The plain text endpoint returns only the transcript body, which is usually better for agents.
The First Infrastructure Bug
At first, /transcribe redirected to the wrong URL:
https://app.dima.cloud:2443/transcribe/
The reason was the nginx setup. The HTTPS server block was listening on an internal port, 127.0.0.1:2443, behind an external routing layer. Nginx generated an absolute redirect with that internal port.
The fix was to remove the redirect and serve the page directly from both:
/transcribe
/transcribe/
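In nginx terms, the fix can be sketched like this. The exact location layout is an assumption; absolute_redirect and port_in_redirect are standard nginx directives for exactly this class of bug:

```nginx
# Serve both /transcribe and /transcribe/ without an absolute redirect
# that would leak the internal listen port.
location = /transcribe {
    proxy_pass http://127.0.0.1:8092/transcribe/;
}
location /transcribe/ {
    proxy_pass http://127.0.0.1:8092;
    absolute_redirect off;   # keep nginx-generated redirects relative
    port_in_redirect  off;   # never embed the internal port in a redirect
}
```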
The Real Problem: YouTube Blocks VPS IPs
The hard part was not FastAPI, nginx, or the UI. The hard part was egress IP reputation.
The server was using a VPS/cloud IP. YouTube frequently blocks those addresses. The typical errors were:
Sign in to confirm you're not a bot
YouTube is blocking requests from your IP
HTTP Error 429: Too Many Requests
This happened with both youtube-transcript-api and yt-dlp. So it was not a single-library problem.
Attempt 1: Browser Cookies
The first free workaround was to export YouTube cookies from a logged-in browser and put them on the server:
/opt/youtube-transcribe-service/cookies.txt
The service could use that file through:
YOUTUBE_TRANSCRIPT_COOKIE_PATH=/opt/youtube-transcribe-service/cookies.txt
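Wiring that into the service is straightforward, because yt-dlp's Python API has a real cookiefile option. A sketch (the env var name is the service's own; the option builder is mine):

```python
# Sketch: build yt-dlp options, attaching the cookie file if present.
# 'cookiefile', 'writesubtitles', etc. are real yt-dlp option names.
import os

def build_ydl_opts(cookie_path=None):
    opts = {
        "skip_download": True,      # subtitles only, no media
        "writesubtitles": True,
        "writeautomaticsub": True,  # fall back to auto-generated captions
    }
    if cookie_path and os.path.exists(cookie_path):
        opts["cookiefile"] = cookie_path
    return opts

def make_downloader():
    import yt_dlp  # third-party, imported lazily
    path = os.environ.get("YOUTUBE_TRANSCRIPT_COOKIE_PATH")
    return yt_dlp.YoutubeDL(build_ydl_opts(path))
```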
This helped temporarily, but the approach was fragile. At some point yt-dlp reported:
The provided YouTube account cookies are no longer valid.
They have likely been rotated in the browser as a security measure.
Conclusion: cookies are useful as a temporary workaround, but not as a production architecture. They expire, rotate, and become suspicious when browser cookies are reused from a VPS IP.
Attempt 2: Reverse SOCKS Through My Mac
The next free attempt was to route the server through my Mac:
server -> SSH reverse SOCKS -> Mac -> internet
When the Mac was on its normal internet connection, YouTube saw the Mac's IP instead of the VPS IP. That worked.
But when the Mac was connected to a VPN hosted on the same server, the route became a loop:
server -> Mac -> VPN -> same server -> YouTube
Conclusion: nice proof of concept, but not production. The Mac has to stay awake, the tunnel has to stay alive, and VPN routing can break everything.
Attempt 3: Webshare Static Proxies
We then tested a Webshare list of 10 static proxies.
Each proxy was tested against real subtitle downloads:
- does the proxy respond to IP checks?
- can YouTube list available subtitles?
- can it download actual timedtext subtitle files?
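The first of those checks can be done in plain Python; a sketch of the IP-echo probe (api.ipify.org is the same check as above, the timeout is an assumption; the subtitle checks shell out to yt-dlp and are omitted here):

```python
# Sketch: does a given proxy answer an IP-echo request at all?
import urllib.request

def proxy_exit_ip(proxy_url, timeout=15):
    """Return the exit IP as seen through the proxy, or None on failure."""
    handler = urllib.request.ProxyHandler({"http": proxy_url,
                                           "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open("https://api.ipify.org", timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # covers URLError, timeouts, refused connections
        return None
```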
None of the 10 proxies worked reliably. Some returned 429 Too Many Requests. Some produced Requested format is not available. Some timed out.
Interesting detail: some proxies could list subtitles through --list-subs, but failed when downloading the actual timedtext files.
Conclusion: static/datacenter proxies are not enough for this problem.
The Working Solution: DataImpulse Residential Proxy
The working solution was a DataImpulse residential proxy.
We verified it from the server with:
curl --proxy "http://user:pass@gw.dataimpulse.com:823/" https://api.ipify.org
The external IP became a residential IP, not the VPS IP.
Then yt-dlp successfully downloaded real .vtt subtitle files for:
- a control video: z02Y-1OvWSM
- a previously failing video: GEzbesM_X3U
The production service was switched to:
YOUTUBE_PROXY_URL=http://user:pass@gw.dataimpulse.com:823/
After restarting the service, the public endpoints started working.
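Inside the service, that switch is one option: yt-dlp has a real proxy option, so the env var just needs to be threaded into the downloader config. A sketch (the env var name is the service's; the rest of the option dict is an assumption):

```python
# Sketch: build yt-dlp options from the service environment.
# 'proxy' and 'subtitlesformat' are real yt-dlp option names.
import os

def ydl_opts_from_env():
    opts = {
        "skip_download": True,
        "writesubtitles": True,
        "writeautomaticsub": True,
        "subtitlesformat": "vtt",  # the service downloads .vtt files
    }
    proxy = os.environ.get("YOUTUBE_PROXY_URL")
    if proxy:
        opts["proxy"] = proxy
    return opts
```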
Simplifying the Interface
The original UI had two options:
language: ru / en / auto
method: api / ytdlp / auto
In practice, they were noise. The user almost always wants the best available transcript. So the UI was reduced to:
YouTube URL -> Get -> Transcript or error
The backend still supports the old parameters for agent compatibility, but the human interface now only asks for the URL.
The Final Architecture
User or agent
↓
https://app.dima.cloud/transcribe/
↓
FastAPI on VPS
↓
yt-dlp / youtube-transcript-api
↓
DataImpulse residential proxy
↓
YouTube timedtext/subtitles
↓
transcript text
What I Learned
- Building a transcript UI and API is straightforward.
- Fetching YouTube subtitles reliably from a VPS is not.
- The real issue is not the Python library; it is IP reputation.
- Cookies help temporarily but create operational debt.
- A home tunnel works, but it is not autonomous.
- Static proxies do not solve the problem.
- A residential proxy does.
What Should Be Added Next
- Cache transcripts by video_id, so repeated requests do not spend proxy traffic.
- Add rate limiting for agents.
- Add an API key, because the endpoint is public.
- Add a proxy healthcheck endpoint.
- Add a job queue for long videos.
- Improve error categories: no subtitles, proxy timeout, YouTube 429, invalid URL.
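The cache item is the cheapest win, since every uncached request burns residential proxy traffic. A minimal in-process sketch (a real service would persist to disk or a database; fetch_transcript here is a counting stand-in for the proxied fetch):

```python
# Sketch: cache transcripts by video_id so repeat requests skip the proxy.
from functools import lru_cache

calls = {"n": 0}  # instrumentation for the stand-in fetcher

def fetch_transcript(video_id):
    """Stand-in for the real proxied fetch."""
    calls["n"] += 1
    return f"transcript for {video_id}"

@lru_cache(maxsize=1024)
def cached_transcript(video_id: str) -> str:
    return fetch_transcript(video_id)
```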
Final Takeaway
A YouTube transcript service is not only about choosing a Python library. It is about giving the service the right path to YouTube.
Without a residential or mobile proxy, the system keeps breaking on cookies, VPN routing, 429 responses, and bot checks. With a residential proxy, it becomes a practical microservice that both humans and agents can use.