This is a rough build log: the problem, the experiments, what failed, and the architecture that finally worked.
The Problem
I wanted my own small service for extracting transcripts from YouTube videos. Not another third-party website, not a manual download flow, and not a file that I have to pass to an agent by hand. I wanted a service.
There were two goals:
- a web interface for me: paste a YouTube URL and get the transcript;
- an API for agents like OpenClaw and Hermes, so they can fetch transcripts directly.
The final web entry point:
https://app.dima.cloud/transcribe/
The API endpoints:
GET https://app.dima.cloud/transcribe/api/transcript?url=YOUTUBE_URL
GET https://app.dima.cloud/transcribe/api/transcript.txt?url=YOUTUBE_URL
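For an agent, calling these endpoints is a one-liner. A minimal client sketch (stdlib only; the base URL is from this post, the helper names are mine):

```python
# Minimal client for the transcript API above. build_url is a hypothetical
# helper; the endpoints and query parameter come from the service itself.
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://app.dima.cloud/transcribe/api"

def build_url(video_url: str, plain_text: bool = False) -> str:
    """Build the request URL for the JSON or plain-text endpoint."""
    endpoint = "transcript.txt" if plain_text else "transcript"
    return f"{BASE}/{endpoint}?url={quote(video_url, safe='')}"

def fetch_transcript(video_url: str) -> str:
    """Fetch the plain-text transcript (usually what agents want)."""
    with urlopen(build_url(video_url, plain_text=True), timeout=60) as resp:
        return resp.read().decode("utf-8")
```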
The First Prototype
The initial version had two files: a Python agent and a React UI. The Python script already tried two transcript sources:
- youtube-transcript-api
- yt-dlp
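The dual-source idea is simple: try one source, fall back to the other, and report which one succeeded (the JSON response includes a "method" field for exactly this reason). A sketch of that fallback, with stand-in fetcher functions:

```python
# Sketch of the dual-source fallback. The fetcher functions are stand-ins
# for youtube-transcript-api and yt-dlp wrappers.

def fetch_with_fallback(video_id, fetchers):
    """Try each (name, fetcher) pair in order; return (text, method) from
    the first source that succeeds, or raise if all of them fail."""
    errors = []
    for name, fetcher in fetchers:
        try:
            return fetcher(video_id), name
        except Exception as exc:  # each source can fail independently
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```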
The UI, however, was not a good self-hosted interface. It was trying to call an external API directly from the browser instead of calling our own backend.
So we replaced that with a simpler architecture:
FastAPI backend
plain HTML page
systemd
nginx route
What We Deployed
The service lives on the server at:
/opt/youtube-transcribe-service
It runs through systemd:
youtube-transcribe.service
Internally it listens on:
127.0.0.1:8092
Nginx exposes it publicly:
/transcribe/ -> 127.0.0.1:8092
The JSON endpoint returns a response like this:
{
"success": true,
"transcript": "...",
"language": "en",
"word_count": 3215,
"char_count": 17150,
"video_id": "z02Y-1OvWSM",
"method": "ytdlp"
}
The plain text endpoint returns only the transcript body, which is usually better for agents.
The First Infrastructure Bug
At first, /transcribe redirected to the wrong URL:
https://app.dima.cloud:2443/transcribe/
The reason was the nginx setup. The HTTPS server block was listening on an internal port, 127.0.0.1:2443, behind an external routing layer. Nginx generated an absolute redirect with that internal port.
The fix was to remove the redirect and serve the page directly from both:
/transcribe
/transcribe/
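In nginx terms, the fix can be sketched like this. The exact location layout is an assumption; absolute_redirect and port_in_redirect are standard nginx directives for exactly this class of bug:

```nginx
# Serve both /transcribe and /transcribe/ without an absolute redirect
# that would leak the internal listen port.
location = /transcribe {
    proxy_pass http://127.0.0.1:8092/transcribe/;
}
location /transcribe/ {
    proxy_pass http://127.0.0.1:8092;
    absolute_redirect off;   # keep nginx-generated redirects relative
    port_in_redirect  off;   # never embed the internal port in a redirect
}
```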
The Real Problem: YouTube Blocks VPS IPs
The hard part was not FastAPI, nginx, or the UI. The hard part was egress IP reputation.
The server was using a VPS/cloud IP. YouTube frequently blocks those addresses. The typical errors were:
Sign in to confirm you're not a bot
YouTube is blocking requests from your IP
HTTP Error 429: Too Many Requests
This happened with both youtube-transcript-api and yt-dlp. So it was not a single-library problem.
Attempt 1: Browser Cookies
The first free workaround was to export YouTube cookies from a logged-in browser and put them on the server:
/opt/youtube-transcribe-service/cookies.txt
The service could use that file through:
YOUTUBE_TRANSCRIPT_COOKIE_PATH=/opt/youtube-transcribe-service/cookies.txt
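Wiring that into the service is straightforward, because yt-dlp's Python API has a real cookiefile option. A sketch (the env var name is the service's own; the option builder is mine):

```python
# Sketch: build yt-dlp options, attaching the cookie file if present.
# 'cookiefile', 'writesubtitles', etc. are real yt-dlp option names.
import os

def build_ydl_opts(cookie_path=None):
    opts = {
        "skip_download": True,      # subtitles only, no media
        "writesubtitles": True,
        "writeautomaticsub": True,  # fall back to auto-generated captions
    }
    if cookie_path and os.path.exists(cookie_path):
        opts["cookiefile"] = cookie_path
    return opts

def make_downloader():
    import yt_dlp  # third-party, imported lazily
    path = os.environ.get("YOUTUBE_TRANSCRIPT_COOKIE_PATH")
    return yt_dlp.YoutubeDL(build_ydl_opts(path))
```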
This helped temporarily, but the approach was fragile. At some point yt-dlp reported:
The provided YouTube account cookies are no longer valid.
They have likely been rotated in the browser as a security measure.
Conclusion: cookies are useful as a temporary workaround, but not as a production architecture. They expire, rotate, and become suspicious when browser cookies are reused from a VPS IP.
Attempt 2: Reverse SOCKS Through My Mac
The next free attempt was to route the server through my Mac:
server -> SSH reverse SOCKS -> Mac -> internet
When the Mac was on its normal internet connection, YouTube saw the Mac's IP instead of the VPS IP. That worked.
But when the Mac was connected to a VPN hosted on the same server, the route became a loop:
server -> Mac -> VPN -> same server -> YouTube
Conclusion: nice proof of concept, but not production. The Mac has to stay awake, the tunnel has to stay alive, and VPN routing can break everything.
Attempt 3: Webshare Static Proxies
We then tested a Webshare list of 10 static proxies.
Each proxy was tested against real subtitle downloads:
- does the proxy respond to IP checks?
- can YouTube list available subtitles?
- can it download actual timedtext subtitle files?
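The first of those checks can be done in plain Python; a sketch of the IP-echo probe (api.ipify.org is the same check as above, the timeout is an assumption; the subtitle checks shell out to yt-dlp and are omitted here):

```python
# Sketch: does a given proxy answer an IP-echo request at all?
import urllib.request

def proxy_exit_ip(proxy_url, timeout=15):
    """Return the exit IP as seen through the proxy, or None on failure."""
    handler = urllib.request.ProxyHandler({"http": proxy_url,
                                           "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open("https://api.ipify.org", timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # covers URLError, timeouts, refused connections
        return None
```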
None of the 10 proxies worked reliably. Some returned 429 Too Many Requests. Some produced Requested format is not available. Some timed out.
Interesting detail: some proxies could list subtitles through --list-subs, but failed when downloading the actual timedtext files.
Conclusion: static/datacenter proxies are not enough for this problem.
The Working Solution: DataImpulse Residential Proxy
The working solution was a DataImpulse residential proxy.
We verified it from the server with:
curl --proxy "http://user:pass@gw.dataimpulse.com:823/" https://api.ipify.org
The external IP became a residential IP, not the VPS IP.
Then yt-dlp successfully downloaded real .vtt subtitle files for:
- a control video: z02Y-1OvWSM
- a previously failing video: GEzbesM_X3U
The production service was switched to:
YOUTUBE_PROXY_URL=http://user:pass@gw.dataimpulse.com:823/
After restarting the service, the public endpoints started working.
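Inside the service, that switch is one option: yt-dlp has a real proxy option, so the env var just needs to be threaded into the downloader config. A sketch (the env var name is the service's; the rest of the option dict is an assumption):

```python
# Sketch: build yt-dlp options from the service environment.
# 'proxy' and 'subtitlesformat' are real yt-dlp option names.
import os

def ydl_opts_from_env():
    opts = {
        "skip_download": True,
        "writesubtitles": True,
        "writeautomaticsub": True,
        "subtitlesformat": "vtt",  # the service downloads .vtt files
    }
    proxy = os.environ.get("YOUTUBE_PROXY_URL")
    if proxy:
        opts["proxy"] = proxy
    return opts
```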
Simplifying the Interface
The original UI had two options:
language: ru / en / auto
method: api / ytdlp / auto
In practice, they were noise. The user almost always wants the best available transcript. So the UI was reduced to:
YouTube URL -> Get -> Transcript or error
The backend still supports the old parameters for agent compatibility, but the human interface now only asks for the URL.
The Final Architecture
User or agent
↓
https://app.dima.cloud/transcribe/
↓
FastAPI on VPS
↓
yt-dlp / youtube-transcript-api
↓
DataImpulse residential proxy
↓
YouTube timedtext/subtitles
↓
transcript text
What I Learned
- Building a transcript UI and API is straightforward.
- Fetching YouTube subtitles reliably from a VPS is not.
- The real issue is not the Python library; it is IP reputation.
- Cookies help temporarily but create operational debt.
- A home tunnel works, but it is not autonomous.
- Static proxies do not solve the problem.
- A residential proxy does.
What Should Be Added Next
- Cache transcripts by video_id, so repeated requests do not spend proxy traffic.
- Add rate limiting for agents.
- Add an API key, because the endpoint is public.
- Add a proxy healthcheck endpoint.
- Add a job queue for long videos.
- Improve error categories: no subtitles, proxy timeout, YouTube 429, invalid URL.
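The cache item is the cheapest win, since every uncached request burns residential proxy traffic. A minimal in-process sketch (a real service would persist to disk or a database; fetch_transcript here is a counting stand-in for the proxied fetch):

```python
# Sketch: cache transcripts by video_id so repeat requests skip the proxy.
from functools import lru_cache

calls = {"n": 0}  # instrumentation for the stand-in fetcher

def fetch_transcript(video_id):
    """Stand-in for the real proxied fetch."""
    calls["n"] += 1
    return f"transcript for {video_id}"

@lru_cache(maxsize=1024)
def cached_transcript(video_id: str) -> str:
    return fetch_transcript(video_id)
```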
Final Takeaway
A YouTube transcript service is not only about choosing a Python library. It is about giving the service the right path to YouTube.
Without a residential or mobile proxy, the system keeps breaking on cookies, VPN routing, 429 responses, and bot checks. With a residential proxy, it becomes a practical microservice that both humans and agents can use.