Documentation

These docs live in the repo. Open the files below to edit or share with the team.

Core Docs
Getting Started
Fast local setup and verification checklist.
GETTING_STARTED.md

GETTING_STARTED.md - TTS Megasystem Quick Start

This is the shortest path to a working local setup. For full details, see SETUP.md and ARCHITECTURE.md.

1) Install

git clone <your-repo-url>
cd TTS-Premium
npm install

2) Initialize local D1 (first time only)

cd apps/api
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
cd ../..

3) Start services

# From repo root (API + React editor)
npm run dev:clean

# In another terminal (SvelteKit shell)
cd apps/web-svelte
npm run dev

4) Open the app

  • SvelteKit shell: http://localhost:5175 (or next available port)
  • React editor: http://localhost:3001
  • API: http://localhost:8787

5) Configure env (if not already)

# From repo root
cd apps/web-svelte
cp .env.example .env
# Add:
# VITE_API_URL=http://localhost:8787
# VITE_EDITOR_URL=http://localhost:3001
cd ../api
cp .dev.vars.example .dev.vars
# Add required keys for AI providers as needed.

6) What to verify

  • /login loads and signs in.
  • /home shows projects (or empty state).
  • "Open in Editor" sends you to /editor/:id in the React app.

Troubleshooting (fast fixes)

Ports in use:

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5175 | xargs kill -9

Wrangler login fails:

rm -rf ~/.wrangler
npm run wrangler login

Missing D1 tables:

cd apps/api
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

Setup Guide
Full install, Cloudflare resources, and dev workflow.
SETUP.md

SETUP.md - TTS Megasystem Setup Guide

📋 Prerequisites Checklist

  • Node.js 20+ installed
  • npm installed (bundled with Node.js)
  • Cloudflare account created
  • Git installed
  • Docker installed (for extractor + XTTS wrapper)

🎯 Step-by-Step Setup

  1. Clone & Install
git clone <your-repo-url>
cd TTS-Premium
npm install
  2. Configure Cloudflare (API)
cd apps/api
npm run wrangler login

# Create D1 database
npm run wrangler d1 create tts-megasystem-db
# Copy database_id from output

# Create R2 bucket
npm run wrangler r2 bucket create tts-megasystem-assets
  3. Update apps/api/wrangler.toml
  • Set database_id under [[d1_databases]] to the ID copied in step 2
  • Set bucket_name under [[r2_buckets]] to tts-megasystem-assets
  4. Initialize Database (D1)
cd apps/api

# Use the shared schema
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
  5. Environment Variables
cd apps/api
cp .dev.vars.example .dev.vars   # if provided
# Otherwise create .dev.vars and add keys from your environment
cd ../web-svelte
cp .env.example .env             # if provided
# Add:
# VITE_API_URL=http://localhost:8787
# VITE_EDITOR_URL=http://localhost:3001
  6. Start Development
# From repo root (starts API + React editor)
npm run dev:clean

# Start SvelteKit shell
cd apps/web-svelte
npm run dev

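Expected ports:

  • API: http://localhost:8787
  • React editor: http://localhost:3001
  • SvelteKit shell: http://localhost:5175 (or next available)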

  7. Verify Setup
  • Shell loads at /home after login
  • Projects list loads
  • Open in Editor redirects to React editor
  • API responds at http://localhost:8787

🔧 Common Issues

Port already in use

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5175 | xargs kill -9

Wrangler login fails

rm -rf ~/.wrangler
npm run wrangler login

D1 database not found

cd apps/api
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

🚀 Next Steps

  1. Auth hardening (cookie-based, cross-app)
  2. Project model expansion (assets, timeline, voice settings)
  3. Billing gates (Free/Pro/Premium)

Setup Time: ~30 minutes • Difficulty: Intermediate

Architecture
Hybrid SvelteKit + React editor + Workers design.
ARCHITECTURE.md

ARCHITECTURE.md - TTS Megasystem Architecture

๐Ÿ›๏ธ Overview The platform uses a hybrid frontend (SvelteKit shell + React editor) with a Cloudflare Workers API. This keeps global browsing fast while preserving a rich editor experience.

🎯 Why Hybrid

  • Shell (SvelteKit): fast navigation, auth, dashboard, settings.
  • Editor (React): complex audio/video UX, timeline, waveform, previews.

📊 Architecture Diagram

┌──────────────────────────────────────────────────────┐
│                   USER EXPERIENCE                    │
├──────────────────────────────────────────────────────┤
│                                                      │
│  Shell (SvelteKit)                                   │
│  /home /chat /settings /library                      │
│                                                      │
│           ↓ Open project                             │
│                                                      │
│  Editor (React)                                      │
│  /editor/:projectId                                  │
│                                                      │
└──────────────────────┬───────────────────────────────┘
                       │ API calls
                       ▼
┌──────────────────────────────────────────────────────┐
│            CLOUDFLARE WORKERS API (Hono)             │
├──────────────────────────────────────────────────────┤
│  /api/auth       /api/projects     /api/files        │
│  /api/tts        /api/voxdub       /api/asr          │
│  /api/preview    /api/translate    /api/jobs         │
└──────┬──────────────┬──────────────┬─────────────────┘
       │              │              │
       ▼              ▼              ▼
  ┌──────────┐   ┌──────────┐   ┌──────────┐
  │    D1    │   │    R2    │   │  Queues  │
  │ (SQL DB) │   │ (Assets) │   │  (Jobs)  │
  └──────────┘   └──────────┘   └──────────┘

🔄 Core Request Flows

  1. Project list (Shell): Shell → /api/projects → D1 → JSON → Shell renders

  2. Open editor: Shell redirects to /editor/:id → React loads → /api/projects/:id

  3. TTS: Editor → /api/tts/generate → Worker → Provider → R2 → response

  4. Video dub: Editor → /api/voxdub/create → Queue → Workers → R2 → preview

🗄️ Storage Strategy

  • D1: structured metadata (projects, jobs, users).
  • R2: large assets (audio, images, video, manifests).
  • Queues: async processing for dubbing/preview render.

📘 API Contract

The authoritative endpoint and schema reference lives in API_CONTRACT.md. Keep the SvelteKit shell and React editor aligned with it to avoid drift.

🔐 Auth

Token-based auth via the Workers API. The client stores the token in localStorage and sends it in the Authorization: Bearer header. (Can be upgraded to cookie-based auth later.)
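A minimal sketch of both halves of the Bearer scheme described above. The helper names `authHeaders` and `parseBearer` are hypothetical. In the browser, the token argument would be read from localStorage under whatever key the app uses.

```typescript
// Client side (hypothetical helper): build request headers from the stored token.
// Pass null when the user is signed out; no Authorization header is sent then.
function authHeaders(token: string | null): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return headers;
}

// Server side (hypothetical helper): extract the token from an Authorization header.
// Returns null when the header is missing or uses a scheme other than Bearer.
function parseBearer(header: string | null | undefined): string | null {
  if (!header) return null;
  // The auth scheme name is case-insensitive, so "bearer" is accepted too.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}
```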

🌐 Routing & Apps

  • Shell (SvelteKit): /home, /chat, /login, /signup, /settings.
  • Editor (React): /editor/:projectId (same domain in prod, via routing or redirect).
  • Shell redirects to editor using VITE_EDITOR_URL in dev.
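A small sketch of how the shell might build the editor link from VITE_EDITOR_URL. It assumes the editor route is exactly /editor/:projectId. `editorUrl` is a hypothetical helper. In SvelteKit the base would typically come from `import.meta.env.VITE_EDITOR_URL`, and an empty base yields a same-domain path for production.

```typescript
// Hypothetical helper: builds the React editor URL for a project.
// editorBase is VITE_EDITOR_URL in dev (e.g. http://localhost:3001);
// pass "" when the editor is served from the same domain.
function editorUrl(editorBase: string, projectId: string): string {
  // Drop trailing slashes so ".../" and "..." behave the same.
  const base = editorBase.replace(/\/+$/, "");
  // Encode the id so unexpected characters cannot change the path.
  return `${base}/editor/${encodeURIComponent(projectId)}`;
}
```

For example, `editorUrl("http://localhost:3001/", "proj_123")` gives `http://localhost:3001/editor/proj_123`, and `editorUrl("", "proj_123")` gives `/editor/proj_123`.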

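🧩 Services (Local Dev)

  • API (Workers): http://localhost:8787
  • Editor (React): http://localhost:3001
  • Shell (SvelteKit): http://localhost:5175 (or next available)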

📚 Data Model (Current)

projects table (D1):

  • id, user_id, tenant_id, title, status, created_at, updated_at

Project metadata lives in D1. Heavy assets (audio/video/image) live in R2.
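A runtime guard that mirrors the column list above can catch schema drift between the D1 table, the API, and both frontends. This is a sketch. The TypeScript names are hypothetical, and it assumes every column comes back as a string (TEXT), which may not match the actual schema in packages/db/schema.sql.

```typescript
// Row shape of the D1 `projects` table as listed above.
// Assumption: all columns are TEXT, so D1 returns them as strings.
interface ProjectRow {
  id: string;
  user_id: string;
  tenant_id: string;
  title: string;
  status: string;
  created_at: string;
  updated_at: string;
}

const PROJECT_COLUMNS = [
  "id", "user_id", "tenant_id", "title", "status", "created_at", "updated_at",
] as const;

// Runtime guard for values coming back from the API (e.g. GET /api/projects),
// so a renamed or retyped column fails loudly instead of rendering undefined.
function isProjectRow(value: unknown): value is ProjectRow {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return PROJECT_COLUMNS.every((col) => typeof record[col] === "string");
}
```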

🧵 Background Jobs

Queue workers handle:

  • youtube_dub, file_dub, url_dub (segment TTS + manifest)
  • Preview render (merge video + audio segments)
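A discriminated union over job types is one way to keep the queue consumer exhaustive. Only youtube_dub, file_dub and url_dub come from this doc. The `preview_render` name and every payload field are illustrative assumptions.

```typescript
// Hypothetical queue message shapes. Only the three *_dub type names are from
// this doc; "preview_render" and all payload fields are illustrative.
type JobMessage =
  | { type: "youtube_dub"; jobId: string; youtubeUrl: string; targetLanguage: string }
  | { type: "file_dub"; jobId: string; r2Key: string; targetLanguage: string }
  | { type: "url_dub"; jobId: string; url: string; targetLanguage: string }
  | { type: "preview_render"; jobId: string; manifestKey: string };

// Decide which pipeline handles a message. The `never` assignment makes the
// compiler reject this function if a new job type is added but not handled.
function pipelineFor(msg: JobMessage): "dub" | "preview" {
  switch (msg.type) {
    case "youtube_dub":
    case "file_dub":
    case "url_dub":
      return "dub"; // segment TTS + manifest
    case "preview_render":
      return "preview"; // merge video + audio segments
    default: {
      const unreachable: never = msg;
      throw new Error(`Unhandled job: ${JSON.stringify(unreachable)}`);
    }
  }
}
```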

🔒 Security Notes

  • API expects Authorization: Bearer <token>.
  • CORS allowlist controlled by ALLOWED_ORIGINS.
  • Upgrade path: httpOnly cookies + CSRF protection for cross-app auth.
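A sketch of the allowlist check follows. It assumes ALLOWED_ORIGINS holds a comma-separated list of exact origins; the actual format is not specified here. Echoing the matched origin instead of `*` matters for the cookie upgrade path, because browsers reject a wildcard Access-Control-Allow-Origin on credentialed requests.

```typescript
// Assumption: ALLOWED_ORIGINS looks like "http://localhost:5175,http://localhost:3001".
function parseAllowedOrigins(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((s) => s.trim().replace(/\/+$/, "")) // tolerate spaces and trailing slashes
      .filter((s) => s.length > 0),
  );
}

// Value for Access-Control-Allow-Origin, or null to omit the header entirely.
function corsOrigin(requestOrigin: string | null, allowed: Set<string>): string | null {
  if (!requestOrigin) return null;
  return allowed.has(requestOrigin) ? requestOrigin : null;
}
```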

🚀 Deployment

  • Shell (SvelteKit): Cloudflare Pages
  • Editor (React): Cloudflare Pages
  • API: Cloudflare Workers

Architecture Version: 1.0 • Last Updated: December 2025

Commands
Common scripts, ports, and quick fixes.
COMMANDS.md

COMMANDS.md - TTS Megasystem Command Reference

Short, common commands for local development. See SETUP.md for full details.

Install

npm install

Local development

# Start API + React editor
npm run dev:clean

# Start SvelteKit shell (separate terminal)
cd apps/web-svelte
npm run dev

API (Workers)

cd apps/api
npm run dev

React editor (legacy web)

cd apps/web
npm run dev

SvelteKit shell

cd apps/web-svelte
npm run dev

D1 schema (local)

cd apps/api
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

Wrangler login

cd apps/api
npm run wrangler login

Ports (default)

  • API: http://localhost:8787
  • React editor: http://localhost:3001
  • SvelteKit shell: http://localhost:5175 (or next available)

Kill ports quickly

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5175 | xargs kill -9

Extractor (docker)

docker-compose up -d --build extractor
docker-compose restart extractor
API Contract
Shared API endpoints, schemas, and error format.
API_CONTRACT.md

API Contract - TTS Hybrid MVP

KEEPS SVELTEKIT SHELL + REACT EDITOR IN SYNC. NO DRIFT ALLOWED.

Core Endpoints (Cloudflare Workers)

1. Script Generation

POST /api/chat-script
Content-Type: application/json

REQUEST:
{
  "prompt": "Create 5-slide faceless video: AI productivity tips"
}

RESPONSE: 200
{
  "slides": [
    {
      "id": "slide-1",
      "title": "Hook",
      "text": "Struggling with focus?",
      "image_prompt": "distracted worker at messy desk, bold white text 'Productivity Crisis' overlay, 16:9"
    },
    {
      "id": "slide-2",
      "title": "Solution 1",
      "text": "Pomodoro: 25min work, 5min break",
      "image_prompt": "clean desk with timer, bold text 'POMODORO TECHNIQUE' overlay"
    }
  ]
}

2. Slide Generation

POST /api/generate-slides
REQUEST:
{
  "slide_prompts": [
    "distracted worker at messy desk, bold white text 'Productivity Crisis' overlay, 16:9",
    "clean desk with timer, bold text 'POMODORO TECHNIQUE' overlay"
  ]
}

RESPONSE: 200
{
  "images": [
    "https://r2.dev/slide-1.jpg",
    "https://r2.dev/slide-2.jpg"
  ]
}

3. Voice Clone + Render

POST /api/render-video
REQUEST:
{
  "projectId": "proj_123",
  "slides": [...],
  "voice_sample_url": "https://r2.dev/voice.wav",
  "image_urls": [...]
}

RESPONSE: 202
{
  "render_id": "render_456",
  "status": "queued",
  "progress_url": "/api/render/456"
}

Error Format (All Endpoints)

400: { "error": "Invalid prompt", "code": "INVALID_INPUT" }
401: { "error": "Unauthorized", "code": "UNAUTH" }
429: { "error": "Rate limited", "code": "RATE_LIMIT" }
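Clients in both apps can narrow an error response before acting on it. The guard and the action names below are a hypothetical sketch built only from the three documented codes. Unknown codes fall through to a generic branch.

```typescript
// Shared error body, per the examples above.
interface ApiErrorBody {
  error: string;
  code: string; // documented: "INVALID_INPUT" | "UNAUTH" | "RATE_LIMIT"
}

function isApiErrorBody(value: unknown): value is ApiErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.error === "string" && typeof v.code === "string";
}

type ClientAction = "fix_input" | "sign_in" | "retry_later" | "report";

// Illustrative client policy for each documented code.
function clientActionFor(body: ApiErrorBody): ClientAction {
  switch (body.code) {
    case "INVALID_INPUT": return "fix_input"; // 400
    case "UNAUTH": return "sign_in";          // 401: token missing or expired
    case "RATE_LIMIT": return "retry_later";  // 429
    default: return "report";                 // undocumented code
  }
}
```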

Shared Types (TypeScript)

interface Slide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
  image_url?: string;
}
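The remaining request and response bodies can be typed the same way, so both apps share one definition. These interface names are hypothetical transcriptions of the examples above. `attachImages` assumes /api/generate-slides returns images in the same order as `slide_prompts`.

```typescript
// Mirrors the Slide interface above so this sketch compiles on its own.
interface Slide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
  image_url?: string;
}

// Hypothetical names for the bodies shown in the endpoint examples.
interface ChatScriptRequest { prompt: string }
interface ChatScriptResponse { slides: Slide[] }
interface GenerateSlidesRequest { slide_prompts: string[] }
interface GenerateSlidesResponse { images: string[] }
interface RenderVideoResponse { render_id: string; status: string; progress_url: string }

// Build the generate-slides request from a script, preserving slide order.
function toGenerateSlidesRequest(script: ChatScriptResponse): GenerateSlidesRequest {
  return { slide_prompts: script.slides.map((s) => s.image_prompt) };
}

// Attach image URLs back onto slides by index (assumes order is preserved).
function attachImages(slides: Slide[], res: GenerateSlidesResponse): Slide[] {
  if (res.images.length !== slides.length) {
    throw new Error(`Expected ${slides.length} images, got ${res.images.length}`);
  }
  return slides.map((slide, i) => ({ ...slide, image_url: res.images[i] }));
}
```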
Prompt Templates
Locked prompts for script and slide generation.
PROMPT_TEMPLATES.md

AI Prompt Templates - LOCKED

EXACT PROMPTS FOR CONSISTENT QUALITY. NO DEVIATIONS.

1. Groq Llama 3.1 8B - Script Generation

SYSTEM PROMPT:

You are a faceless YouTube script generator. ALWAYS return valid JSON.

Generate EXACTLY 5 slides for a 60-second video. Each slide: 8-12 seconds max.

Format: { "slides": [ { "id": "slide-1", "title": "Hook", "text": "Attention-grabbing question or stat (10 words max)", "image_prompt": "professional 16:9 slide description with EXACT TEXT overlay" } ] }

Rules:

  • Slide 1 ALWAYS hook/problem
  • Slides 2-4 ALWAYS solutions/tips
  • Slide 5 ALWAYS CTA
  • image_prompt MUST include "bold text 'EXACT TEXT' overlay, 16:9"
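The model's output is only as reliable as the prompt, so a server-side check of these rules can reject a bad script before image generation. This is a loose, hypothetical validator. It applies the 10-word limit to slide 1 only. It matches the overlay rule with a regex that also accepts wording such as "bold white text '...'".

```typescript
interface ScriptSlide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
}

// Returns a list of rule violations; an empty array means the script passes.
function validateScript(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["Response is not valid JSON"];
  }
  const slides =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { slides?: unknown }).slides
      : undefined;
  if (!Array.isArray(slides)) return ["Missing slides array"];

  const problems: string[] = [];
  if (slides.length !== 5) problems.push(`Expected exactly 5 slides, got ${slides.length}`);

  slides.forEach((item: unknown, i: number) => {
    const s = (typeof item === "object" && item !== null ? item : {}) as Partial<ScriptSlide>;
    const label = `slide ${i + 1}`;
    for (const key of ["id", "title", "text", "image_prompt"] as const) {
      if (typeof s[key] !== "string") problems.push(`${label}: missing ${key}`);
    }
    // Hook text (slide 1) is limited to 10 words.
    if (i === 0 && typeof s.text === "string" && s.text.trim().split(/\s+/).length > 10) {
      problems.push(`${label}: hook text exceeds 10 words`);
    }
    // image_prompt must carry a quoted text overlay and the 16:9 ratio.
    if (typeof s.image_prompt === "string" && !/text '[^']+' overlay, 16:9/.test(s.image_prompt)) {
      problems.push(`${label}: image_prompt lacks "text '...' overlay, 16:9"`);
    }
  });
  return problems;
}
```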

USER PROMPT TEMPLATE:

Create 5-slide faceless video: ${user_input}


EXPECTED OUTPUT:
```json
{
  "slides": [
    {
      "id": "slide-1",
      "title": "Hook",
      "text": "Struggling with focus?",
      "image_prompt": "distracted worker desk, bold white text 'FOCUS CRISIS' overlay, 16:9"
    }
  ]
}
```

2. Replicate Flux.1-schnell - Slide Images

PROMPT TEMPLATE:
"${slide_description}, professional presentation slide, EXACT TEXT '${slide.text}' overlay, bold modern sans-serif font, high contrast white text on dark background, centered composition, 16:9 aspect ratio, clean minimalist design"

EXAMPLE:
"clean desk with pomodoro timer, professional presentation slide, EXACT TEXT '25min FOCUS' overlay, bold modern sans-serif font, high contrast white text on dark background, centered composition, 16:9 aspect ratio, clean minimalist design"
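Keeping both locked templates in one module stops copies from drifting apart. The builder names are hypothetical. The strings are copied verbatim from the templates above, with `slide_description` and `slide.text` passed in as arguments.

```typescript
// User prompt for the script generator (template copied verbatim).
function buildScriptUserPrompt(userInput: string): string {
  return `Create 5-slide faceless video: ${userInput}`;
}

// Flux.1-schnell slide prompt (template copied verbatim).
function buildFluxPrompt(slideDescription: string, slideText: string): string {
  return (
    `${slideDescription}, professional presentation slide, EXACT TEXT '${slideText}' overlay, ` +
    "bold modern sans-serif font, high contrast white text on dark background, " +
    "centered composition, 16:9 aspect ratio, clean minimalist design"
  );
}
```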

Sources

YouTube Docs
YouTube Quickstart
Fast path to get YouTube dubbing running.
docs/YOUTUBE_QUICKSTART.md

Quick Start: YouTube Dubbing Setup

🎯 Goal

Enable production YouTube dubbing with real APIs in 15 minutes.

📋 Prerequisites

  • Google account (for YouTube API)
  • Credit card (for RapidAPI and OpenAI - minimal costs)
  • Cloudflare account with Workers deployed

🚀 Step 1: YouTube Data API (5 min)

  1. Go to https://console.cloud.google.com/
  2. Click Select a project → New Project
  3. Name it "TTS Platform" → Create
  4. In search bar, type "YouTube Data API v3" → Enable
  5. Go to APIs & Services → Credentials
  6. Click + Create Credentials → API Key
  7. Copy the key (starts with AIzaSy...)
  8. (Optional) Click Restrict Key → Allow only "YouTube Data API v3"

Cost: FREE (10,000 quota units/day)


🚀 Step 2: RapidAPI (3 min)

  1. Sign up at https://rapidapi.com/ (use Google login)
  2. Search for "YouTube MP3 Downloader" or go to: https://rapidapi.com/ytjar/api/youtube-mp3-downloader2/
  3. Click Subscribe to Test
  4. Choose Basic Plan (free 500 requests/month)
  5. Go to Endpoints tab
  6. Copy your X-RapidAPI-Key from code snippet (right side)

Cost: FREE tier (500 req/month), then $0.01/request

Alternative APIs (if preferred):


🚀 Step 3: OpenAI Whisper (3 min)

  1. Go to https://platform.openai.com/
  2. Sign up or log in
  3. Click Settings → Billing → Add payment method
  4. Go to API Keys → + Create new secret key
  5. Name it "TTS Platform Whisper" → Copy key (starts with sk-proj-...)

Cost: $0.006 per minute of audio (10-min video = $0.06)

Note: If you already set up OpenAI for Phase 10C TTS, use the same key.


🚀 Step 4: Add Secrets to Cloudflare (2 min)

cd apps/api

# YouTube
echo "AIzaSy_YOUR_KEY_HERE" | wrangler secret put YOUTUBE_API_KEY

# RapidAPI
echo "YOUR_RAPIDAPI_KEY_HERE" | wrangler secret put RAPIDAPI_KEY

# OpenAI (skip if already set in Phase 10C)
echo "sk-proj-YOUR_KEY_HERE" | wrangler secret put OPENAI_API_KEY

Verify secrets:

wrangler secret list

🚀 Step 5: Deploy (1 min)

npm run deploy

✅ Test It

Option A: Using Frontend

  1. Go to your deployed web app
  2. Click VoxDub in sidebar
  3. Paste YouTube URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  4. Select target language (e.g., "Spanish")
  5. Click Start Dubbing
  6. Watch progress bar (takes 2-5 minutes for 3-minute video)

Option B: Using API

# Get JWT token first (login via frontend and copy from localStorage)
export JWT_TOKEN="your_jwt_token_here"

# Create dubbing job
curl -X POST https://your-api.workers.dev/api/voxdub/create \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "targetLanguage": "Spanish"
  }'

# Response:
# {
#   "id": "job_1234567890_abc123",
#   "status": "queued",
#   "message": "YouTube dubbing job started",
#   "metadata": {
#     "title": "Never Gonna Give You Up",
#     "duration": 212,
#     "channelTitle": "Rick Astley",
#     ...
#   }
# }

# Check job status
curl https://your-api.workers.dev/api/jobs/job_1234567890_abc123 \
  -H "Authorization: Bearer $JWT_TOKEN"

🔍 Monitor Logs

Watch real-time processing:

wrangler tail --format pretty

Look for:

[YouTube] Downloading audio for dQw4w9WgXcQ
[YouTube] Audio stored at youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
[Whisper] Transcription complete: 42 segments
[Gemini] Translation complete
[TTS] Generated segment 1/42
...
[VoxDub] Job completed

📊 What Happens Behind the Scenes

User submits YouTube URL
         ↓
[1] YouTube Data API → Get video metadata (title, duration)
         ↓
[2] RapidAPI → Download audio → Store in R2
         ↓
[3] OpenAI Whisper → Transcribe → 42 segments with timestamps
         ↓
[4] Gemini AI → Translate each segment to target language
         ↓
[5] Multi-TTS Router → Generate audio for each segment
         ↓
[6] Save manifest → Job complete → User downloads

Total Time: 2-5 minutes for 3-minute video


💰 Cost Breakdown (Example: 10-min video)

| Service | Cost |
|---|---|
| YouTube Data API | FREE |
| RapidAPI audio download | ~$0.01 |
| Whisper transcription | $0.06 |
| Gemini translation | FREE |
| TTS generation | 225-450 credits |

Total per video: ~$0.07 + credits


🛠️ Development Mode (No API Keys)

For testing UI/flows without costs:

cd apps/api
npm run dev
# Don't add .dev.vars file
# Services will use mock data

❗ Troubleshooting

"Unable to access YouTube video"

  • Ensure video is public (not private/unlisted)
  • Check YouTube API quota (10k units/day free)

"RapidAPI failed: 403"
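
  • Invalid API key or subscription expired: re-copy your X-RapidAPI-Key and confirm the plan is active
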
"Whisper API failed: 401"

  • Invalid OpenAI API key: create a new key and re-add it with wrangler secret put OPENAI_API_KEY

Still seeing mock data in production?

# Verify secrets exist
wrangler secret list

# Re-add if missing
wrangler secret put YOUTUBE_API_KEY
wrangler secret put RAPIDAPI_KEY
wrangler secret put OPENAI_API_KEY

# Re-deploy
npm run deploy

📚 Full Documentation

  • Setup Guide: docs/YOUTUBE_INTEGRATION.md (300+ lines)
  • Implementation Summary: docs/YOUTUBE_PRODUCTION_SUMMARY.md
  • AI Agent Instructions: .github/copilot-instructions.md

🎉 Success Checklist

  • YouTube Data API key working (see real video titles in logs)
  • RapidAPI downloading audio (see R2 storage paths in logs)
  • Whisper transcribing (see segment counts in logs)
  • Gemini translating (see translated text in manifest)
  • TTS generating (see audio files in R2)
  • Job completes successfully (status = "completed")
  • Can download dubbed manifest from R2

🔐 Security Notes

  1. Never commit API keys to git
  2. Use wrangler secret (not wrangler.toml [vars]) for production
  3. For local dev, use .dev.vars (already in .gitignore)
  4. Add rate limiting (see docs/YOUTUBE_INTEGRATION.md security section)
  5. Set video duration limits to control costs

🚦 Next Steps After Setup

  1. Test with various YouTube videos (short first)
  2. Monitor costs in the Google Cloud, RapidAPI, and OpenAI dashboards
  3. Verify error notifications (already configured via Resend/Twilio)
  4. Implement duration limits (recommend max 1 hour)
  5. Confirm credit deduction runs before processing (already implemented)

📞 Support


Estimated Setup Time: 15 minutes
First Video Processing Time: 2-5 minutes
Cost for Testing (5 videos): ~$0.35 + credits

🎯 You're ready to go! Start with a short YouTube video to test the full pipeline.

YouTube Integration
Extractor, download flow, and queue notes.
docs/YOUTUBE_INTEGRATION.md

YouTube Integration - Production Setup Guide

Overview

The YouTube dubbing feature (/api/voxdub) now uses real production APIs for:

  1. YouTube Data API v3 - Video metadata (title, duration, description, thumbnails)
  2. RapidAPI YouTube Downloader - Audio extraction from YouTube videos
  3. OpenAI Whisper - Speech-to-text transcription with timestamps

Required API Keys

1. YouTube Data API v3

Purpose: Fetch video metadata (title, duration, channel info, thumbnails)

Setup Steps:

  1. Go to Google Cloud Console
  2. Create a new project or select existing
  3. Enable YouTube Data API v3 in API Library
  4. Go to Credentials → Create Credentials → API Key
  5. Restrict the API key to YouTube Data API v3 (recommended for security)
  6. Copy the API key

Cost: Free tier includes 10,000 quota units/day (each metadata request = 1 unit)

For local development, add to apps/api/.dev.vars (in production use wrangler secret put, not wrangler.toml [vars]):

YOUTUBE_API_KEY=AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

2. RapidAPI - YouTube Downloader

Purpose: Extract audio stream URLs from YouTube videos

Setup Steps:

  1. Sign up at RapidAPI.com
  2. Subscribe to YouTube MP3 Downloader or similar API
  3. Go to Endpoints → Copy your X-RapidAPI-Key from code snippets
  4. Choose a plan (Free tier: 500 requests/month)

Alternative APIs (if preferred):

For local development, add to apps/api/.dev.vars (in production use wrangler secret put, not wrangler.toml [vars]):

RAPIDAPI_KEY=1234567890abcdefXXXXXXXXXXXXXXXXX

Note: If using a different RapidAPI endpoint, update the API URL in apps/api/src/services/youtube.ts:

const response = await fetch(
  `https://YOUR-API-HOST.p.rapidapi.com/endpoint?url=...`,
  {
    headers: {
      'X-RapidAPI-Key': this.rapidApiKey,
      'X-RapidAPI-Host': 'YOUR-API-HOST.p.rapidapi.com'
    }
  }
);

3. OpenAI Whisper API

Purpose: Transcribe YouTube audio to text with timestamps

Setup Steps:

  1. Create account at OpenAI Platform
  2. Add payment method (Whisper pricing: $0.006/minute)
  3. Go to API Keys
  4. Create new secret key → Copy it

Cost: $0.006 per minute of audio (e.g., 10-minute video = $0.06)

For local development, add to apps/api/.dev.vars (in production use wrangler secret put, not wrangler.toml [vars]):

OPENAI_API_KEY=sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXXXX

Note: This key is already configured for OpenAI TTS in Phase 10C. Same key works for Whisper.


Architecture Flow

User uploads YouTube URL
         ↓
[YouTube Data API] → Get metadata (title, duration)
         ↓
Job queued → Queue handler starts
         ↓
[RapidAPI] → Extract audio → Download to R2
         ↓
[OpenAI Whisper] → Transcribe audio → Segments with timestamps
         ↓
[Gemini AI] → Translate segments to target language
         ↓
[Multi-TTS Router] → Generate dubbed audio for each segment
         ↓
Save manifest to R2 → Job completed

Code Implementation

Service Layer (apps/api/src/services/youtube.ts)

YouTubeService Class

  • getVideoMetadata(videoId) - Uses YouTube Data API v3
  • getAudioStreamUrl(videoId) - Uses RapidAPI
  • downloadAudioToR2(videoId, r2Bucket, tenantId, userId) - Downloads and stores audio

WhisperService Class

  • transcribe(audioUrl) - Sends audio to OpenAI Whisper API
  • formatForDubbing(segments) - Converts Whisper output to dubbing manifest format

Queue Handler (apps/api/src/queue-handler.ts)

processYoutubeDubJob() workflow:

  1. Download audio (5-20%) - RapidAPI + R2 storage
  2. Transcribe (20-40%) - OpenAI Whisper
  3. Translate (40-50%) - Gemini AI
  4. Generate TTS (50-90%) - Multi-TTS Router
  5. Save manifest (90-100%) - R2 storage
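The progress bands above can be expressed as a lookup plus linear interpolation inside a stage, for example to report progress per TTS segment. The stage keys and the interpolation are an illustrative sketch, not the queue handler's code.

```typescript
// Progress bands from the workflow above, as [start, end] percent of the job.
const STAGES = {
  download: [5, 20],
  transcribe: [20, 40],
  translate: [40, 50],
  tts: [50, 90],
  save: [90, 100],
} as const;

type Stage = keyof typeof STAGES;

// Overall progress for a stage given the completed fraction of that stage (0..1).
function jobProgress(stage: Stage, fraction: number): number {
  const [start, end] = STAGES[stage];
  const f = Math.min(1, Math.max(0, fraction)); // clamp out-of-range input
  return Math.round(start + (end - start) * f);
}
```

For example, after 21 of 42 TTS segments, `jobProgress("tts", 21 / 42)` returns 70.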

API Route (apps/api/src/routes/voxdub.ts)

POST /api/voxdub/create

  • Validates YouTube URL
  • Fetches metadata (fails if video inaccessible)
  • Creates job in D1
  • Queues background processing
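Here is a hypothetical sketch of the URL validation step. It accepts the common watch, youtu.be, Shorts and embed forms and returns the 11-character video ID. The route's real validation may accept a different set of URLs.

```typescript
// Hypothetical patterns; each captures the video ID in group 1.
// In practice YouTube video IDs are 11 characters from [A-Za-z0-9_-].
const YOUTUBE_PATTERNS: RegExp[] = [
  // youtube.com/watch?v=ID (v may follow other query parameters)
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/watch\?(?:[^#]*&)?v=([A-Za-z0-9_-]{11})(?:[&#]|$)/,
  // youtu.be/ID
  /^https?:\/\/youtu\.be\/([A-Za-z0-9_-]{11})(?:[?#/]|$)/,
  // youtube.com/shorts/ID and youtube.com/embed/ID
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/(?:shorts|embed)\/([A-Za-z0-9_-]{11})(?:[?#/]|$)/,
];

// Returns the video ID, or null if the input is not a recognized YouTube video URL.
function extractVideoId(input: string): string | null {
  const url = input.trim();
  for (const pattern of YOUTUBE_PATTERNS) {
    const match = pattern.exec(url);
    if (match) return match[1];
  }
  return null;
}
```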

Testing

Without API Keys (Fallback Mode)

If API keys are missing, services gracefully degrade:

  • YouTube Data API: Returns mock metadata
  • RapidAPI: Returns sample audio file
  • Whisper: Uses hardcoded transcript
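When you see mock data in production, it helps to report which services will fall back. This sketch maps each key above to live or mock mode. Treating empty strings as missing is an assumption, made because wrangler.toml may declare these vars as "".

```typescript
type Mode = "live" | "mock";

interface YouTubeDubKeys {
  YOUTUBE_API_KEY?: string;
  RAPIDAPI_KEY?: string;
  OPENAI_API_KEY?: string;
}

// Report, per service, whether a usable key is present. This does not call
// any API; it only mirrors the fallback rules listed above.
function serviceModes(env: YouTubeDubKeys): Record<"metadata" | "audio" | "transcription", Mode> {
  // Assumption: an empty or whitespace-only value counts as missing.
  const has = (v?: string): boolean => typeof v === "string" && v.trim().length > 0;
  return {
    metadata: has(env.YOUTUBE_API_KEY) ? "live" : "mock",   // YouTube Data API
    audio: has(env.RAPIDAPI_KEY) ? "live" : "mock",          // RapidAPI download
    transcription: has(env.OPENAI_API_KEY) ? "live" : "mock", // Whisper
  };
}
```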

With API Keys (Production Mode)

  1. Test YouTube Data API:
curl "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id=dQw4w9WgXcQ&key=YOUR_KEY"
  2. Test RapidAPI:
curl -X GET \
  'https://youtube-mp3-downloader2.p.rapidapi.com/ytmp3/ytmp3/custom/?url=https://www.youtube.com/watch?v=dQw4w9WgXcQ&quality=320' \
  -H 'X-RapidAPI-Key: YOUR_KEY' \
  -H 'X-RapidAPI-Host: youtube-mp3-downloader2.p.rapidapi.com'
  3. Test OpenAI Whisper:
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.mp3" \
  -F model="whisper-1"

Full Integration Test

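// POST to your API (YOUR_JWT is a placeholder for a token from the login flow)
const response = await fetch('http://localhost:8787/api/voxdub/create', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_JWT',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    youtubeUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    targetLanguage: 'Spanish'
  })
});
if (!response.ok) {
  throw new Error(`Create failed (${response.status}): ${await response.text()}`);
}

const job = await response.json();
console.log('Job ID:', job.id);

// Poll job status every 5 s (up to ~5 min) until a terminal status.
// Assumes the jobs endpoint returns { status }; "completed" is documented,
// "failed" is assumed to be the error status.
let status = job.status;
for (let i = 0; i < 60 && status !== 'completed' && status !== 'failed'; i++) {
  await new Promise((resolve) => setTimeout(resolve, 5000));
  const res = await fetch(`http://localhost:8787/api/jobs/${job.id}`, {
    headers: { 'Authorization': 'Bearer YOUR_JWT' }
  });
  ({ status } = await res.json());
  console.log('Status:', status);
}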

Deployment

  1. Add secrets to Cloudflare:
cd apps/api

# YouTube Data API
echo "YOUR_KEY" | wrangler secret put YOUTUBE_API_KEY

# RapidAPI
echo "YOUR_KEY" | wrangler secret put RAPIDAPI_KEY

# OpenAI (if not already set)
echo "YOUR_KEY" | wrangler secret put OPENAI_API_KEY
  2. Deploy:
npm run deploy
  3. Verify secrets:
wrangler secret list

Error Handling

Common Issues

YouTube Data API Errors:

  • 403 Forbidden → API key not enabled or quota exceeded
  • 404 Not Found → Video doesn't exist or is private
  • 400 Bad Request → Invalid video ID format

RapidAPI Errors:

  • 403 Forbidden → Invalid API key or subscription expired
  • 429 Too Many Requests → Rate limit exceeded
  • 500 Server Error → Video unavailable or service down

Whisper API Errors:

  • 401 Unauthorized → Invalid API key
  • 400 Bad Request → Audio file too large (max 25 MB)
  • 429 Rate Limit → Exceeded usage limits
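Whisper's 25 MB cap can be checked before uploading by estimating file size from duration and bitrate. This sketch assumes constant-bitrate MP3 and reads 25 MB as 25,000,000 bytes to stay conservative. The function names are hypothetical.

```typescript
// Conservative reading of the 25 MB limit.
const WHISPER_MAX_BYTES = 25_000_000;

// Estimated size of constant-bitrate audio: kbps * 1000 bits/s, divided by 8 bits per byte.
function estimatedAudioBytes(durationSec: number, bitrateKbps: number): number {
  return Math.ceil((durationSec * bitrateKbps * 1000) / 8);
}

function exceedsWhisperLimit(durationSec: number, bitrateKbps: number): boolean {
  return estimatedAudioBytes(durationSec, bitrateKbps) > WHISPER_MAX_BYTES;
}
```

At 320 kbps (the quality used in the RapidAPI test call), a 10-minute video is about 24,000,000 bytes, just under the cap. At 12 minutes it is about 28,800,000 bytes and would be rejected, so longer videos need a lower bitrate or chunking.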

Monitoring

Stream logs from the CLI (they are also visible in the Cloudflare Dashboard):

wrangler tail --format pretty

Look for:

  • [YouTube] Fetching metadata for...
  • [YouTube] Getting audio URL for...
  • [YouTube] Audio stored at...
  • [Whisper] Transcription failed: (if errors)

yt-dlp / Tunnel Checklist

Before resuming YouTube dubbing in environments that use the self-hosted yt-dlp API, verify:

  • Docker container is running: docker ps (look for voxdub-audio-yt-dlp-api).
  • Cloudflare Tunnel is running: confirm the current URL, or restart with cloudflared tunnel --url http://localhost:3034 (or your yt-dlp port).
  • If the tunnel URL changes, update both YTDLP_API_URL and YTDLP_SERVICE_URL in apps/api/wrangler.toml, then redeploy the API worker.

Cost Estimation

Example: 10-minute YouTube video dubbed to Spanish

| Service | Usage | Cost |
|---|---|---|
| YouTube Data API | 1 metadata request | Free (within quota) |
| RapidAPI | 1 download request | $0.01 (varies by plan) |
| OpenAI Whisper | 10 minutes transcription | $0.06 |
| Multi-TTS | ~150 words × 1.5 sec/word = 225 sec | 225-450 credits |
| Total API Cost | | ~$0.07 + credits |
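The dollar column is simple arithmetic over the per-unit rates quoted in this guide. The sketch below assumes one RapidAPI request per video. It excludes TTS credits, which are not priced in dollars here.

```typescript
// Rates quoted in this guide; RapidAPI pricing varies by plan.
const WHISPER_USD_PER_MIN = 0.006;
const RAPIDAPI_USD_PER_REQUEST = 0.01;

// API cost in US dollars for one video, rounded to cents.
// YouTube Data API metadata is free within quota, so it adds nothing.
function estimateApiCostUsd(videoMinutes: number): number {
  const whisper = videoMinutes * WHISPER_USD_PER_MIN;
  const total = RAPIDAPI_USD_PER_REQUEST + whisper;
  return Math.round(total * 100) / 100;
}
```

`estimateApiCostUsd(10)` gives 0.07, matching the table. Five 10-minute test videos come to about $0.35 in API costs.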

Security Best Practices

  1. Restrict API Keys:

    • YouTube: Limit to YouTube Data API v3 only
    • OpenAI: Set usage limits in dashboard
  2. Use Cloudflare Secrets (not wrangler.toml [vars]):

wrangler secret put YOUTUBE_API_KEY
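  3. Rate Limiting: Add to voxdub.ts:
// Check user job count in the last hour.
// Assumes jobs.created_at is stored as SQLite 'YYYY-MM-DD HH:MM:SS' UTC text,
// so the string comparison against datetime('now', '-1 hour') is valid.
const recentJobs = await c.env.DB.prepare(`
  SELECT COUNT(*) AS count FROM jobs
  WHERE user_id = ? AND created_at > datetime('now', '-1 hour')
`).bind(user.sub).first<{ count: number }>();

// first() is typed T | null, so guard before reading count
if ((recentJobs?.count ?? 0) >= 10) {
  return c.json({ error: 'Rate limit exceeded', code: 'RATE_LIMIT' }, 429);
}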
  4. Validate Video Duration:
if (metadata.duration > 3600) { // 1 hour max
  return c.json({ error: 'Video too long (max 1 hour)' }, 400);
}

Next Steps

  1. Add to .github/copilot-instructions.md (already updated)
  2. Configure API keys in Cloudflare Dashboard
  3. Test with sample YouTube videos
  4. Monitor usage and costs in respective dashboards
  5. Consider adding video duration limits for cost control

Support

YouTube Production Summary
Deployment checklist and operational notes.
docs/YOUTUBE_PRODUCTION_SUMMARY.md

YouTube Dubbing - Production Implementation Summary

What Was Implemented

1. Enhanced YouTube Service (apps/api/src/services/youtube.ts)

New Features:

  • ✅ Production YouTube Data API v3 integration for metadata
  • ✅ RapidAPI integration for audio extraction
  • ✅ OpenAI Whisper integration for transcription
  • ✅ R2 audio storage with tenant isolation
  • ✅ ISO 8601 duration parsing (YouTube format → seconds)
  • ✅ Graceful fallback to mock data when API keys missing
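YouTube returns `contentDetails.duration` as an ISO 8601 duration such as `PT3M32S`. Here is a sketch of the conversion to seconds, covering the day, hour, minute and second fields. It is illustrative, not the parser in youtube.ts.

```typescript
// Convert an ISO 8601 duration like "PT3M32S" or "P1DT2H" to seconds.
// Only the D/H/M/S fields are handled; anything else throws.
function parseIsoDuration(iso: string): number {
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(iso);
  if (!m) throw new Error(`Unsupported duration: ${iso}`);
  // Missing components come back as undefined; treat them as 0.
  const [d, h, min, s] = m.slice(1).map((v) => (v === undefined ? 0 : Number(v)));
  return d * 86_400 + h * 3_600 + min * 60 + s;
}
```

For example, `PT3M32S` parses to 212, the duration shown in the Quick Start response.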

Key Methods:

// YouTubeService
constructor(config?: { youtubeApiKey?: string, rapidApiKey?: string })
getVideoMetadata(videoId) // Real YouTube Data API v3
getAudioStreamUrl(videoId) // Real RapidAPI downloader
downloadAudioToR2(videoId, r2Bucket, tenantId, userId) // Store audio in R2

// WhisperService (new class)
constructor(apiKey: string)
transcribe(audioUrl) // OpenAI Whisper with timestamps
formatForDubbing(segments) // Convert to dubbing manifest format

Metadata Response:

{
  title: string;
  duration: number; // in seconds
  description: string;
  channelTitle?: string;
  thumbnailUrl?: string;
}

Whisper Transcription Response:

{
  text: string; // Full transcript
  segments: Array<{
    start: number; // seconds
    end: number;   // seconds
    text: string;  // segment text
  }>;
}

2. Updated Queue Handler (apps/api/src/queue-handler.ts)

Changes in processYoutubeDubJob():

Before (mocked):

const mockTranscript = "Hello, welcome...";
await youtube.getAudioStreamUrl(videoId); // Dummy validation

After (production):

// 1. Download real audio to R2
const audioR2Key = await youtube.downloadAudioToR2(videoId, env.UPLOADS, tenantId, userId);

// 2. Transcribe with Whisper
const audioUrl = await youtube.getAudioStreamUrl(videoId);
const { text, segments } = await whisper.transcribe(audioUrl);

// 3. Format segments with timestamps
const dubbingSegments = whisper.formatForDubbing(segments);

// 4. Translate with Gemini (let, because it is reassigned in step 5)
let manifest = await gemini.generateStructuredDubbingScript(text, targetLanguage);

// 5. Merge Whisper timestamps with Gemini translations
// (?? keeps a legitimate 0-second start; || would discard it)
manifest = manifest.map((seg, idx) => ({
  ...seg,
  start: segments[idx]?.start ?? seg.start,
  end: segments[idx]?.end ?? seg.end
}));

Progress Tracking:

  • 5-20%: Download audio to R2
  • 20-40%: Transcribe with Whisper
  • 40-50%: Translate with Gemini
  • 50-90%: Generate TTS for each segment
  • 90-100%: Save manifest

3. Enhanced VoxDub Route (apps/api/src/routes/voxdub.ts)

Changes:

  • Added YOUTUBE_API_KEY and RAPIDAPI_KEY to Bindings type
  • Initialize YouTubeService with config from environment
  • Better error handling: now fails early if video metadata can't be fetched
  • Returns detailed error messages for API failures

Before:

const ytService = new YouTubeService();
try {
  metadata = await ytService.getVideoMetadata(videoId);
} catch (e) {
  console.warn("Failed, continuing anyway...", e);
}

After:

const ytService = new YouTubeService({
  youtubeApiKey: c.env.YOUTUBE_API_KEY,
  rapidApiKey: c.env.RAPIDAPI_KEY
});

try {
  metadata = await ytService.getVideoMetadata(videoId);
} catch (e: any) {
  return c.json({ 
    error: 'Unable to access YouTube video. Check if video is public and API keys are configured.',
    details: e.message 
  }, 400);
}

4. Configuration Updates (apps/api/wrangler.toml)

Added Environment Variables:

# YouTube Integration (Production)
YOUTUBE_API_KEY = ""      # YouTube Data API v3 key from Google Cloud Console
RAPIDAPI_KEY = ""         # RapidAPI key for YouTube audio extraction

Note: OPENAI_API_KEY already existed from Phase 10C (used for both TTS and Whisper)


5. Documentation (docs/YOUTUBE_INTEGRATION.md)

New 300+ line guide covering:

  • Overview of architecture flow
  • Step-by-step API key setup for all 3 services
  • Cost estimation ($0.07 per 10-minute video + credits)
  • Code implementation details
  • Testing procedures (with and without API keys)
  • Deployment instructions using Cloudflare secrets
  • Error handling for common issues
  • Security best practices (rate limiting, duration validation)
  • Monitoring and logging tips

6. Updated Copilot Instructions (.github/copilot-instructions.md)

Added Section:

  • YouTube Integration overview
  • Service method descriptions
  • Required API keys
  • Graceful fallback behavior
  • Job processing flow breakdown
  • Reference to detailed docs

API Keys Setup Quick Reference

1. YouTube Data API v3

# Get key from: https://console.cloud.google.com/
wrangler secret put YOUTUBE_API_KEY
# Enter: AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXX

2. RapidAPI

# Get key from: https://rapidapi.com/
wrangler secret put RAPIDAPI_KEY
# Enter: 1234567890abcdefXXXXXXXXXXXX

3. OpenAI (Whisper)

# Get key from: https://platform.openai.com/
wrangler secret put OPENAI_API_KEY
# Enter: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXX

How to Test

Without API Keys (Development Mode)

cd apps/api
npm run dev
  • Services fall back to mock data
  • Allows UI/flow testing without costs

With API Keys (Production Mode)

  1. Add keys to local dev:
# Create .dev.vars file
cat > .dev.vars << EOF
YOUTUBE_API_KEY=AIzaSy...
RAPIDAPI_KEY=123456...
OPENAI_API_KEY=sk-proj...
EOF
  2. Start dev server:
npm run dev
  3. Test with a real YouTube video:
curl -X POST http://localhost:8787/api/voxdub/create \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "targetLanguage": "Spanish"
  }'
  4. Check job progress:
# Get job ID from previous response
curl http://localhost:8787/api/jobs/job_XXXXX \
  -H "Authorization: Bearer YOUR_JWT"
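Before a `youtubeUrl` reaches the job endpoint, it can be validated and reduced to a video ID. The helper below is a hypothetical sketch (`extractVideoId` is not a function from this codebase). It assumes the current 11-character YouTube ID format and handles only the common `watch?v=`, `youtu.be/`, `/shorts/`, and `/embed/` forms.

```typescript
// YouTube video IDs are currently 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical helper: return the video ID, or null if the input is not a
// recognizable YouTube URL.
function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a URL at all
  }
  // Treat www. and m. (mobile) hosts the same as the bare domain.
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1);
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else {
      // Path forms: /shorts/<id> and /embed/<id>
      const parts = url.pathname.split("/");
      if (parts[1] === "shorts" || parts[1] === "embed") candidate = parts[2] || null;
    }
  }
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

Rejecting malformed URLs here returns a 400 before any quota or credits are spent.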

What Changed from Mock to Production

| Feature | Before (Mock) | After (Production) |
|---|---|---|
| Video Metadata | Hardcoded title/duration | Real YouTube Data API v3 |
| Audio Extraction | Dummy WAV file URL | RapidAPI audio download + R2 storage |
| Transcription | "Hello, welcome to this video..." | OpenAI Whisper with real timestamps |
| Error Handling | Warnings, continues anyway | Fails early with detailed errors |
| API Keys | None required | 3 keys required (graceful fallback) |
| Timestamp Accuracy | Generic 5-second segments | Real segment boundaries from Whisper |
| Speaker Detection | Single "Host" speaker | Automatic grouping by ~10 segments |
| Cost | $0 | ~$0.07 per 10-min video + TTS credits |
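The "Automatic grouping by ~10 segments" entry is not true speaker diarization. One plausible reading, assumed here, is fixed-size chunking that starts a new speaker label every N segments. The `Segment` shape and `groupSpeakers` are hypothetical names for illustration only.

```typescript
// Hypothetical segment shape, mirroring Whisper's start/end/text fields.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
  speaker?: string;
}

// Assumed heuristic: assign a new speaker label every `groupSize` segments.
// This splits long transcripts into blocks; it does not detect voices.
function groupSpeakers(segments: Segment[], groupSize = 10): Segment[] {
  if (groupSize < 1) throw new RangeError("groupSize must be >= 1");
  return segments.map((seg, i) => ({
    ...seg,
    speaker: `Speaker ${Math.floor(i / groupSize) + 1}`,
  }));
}
```

With 42 segments (as in the sample log further down), this would produce labels Speaker 1 through Speaker 5.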

Files Modified

  1. apps/api/src/services/youtube.ts (200+ lines added)

    • YouTubeService constructor with config
    • Real YouTube Data API v3 integration
    • RapidAPI audio extraction
    • R2 download method
    • WhisperService class
    • Graceful fallback logic
  2. apps/api/src/queue-handler.ts (60 lines modified)

    • processYoutubeDubJob() rewritten
    • Real audio download
    • Whisper transcription
    • Timestamp merging
    • Progress tracking updated
  3. apps/api/src/routes/voxdub.ts (10 lines modified)

    • Added API key bindings
    • YouTubeService config initialization
    • Better error handling
  4. apps/api/wrangler.toml (3 lines added)

    • YOUTUBE_API_KEY var
    • RAPIDAPI_KEY var
    • Comments with setup info
  5. docs/YOUTUBE_INTEGRATION.md (new file, 300+ lines)

    • Complete setup guide
    • API key instructions
    • Cost breakdown
    • Testing procedures
  6. .github/copilot-instructions.md (30 lines added)

    • YouTube Integration section
    • Service descriptions
    • Flow overview
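The Whisper transcription and timestamp merging steps can be sketched against OpenAI's `/v1/audio/transcriptions` endpoint with `response_format=verbose_json`, which returns a `segments` array carrying `start`, `end`, and `text`. This is a minimal sketch, not the `WhisperService` implementation; `transcribe`, `toTranscriptSegments`, and the `TranscriptSegment` shape are hypothetical, and the upload filename and MIME type assume MP3 audio.

```typescript
// Subset of the verbose_json response used here.
interface WhisperSegment {
  id: number;
  start: number;
  end: number;
  text: string;
}
interface WhisperVerboseResponse {
  text: string;
  language?: string;
  segments?: WhisperSegment[];
}
// Hypothetical shape consumed by translation and TTS.
interface TranscriptSegment {
  start: number;
  end: number;
  text: string;
}

async function transcribe(audio: ArrayBuffer, apiKey: string): Promise<TranscriptSegment[]> {
  const form = new FormData();
  form.append("file", new Blob([audio], { type: "audio/mpeg" }), "audio.mp3");
  form.append("model", "whisper-1");
  form.append("response_format", "verbose_json");
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form, // fetch sets the multipart boundary header itself
  });
  if (!res.ok) throw new Error(`Whisper API failed: ${res.status}`);
  return toTranscriptSegments((await res.json()) as WhisperVerboseResponse);
}

// Keep Whisper's real segment boundaries; drop whitespace-only segments.
function toTranscriptSegments(resp: WhisperVerboseResponse): TranscriptSegment[] {
  return (resp.segments ?? [])
    .map((s) => ({ start: s.start, end: s.end, text: s.text.trim() }))
    .filter((s) => s.text.length > 0);
}
```

Keeping the parsing in a pure function makes the timestamp handling testable without calling the paid API.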

Cost Analysis

Per 10-Minute Video

| Service | Usage | Cost |
|---|---|---|
| YouTube Data API | 1 metadata request | Free (10k units/day quota) |
| RapidAPI | 1 audio download | ~$0.01 |
| OpenAI Whisper | 10 minutes of audio | $0.06 ($0.006/min) |
| Gemini Translation | ~500 words | Free (under quota) |
| TTS Generation | ~225 seconds of audio | ~113-450 credits* |

Total external cost: ~$0.07 per video
Total credits: ~113-450 per video (depending on quality tier)

* Credits: Premium = 2/sec, Fast = 1/sec, Cheap = 0.5/sec (225 seconds → 450, 225, or ~113 credits)

Monthly Estimate (100 videos)

  • Assumes an average video length of 10 minutes
  • API costs: ~$7
  • Credits consumed: ~11,250-45,000
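The estimates above reduce to a few multiplications. This sketch hard-codes the rates from the table (Whisper at $0.006 per audio minute, ~$0.01 per RapidAPI download, and the per-tier credit rates); the function names and the tier keys are hypothetical.

```typescript
// Rates from the cost table; the RapidAPI figure is approximate and plan-dependent.
const WHISPER_USD_PER_MIN = 0.006;
const RAPIDAPI_USD_PER_DOWNLOAD = 0.01;
const CREDITS_PER_SEC = { premium: 2, fast: 1, cheap: 0.5 } as const;
type Tier = keyof typeof CREDITS_PER_SEC;

// External cost scales with source video length; credits scale with dubbed audio length.
function estimateVideoCost(videoMinutes: number, ttsSeconds: number, tier: Tier) {
  const usd = videoMinutes * WHISPER_USD_PER_MIN + RAPIDAPI_USD_PER_DOWNLOAD;
  const credits = ttsSeconds * CREDITS_PER_SEC[tier];
  return { usd, credits };
}

function estimateMonthly(videos: number, videoMinutes: number, ttsSeconds: number, tier: Tier) {
  const per = estimateVideoCost(videoMinutes, ttsSeconds, tier);
  return { usd: per.usd * videos, credits: per.credits * videos };
}
```

For example, `estimateMonthly(100, 10, 225, "premium")` gives about $7 and 45,000 credits. Translation is omitted because the table lists it as free under quota.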

Next Steps

  1. ✅ Code Implementation - Complete
  2. ✅ Documentation - Complete
  3. ⏳ API Key Setup - Pending (requires Google Cloud, RapidAPI, OpenAI accounts)
  4. ⏳ Testing - Pending (test with real videos once keys configured)
  5. ⏳ Deployment - Pending (add secrets to Cloudflare, deploy)
  6. ⏳ Monitoring - Pending (watch logs for errors, track costs)

Deployment Checklist

  • Create Google Cloud project
  • Enable YouTube Data API v3
  • Generate YouTube API key
  • Sign up for RapidAPI
  • Subscribe to YouTube downloader API
  • Get RapidAPI key
  • Create an OpenAI account (if you don't have one)
  • Add payment method to OpenAI
  • Generate OpenAI API key
  • Add all 3 secrets to Cloudflare:
    wrangler secret put YOUTUBE_API_KEY
    wrangler secret put RAPIDAPI_KEY
    wrangler secret put OPENAI_API_KEY
    
  • Deploy API: cd apps/api && npm run deploy
  • Test with real YouTube URL
  • Monitor logs: wrangler tail --format pretty
  • Check costs in dashboards (Google Cloud, RapidAPI, OpenAI)

Troubleshooting

"Unable to access YouTube video"

  • Check if video is public (not private/unlisted)
  • Verify YouTube API key is valid
  • Check quota in Google Cloud Console (10k units/day)

"RapidAPI failed: 403"

  • Verify RapidAPI key is correct
  • Check subscription status (Free tier = 500 req/month)
  • Ensure API host header matches your chosen API
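RapidAPI authenticates each call with the `X-RapidAPI-Key` and `X-RapidAPI-Host` headers, and the host must belong to the API you subscribed to. The host constant and the helper names below are placeholders; the actual path and query parameters depend on which YouTube downloader API is chosen.

```typescript
// Placeholder: substitute the host shown on your chosen API's RapidAPI page.
const RAPIDAPI_HOST = "example-youtube-downloader.p.rapidapi.com";

function rapidApiHeaders(apiKey: string, host: string = RAPIDAPI_HOST): Record<string, string> {
  return {
    "X-RapidAPI-Key": apiKey,
    // Must match the subscribed API (see the 403 checklist above).
    "X-RapidAPI-Host": host,
  };
}

// Build the request URL on the same host the header advertises.
function rapidApiUrl(path: string, params: Record<string, string>, host: string = RAPIDAPI_HOST): string {
  const url = new URL(path, `https://${host}`);
  for (const [key, value] of Object.entries(params)) url.searchParams.set(key, value);
  return url.toString();
}
```

Deriving both the URL and the header from one `host` value keeps them from drifting apart, e.g. `fetch(rapidApiUrl(path, params, host), { headers: rapidApiHeaders(env.RAPIDAPI_KEY, host) })`.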

"Whisper API failed"

  • Check OpenAI API key is valid
  • Verify billing is set up (Whisper requires payment)
  • Audio file must be < 25 MB (Whisper limit)
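A size check before upload turns an opaque Whisper failure into a clear error. This sketch conservatively treats 25 MB as 25,000,000 bytes; `checkWhisperUpload` and `estimateAudioBytes` are hypothetical helpers, and the estimate assumes constant-bitrate audio.

```typescript
// Conservative reading of the 25 MB upload limit.
const WHISPER_MAX_BYTES = 25 * 1000 * 1000;

function checkWhisperUpload(audio: ArrayBuffer): void {
  if (audio.byteLength >= WHISPER_MAX_BYTES) {
    throw new Error(
      `Audio is ${(audio.byteLength / 1e6).toFixed(1)} MB; Whisper accepts < 25 MB. ` +
        "Re-encode at a lower bitrate or split the audio before transcribing.",
    );
  }
}

// Approximate size of a constant-bitrate stream: seconds * kbps * 1000 / 8 bytes.
function estimateAudioBytes(durationSec: number, kbps: number): number {
  return Math.ceil((durationSec * kbps * 1000) / 8);
}
```

At 128 kbps a 10-minute video is about 9.6 MB, so the limit is only reached for audio longer than roughly 26 minutes at that bitrate.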

Mock data still appearing

  • Ensure .dev.vars file exists for local dev
  • For production, verify secrets with wrangler secret list
  • Check logs for "[YouTube] No API key, using mock metadata"

Success Indicators

When working correctly, you should see logs like:

[YouTube] Downloading audio for dQw4w9WgXcQ
[YouTube] Audio stored at youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
[YouTube] Transcribing audio
[Whisper] Transcription complete: 42 segments
[YouTube] Translating to Spanish
[YouTube] Generating dubbed audio
[VoxDub] Video metadata: { title: "Never Gonna Give You Up", duration: 212, ... }

No errors or warnings appear, and real data flows through the pipeline.
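The storage path in the sample log follows a `youtube/<tenant>/<user>/<videoId>.mp3` layout. The sketch below reproduces that key and the "Audio stored at" log line. `AudioBucket` is a minimal stand-in for the Workers R2 bucket binding (only `put` with `httpMetadata.contentType` is modeled), and `audioKey`/`storeAudio` are hypothetical names.

```typescript
// Minimal stand-in for the R2 bucket binding; only the method used here.
interface AudioBucket {
  put(
    key: string,
    value: ArrayBuffer,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}

// Key layout taken from the sample log line.
function audioKey(tenantId: string, userId: string, videoId: string): string {
  return `youtube/${tenantId}/${userId}/${videoId}.mp3`;
}

async function storeAudio(
  bucket: AudioBucket,
  tenantId: string,
  userId: string,
  videoId: string,
  audio: ArrayBuffer,
): Promise<string> {
  const key = audioKey(tenantId, userId, videoId);
  await bucket.put(key, audio, { httpMetadata: { contentType: "audio/mpeg" } });
  console.log(`[YouTube] Audio stored at ${key}`);
  return key;
}
```

Prefixing keys with tenant and user IDs keeps each tenant's audio under its own prefix, which simplifies per-tenant listing and cleanup.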


Summary

The YouTube dubbing feature is now production-ready with:

  • Real YouTube Data API v3 for metadata
  • Real audio extraction via RapidAPI
  • Real transcription via OpenAI Whisper
  • Proper error handling and validation
  • Graceful fallback for development
  • Comprehensive documentation
  • Clear deployment path

Status: ✅ Implementation Complete | ⏳ API Key Setup Required
