Documentation

These docs live in the repo. Open the files below to edit or share with the team.


Core Docs

Getting Started

Fast local setup and verification checklist.

GETTING_STARTED.md

GETTING_STARTED.md - TTS Megasystem Quick Start

This is the shortest path to a working local setup. For full details, see SETUP.md and ARCHITECTURE.md.

1) Install

git clone <your-repo-url>
cd TTS-Premium
npm install

2) Initialize local D1 (first time only)

cd apps/api
# The extra "--" stops npm from consuming --local and --file itself
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

3) Start services

# From repo root (API + React editor)
npm run dev:clean

# In another terminal (SvelteKit shell)
cd apps/web-svelte
npm run dev

4) Open the app

SvelteKit shell: http://localhost:5175 (or next available port)
React editor: http://localhost:3001
API: http://localhost:8787

5) Configure env (if not already)

cd apps/web-svelte
cp .env.example .env
# Add:
# VITE_API_URL=http://localhost:8787
# VITE_EDITOR_URL=http://localhost:3001

cd apps/api
cp .dev.vars.example .dev.vars
# Add required keys for AI providers as needed.

6) What to verify

/login loads and signs in.
/home shows projects (or empty state).
"Open in Editor" sends you to /editor/:id in the React app.

Troubleshooting (fast fixes)

Ports in use:

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5175 | xargs kill -9   # SvelteKit shell; use whichever 517x port it started on

Wrangler login fails:

rm -rf ~/.wrangler
npm run wrangler login

Missing D1 tables:

cd apps/api
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

Setup Guide

Full install, Cloudflare resources, and dev workflow.

SETUP.md

SETUP.md - TTS Megasystem Setup Guide

📋 Prerequisites Checklist

Node.js 20+ installed
npm installed (bundled with Node.js)
Cloudflare account created
Git installed
Docker installed (for extractor + XTTS wrapper)

🎯 Step-by-Step Setup

Clone & Install

git clone <your-repo-url>
cd TTS-Premium
npm install

Configure Cloudflare (API)

cd apps/api
npm run wrangler login

# Create D1 database
npm run wrangler d1 create tts-megasystem-db
# Copy database_id from output

# Create R2 bucket
npm run wrangler r2 bucket create tts-megasystem-assets

Update apps/api/wrangler.toml

Replace database_id under [[d1_databases]]
Replace bucket_name under [[r2_buckets]]

Initialize Database (D1)

cd apps/api

# Use the shared schema
# The extra "--" stops npm from consuming --local and --file itself
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

Environment Variables

cd apps/api
cp .dev.vars.example .dev.vars   # if provided
# Otherwise create .dev.vars and add keys from your environment

cd apps/web-svelte
cp .env.example .env             # if provided
# Add:
# VITE_API_URL=http://localhost:8787
# VITE_EDITOR_URL=http://localhost:3001

Start Development

# From repo root (starts API + React editor)
npm run dev:clean

# Start SvelteKit shell
cd apps/web-svelte
npm run dev

Expected ports:

SvelteKit shell: http://localhost:5175 (or next available)
React editor: http://localhost:3001
API: http://localhost:8787

Verify Setup

Shell loads at /home after login
Projects list loads
Open in Editor redirects to React editor
API responds at http://localhost:8787

🔧 Common Issues

Port already in use

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5175 | xargs kill -9   # SvelteKit shell; use whichever 517x port it started on

Wrangler login fails

rm -rf ~/.wrangler
npm run wrangler login

D1 database not found

cd apps/api
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

🚀 Next Steps

Auth hardening (cookie-based, cross-app)
Project model expansion (assets, timeline, voice settings)
Billing gates (Free/Pro/Premium)

Setup Time: ~30 minutes • Difficulty: Intermediate

Architecture

Hybrid SvelteKit + React editor + Workers design.

ARCHITECTURE.md

ARCHITECTURE.md - TTS Megasystem Architecture

🏛️ Overview

The platform uses a hybrid frontend (SvelteKit shell + React editor) with a Cloudflare Workers API. This keeps global browsing fast while preserving a rich editor experience.

🎯 Why Hybrid

Shell (SvelteKit): fast navigation, auth, dashboard, settings.
Editor (React): complex audio/video UX, timeline, waveform, previews.

📊 Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                     USER EXPERIENCE                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Shell (SvelteKit)                                      │
│  /home  /chat  /settings  /library                      │
│                                                         │
│          ↓ Open project                                 │
│                                                         │
│  Editor (React)                                         │
│  /editor/:projectId                                     │
│                                                         │
└────────────────────┬────────────────────────────────────┘
                     │ API calls
                     ▼
┌─────────────────────────────────────────────────────────┐
│              CLOUDFLARE WORKERS API (Hono)              │
├─────────────────────────────────────────────────────────┤
│  /api/auth      /api/projects   /api/files              │
│  /api/tts       /api/voxdub     /api/asr                │
│  /api/preview   /api/translate  /api/jobs               │
└──────┬──────────────┬──────────────┬────────────────────┘
       │              │              │
       ▼              ▼              ▼
  ┌──────────┐   ┌──────────┐   ┌──────────┐
  │    D1    │   │    R2    │   │  Queues  │
  │ (SQL DB) │   │ (Assets) │   │  (Jobs)  │
  └──────────┘   └──────────┘   └──────────┘

🔄 Core Request Flows

Project list (Shell): Shell → /api/projects → D1 → JSON → Shell renders
Open editor: Shell redirects to /editor/:id → React loads → /api/projects/:id
TTS: Editor → /api/tts/generate → Worker → Provider → R2 → response
Video dub: Editor → /api/voxdub/create → Queue → Workers → R2 → preview

🗄️ Storage Strategy

D1: structured metadata (projects, jobs, users).
R2: large assets (audio, images, video, manifests).
Queues: async processing for dubbing/preview render.

📘 API Contract

The authoritative endpoint and schema reference lives in API_CONTRACT.md. Keep the SvelteKit shell and React editor aligned with it to avoid drift.

🔐 Auth

Token-based auth via the Workers API. The client stores the token in localStorage and sends it as a Bearer token. (Can be upgraded to cookie-based auth later.)
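
As a rough illustration of the Bearer flow described above, the sketch below builds request headers from a stored token. The storage key name `auth_token` and the helper name are assumptions made for this example; the real key used by the shell and editor is not specified here.

```typescript
// Minimal storage shape so the sketch also runs outside a browser.
interface TokenStore {
  getItem(key: string): string | null;
}

// Hypothetical key name; the real key is not specified in this document.
const TOKEN_KEY = "auth_token";

// Build JSON request headers, adding the Bearer token when one is stored.
function buildAuthHeaders(store: TokenStore): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  const token = store.getItem(TOKEN_KEY);
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}
```

In a browser, `window.localStorage` satisfies `TokenStore`, so `buildAuthHeaders(window.localStorage)` can be passed straight to `fetch`.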

🌐 Routing & Apps

Shell (SvelteKit): /home, /chat, /login, /signup, /settings.
Editor (React): /editor/:projectId (same domain in prod, via routing or redirect).
Shell redirects to editor using VITE_EDITOR_URL in dev.
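
The redirect can be pictured with a small helper like the one below. This is only a sketch: `buildEditorUrl` is a hypothetical name, and it assumes VITE_EDITOR_URL holds an origin such as http://localhost:3001 with no extra path segment.

```typescript
// Build the editor URL for a project from the configured editor base URL.
// Assumes the base is an origin only; a path on the base would be replaced.
function buildEditorUrl(editorBaseUrl: string, projectId: string): string {
  return new URL(`/editor/${encodeURIComponent(projectId)}`, editorBaseUrl).toString();
}
```

Encoding the ID keeps unexpected characters in a project ID from changing the route.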

🧩 Services (Local Dev)

API: apps/api (Workers) → http://localhost:8787
Shell: apps/web-svelte → http://localhost:517x
Editor: apps/web (React) → http://localhost:3001
Extractor: Docker on http://localhost:3100
yt-dlp service: http://localhost:3034 (merge/preview pipeline)
LiveKit: http://localhost:7880

📚 Data Model (Current)

projects table (D1):

id, user_id, tenant_id, title, status, created_at, updated_at

Project metadata lives in D1. Heavy assets (audio/video/image) live in R2.
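
For client code, the row shape above could be mirrored in a type plus a runtime guard. The column names come from the list above; the column types are assumptions (everything is treated as a string here, although the timestamps may be stored differently in the actual schema).

```typescript
// Column names taken from the projects table; all types are assumed to be strings.
const PROJECT_COLUMNS = [
  "id", "user_id", "tenant_id", "title", "status", "created_at", "updated_at",
] as const;

type ProjectRow = Record<(typeof PROJECT_COLUMNS)[number], string>;

// Runtime check for data coming back from the API.
function isProjectRow(value: unknown): value is ProjectRow {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return PROJECT_COLUMNS.every((column) => typeof record[column] === "string");
}
```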

🧵 Background Jobs

Queue workers handle:

youtube_dub, file_dub, url_dub (segment TTS + manifest)
Preview render (merge video + audio segments)

🔒 Security Notes

API expects Authorization: Bearer <token>.
CORS allowlist controlled by ALLOWED_ORIGINS.
Upgrade path: httpOnly cookies + CSRF protection for cross-app auth.
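
A minimal sketch of an allowlist check follows. It assumes ALLOWED_ORIGINS is a comma-separated string, which is not confirmed by this document, and both function names are hypothetical.

```typescript
// Split a comma-separated allowlist into trimmed, non-empty origins.
// The comma-separated format is an assumption for this sketch.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}

// Exact-match check of the request's Origin header against the allowlist.
function isOriginAllowed(origin: string | null, allowlist: string[]): boolean {
  return origin !== null && allowlist.includes(origin);
}
```

Exact matching avoids the pitfalls of prefix or substring checks, where a lookalike domain could slip through.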

🚀 Deployment

Shell (SvelteKit): Cloudflare Pages
Editor (React): Cloudflare Pages
API: Cloudflare Workers

Architecture Version: 1.0
Last Updated: December 2025

Commands

Common scripts, ports, and quick fixes.

COMMANDS.md

COMMANDS.md - TTS Megasystem Command Reference

Short, common commands for local development. See SETUP.md for full details.

Install

npm install

Local development

# Start API + React editor
npm run dev:clean

# Start SvelteKit shell (separate terminal)
cd apps/web-svelte
npm run dev

API (Workers)

cd apps/api
npm run dev

React editor (legacy web)

cd apps/web
npm run dev

SvelteKit shell

cd apps/web-svelte
npm run dev

D1 schema (local)

cd apps/api
# The extra "--" stops npm from consuming --local and --file itself
npm run wrangler -- d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql

Wrangler login

cd apps/api
npm run wrangler login

Ports (default)

API: http://localhost:8787
React editor: http://localhost:3001
SvelteKit shell: http://localhost:5175 (or next available)

Kill ports quickly

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5175 | xargs kill -9   # SvelteKit shell; use whichever 517x port it started on

Extractor (docker)

docker-compose up -d --build extractor
docker-compose restart extractor

API Contract

Shared API endpoints, schemas, and error format.

API_CONTRACT.md

API Contract - TTS Hybrid MVP

KEEPS SVELTEKIT SHELL + REACT EDITOR IN SYNC. NO DRIFT ALLOWED.

Core Endpoints (Cloudflare Workers)

1. Script Generation

POST /api/chat-script
Content-Type: application/json

REQUEST:
{
  "prompt": "Create 5-slide faceless video: AI productivity tips"
}

RESPONSE: 200
{
  "slides": [
    {
      "id": "slide-1",
      "title": "Hook",
      "text": "Struggling with focus?",
      "image_prompt": "distracted worker at messy desk, bold white text 'Productivity Crisis' overlay, 16:9"
    },
    {
      "id": "slide-2",
      "title": "Solution 1",
      "text": "Pomodoro: 25min work, 5min break",
      "image_prompt": "clean desk with timer, bold text 'POMODORO TECHNIQUE' overlay"
    }
  ]
}

2. Slide Generation

POST /api/generate-slides
REQUEST:
{
  "slide_prompts": [
    "distracted worker at messy desk, bold white text 'Productivity Crisis' overlay, 16:9",
    "clean desk with timer, bold text 'POMODORO TECHNIQUE' overlay"
  ]
}

RESPONSE: 200
{
  "images": [
    "https://r2.dev/slide-1.jpg",
    "https://r2.dev/slide-2.jpg"
  ]
}

3. Voice Clone + Render

POST /api/render-video
REQUEST:
{
  "projectId": "proj_123",
  "slides": [...],
  "voice_sample_url": "https://r2.dev/voice.wav",
  "image_urls": [...]
}

RESPONSE: 202
{
  "render_id": "render_456",
  "status": "queued",
  "progress_url": "/api/render/456"
}

Error Format (All Endpoints)

400: { "error": "Invalid prompt", "code": "INVALID_INPUT" }
401: { "error": "Unauthorized", "code": "UNAUTH" }
429: { "error": "Rate limited", "code": "RATE_LIMIT" }
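
Clients on both sides could share a small guard for this error shape. The sketch below is one possible way to do it; the names `ApiError` and `isApiError` are illustrative and not part of the contract.

```typescript
// Error body shape shown above: a human-readable message plus a machine code.
interface ApiError {
  error: string;
  code: string;
}

// Codes listed in this contract; other codes may exist.
const KNOWN_ERROR_CODES = ["INVALID_INPUT", "UNAUTH", "RATE_LIMIT"] as const;

// Narrow an unknown response body to the shared error shape.
function isApiError(body: unknown): body is ApiError {
  if (typeof body !== "object" || body === null) return false;
  const record = body as Record<string, unknown>;
  return typeof record["error"] === "string" && typeof record["code"] === "string";
}

// True when the code is one of the three documented above.
function isKnownErrorCode(code: string): boolean {
  return (KNOWN_ERROR_CODES as readonly string[]).includes(code);
}
```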

Shared Types (TypeScript)

interface Slide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
  image_url?: string;
}
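
One way the optional `image_url` field might get filled is by pairing the `images` array from /api/generate-slides with the slides that produced the prompts. The sketch assumes images come back in the same order as `slide_prompts`, which the contract does not state explicitly; `attachImageUrls` is a hypothetical helper.

```typescript
interface Slide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
  image_url?: string;
}

// Pair each slide with the image at the same index (ordering is an assumption).
function attachImageUrls(slides: Slide[], images: string[]): Slide[] {
  if (slides.length !== images.length) {
    throw new Error(`Expected ${slides.length} images, got ${images.length}`);
  }
  return slides.map((slide, index) => ({ ...slide, image_url: images[index] }));
}
```

Failing loudly on a length mismatch is a deliberate choice here, since silently shifting images onto the wrong slides would be harder to notice.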

Prompt Templates

Locked prompts for script and slide generation.

PROMPT_TEMPLATES.md

AI Prompt Templates - LOCKED

EXACT PROMPTS FOR CONSISTENT QUALITY. NO DEVIATIONS.

1. Groq Llama 3.1 8B - Script Generation

SYSTEM PROMPT:

You are a faceless YouTube script generator. ALWAYS return valid JSON.

Generate EXACTLY 5 slides for a 60-second video. Each slide: 8-12 seconds max.

Format: { "slides": [ { "id": "slide-1", "title": "Hook", "text": "Attention-grabbing question or stat (10 words max)", "image_prompt": "professional 16:9 slide description with EXACT TEXT overlay" } ] }

Rules:

Slide 1 ALWAYS hook/problem
Slides 2-4 ALWAYS solutions/tips
Slide 5 ALWAYS CTA
image_prompt MUST include "bold text 'EXACT TEXT' overlay, 16:9"
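
Because model output can drift from these rules, a caller may want to check a response before using it. The following is a minimal sketch of such a check, not the project's actual validation; it only covers the slide count and the overlay/16:9 markers, and `validateScript` is a hypothetical name.

```typescript
interface ScriptSlide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
}

// Return a list of rule violations; an empty list means the script passed.
function validateScript(slides: ScriptSlide[]): string[] {
  const problems: string[] = [];
  if (slides.length !== 5) {
    problems.push(`expected 5 slides, got ${slides.length}`);
  }
  slides.forEach((slide, index) => {
    const prompt = slide.image_prompt;
    if (!prompt.includes("overlay") || !prompt.includes("16:9")) {
      problems.push(`slide ${index + 1}: image_prompt lacks the text overlay or 16:9 marker`);
    }
  });
  return problems;
}
```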


USER PROMPT TEMPLATE:

Create 5-slide faceless video: ${user_input}


EXPECTED OUTPUT:
```json
{
  "slides": [
    {
      "id": "slide-1",
      "title": "Hook",
      "text": "Struggling with focus?",
      "image_prompt": "distracted worker desk, bold white text 'FOCUS CRISIS' overlay, 16:9"
    }
  ]
}
```

2. Replicate Flux.1-schnell - Slide Images

PROMPT TEMPLATE:
"${slide_description}, professional presentation slide, EXACT TEXT '${slide.text}' overlay, bold modern sans-serif font, high contrast white text on dark background, centered composition, 16:9 aspect ratio, clean minimalist design"

EXAMPLE:
"clean desk with pomodoro timer, professional presentation slide, EXACT TEXT '25min FOCUS' overlay, bold modern sans-serif font, high contrast white text on dark background, centered composition, 16:9 aspect ratio, clean minimalist design"

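The template above can be filled with a small helper such as the sketch below. The function name is hypothetical; the template text is copied from the PROMPT TEMPLATE line so that the locked wording stays unchanged.

```typescript
// Fill the locked Flux prompt template with a slide description and overlay text.
function buildSlideImagePrompt(slideDescription: string, slideText: string): string {
  return (
    `${slideDescription}, professional presentation slide, ` +
    `EXACT TEXT '${slideText}' overlay, bold modern sans-serif font, ` +
    `high contrast white text on dark background, centered composition, ` +
    `16:9 aspect ratio, clean minimalist design`
  );
}
```

Calling `buildSlideImagePrompt("clean desk with pomodoro timer", "25min FOCUS")` reproduces the EXAMPLE string above.
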
Sources

YouTube Docs

YouTube Quickstart

Fast path to get YouTube dubbing running.

docs/YOUTUBE_QUICKSTART.md

Quick Start: YouTube Dubbing Setup

🎯 Goal

Enable production YouTube dubbing with real APIs in 15 minutes.

📋 Prerequisites

Google account (for YouTube API)
Credit card (for RapidAPI and OpenAI - minimal costs)
Cloudflare account with Workers deployed

🚀 Step 1: YouTube Data API (5 min)

Go to https://console.cloud.google.com/
Click Select a project → New Project
Name it "TTS Platform" → Create
In search bar, type "YouTube Data API v3" → Enable
Go to APIs & Services → Credentials
Click + Create Credentials → API Key
Copy the key (starts with AIzaSy...)
(Optional) Click Restrict Key → Allow only "YouTube Data API v3"

Cost: FREE (10,000 quota units/day)

🚀 Step 2: RapidAPI (3 min)

Sign up at https://rapidapi.com/ (use Google login)
Search for "YouTube MP3 Downloader" or go to: https://rapidapi.com/ytjar/api/youtube-mp3-downloader2/
Click Subscribe to Test
Choose Basic Plan (free 500 requests/month)
Go to Endpoints tab
Copy your X-RapidAPI-Key from code snippet (right side)

Cost: FREE tier (500 req/month), then $0.01/request

Alternative APIs (if preferred):

🚀 Step 3: OpenAI Whisper (3 min)

Go to https://platform.openai.com/
Sign up or log in
Click Settings → Billing → Add payment method
Go to API Keys → + Create new secret key
Name it "TTS Platform Whisper" → Copy key (starts with sk-proj-...)

Cost: $0.006 per minute of audio (10-min video = $0.06)

Note: If you already set up OpenAI for Phase 10C TTS, use the same key.

🚀 Step 4: Add Secrets to Cloudflare (2 min)

cd apps/api

# YouTube
echo "AIzaSy_YOUR_KEY_HERE" | wrangler secret put YOUTUBE_API_KEY

# RapidAPI
echo "YOUR_RAPIDAPI_KEY_HERE" | wrangler secret put RAPIDAPI_KEY

# OpenAI (skip if already set in Phase 10C)
echo "sk-proj-YOUR_KEY_HERE" | wrangler secret put OPENAI_API_KEY

Verify secrets:

wrangler secret list

🚀 Step 5: Deploy (1 min)

npm run deploy

✅ Test It

Option A: Using Frontend

Go to your deployed web app
Click VoxDub in sidebar
Paste YouTube URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Select target language (e.g., "Spanish")
Click Start Dubbing
Watch progress bar (takes 2-5 minutes for 3-minute video)

Option B: Using API

# Get JWT token first (login via frontend and copy from localStorage)
export JWT_TOKEN="your_jwt_token_here"

# Create dubbing job
curl -X POST https://your-api.workers.dev/api/voxdub/create \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "targetLanguage": "Spanish"
  }'

# Response:
# {
#   "id": "job_1234567890_abc123",
#   "status": "queued",
#   "message": "YouTube dubbing job started",
#   "metadata": {
#     "title": "Never Gonna Give You Up",
#     "duration": 212,
#     "channelTitle": "Rick Astley",
#     ...
#   }
# }

# Check job status
curl https://your-api.workers.dev/api/jobs/job_1234567890_abc123 \
  -H "Authorization: Bearer $JWT_TOKEN"

🔍 Monitor Logs

Watch real-time processing:

wrangler tail --format pretty

Look for:

[YouTube] Downloading audio for dQw4w9WgXcQ
[YouTube] Audio stored at youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
[Whisper] Transcription complete: 42 segments
[Gemini] Translation complete
[TTS] Generated segment 1/42
...
[VoxDub] Job completed

📊 What Happens Behind the Scenes

User submits YouTube URL
         ↓
[1] YouTube Data API → Get video metadata (title, duration)
         ↓
[2] RapidAPI → Download audio → Store in R2
         ↓
[3] OpenAI Whisper → Transcribe → 42 segments with timestamps
         ↓
[4] Gemini AI → Translate each segment to target language
         ↓
[5] Multi-TTS Router → Generate audio for each segment
         ↓
[6] Save manifest → Job complete → User downloads

Total Time: 2-5 minutes for 3-minute video

💰 Cost Breakdown (Example: 10-min video)

Service	Cost
YouTube Data API	FREE
RapidAPI audio download	~$0.01
Whisper transcription	$0.06
Gemini translation	FREE
TTS generation	225-450 credits

Total per video: ~$0.07 + credits

🛠️ Development Mode (No API Keys)

For testing UI/flows without costs:

cd apps/api
npm run dev
# Don't add .dev.vars file
# Services will use mock data

❗ Troubleshooting

"Unable to access YouTube video"

Ensure video is public (not private/unlisted)
Check YouTube API quota (10k units/day free)

"RapidAPI failed: 403"

Verify subscription is active (check https://rapidapi.com/developer/billing)
Ensure you're within free tier limits (500/month)

"Whisper API failed: 401"

Verify OpenAI billing is set up
Check API key is valid: https://platform.openai.com/api-keys

Still seeing mock data in production?

# Verify secrets exist
wrangler secret list

# Re-add if missing
wrangler secret put YOUTUBE_API_KEY
wrangler secret put RAPIDAPI_KEY
wrangler secret put OPENAI_API_KEY

# Re-deploy
npm run deploy

📚 Full Documentation

Setup Guide: docs/YOUTUBE_INTEGRATION.md (300+ lines)
Implementation Summary: docs/YOUTUBE_PRODUCTION_SUMMARY.md
AI Agent Instructions: .github/copilot-instructions.md

🎉 Success Checklist

YouTube Data API key working (see real video titles in logs)
RapidAPI downloading audio (see R2 storage paths in logs)
Whisper transcribing (see segment counts in logs)
Gemini translating (see translated text in manifest)
TTS generating (see audio files in R2)
Job completes successfully (status = "completed")
Can download dubbed manifest from R2

🔐 Security Notes

Never commit API keys to git
Use wrangler secret (not wrangler.toml [vars]) for production
For local dev, use .dev.vars (already in .gitignore)
Add rate limiting (see docs/YOUTUBE_INTEGRATION.md security section)
Set video duration limits to control costs

🚦 Next Steps After Setup

Test with various YouTube videos (short first)
Monitor costs in dashboards:
- Google Cloud: https://console.cloud.google.com/billing
- RapidAPI: https://rapidapi.com/developer/billing
- OpenAI: https://platform.openai.com/usage
Add error notifications (already configured via Resend/Twilio)
Implement duration limits (recommend max 1 hour)
Add credit deduction before processing (already implemented)

📞 Support

YouTube API: https://developers.google.com/youtube/v3/docs
RapidAPI: https://docs.rapidapi.com/
OpenAI: https://platform.openai.com/docs/guides/speech-to-text
Cloudflare Workers: https://developers.cloudflare.com/workers/

Estimated Setup Time: 15 minutes
First Video Processing Time: 2-5 minutes
Cost for Testing (5 videos): ~$0.35 + credits

🎯 You're ready to go! Start with a short YouTube video to test the full pipeline.

YouTube Integration

Extractor, download flow, and queue notes.

docs/YOUTUBE_INTEGRATION.md

YouTube Integration - Production Setup Guide

Overview

The YouTube dubbing feature (/api/voxdub) now uses real production APIs for:

YouTube Data API v3 - Video metadata (title, duration, description, thumbnails)
RapidAPI YouTube Downloader - Audio extraction from YouTube videos
OpenAI Whisper - Speech-to-text transcription with timestamps

Required API Keys

1. YouTube Data API v3

Purpose: Fetch video metadata (title, duration, channel info, thumbnails)

Setup Steps:

Go to Google Cloud Console
Create a new project or select existing
Enable YouTube Data API v3 in API Library
Go to Credentials → Create Credentials → API Key
Restrict the API key to YouTube Data API v3 (recommended for security)
Copy the API key

Cost: Free tier includes 10,000 quota units/day (each metadata request = 1 unit)

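Add to wrangler.toml (placeholder only; keep the real key out of git by using .dev.vars locally and wrangler secret put in production):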

YOUTUBE_API_KEY = "AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

2. RapidAPI - YouTube Downloader

Purpose: Extract audio stream URLs from YouTube videos

Setup Steps:

Sign up at RapidAPI.com
Subscribe to YouTube MP3 Downloader or similar API
Go to Endpoints → Copy your X-RapidAPI-Key from code snippets
Choose a plan (Free tier: 500 requests/month)

Alternative APIs (if preferred):

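Add to wrangler.toml (placeholder only; keep the real key out of git by using .dev.vars locally and wrangler secret put in production):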

RAPIDAPI_KEY = "1234567890abcdefXXXXXXXXXXXXXXXXX"

Note: If using a different RapidAPI endpoint, update the API URL in apps/api/src/services/youtube.ts:

const response = await fetch(
  `https://YOUR-API-HOST.p.rapidapi.com/endpoint?url=...`,
  {
    headers: {
      'X-RapidAPI-Key': this.rapidApiKey,
      'X-RapidAPI-Host': 'YOUR-API-HOST.p.rapidapi.com'
    }
  }
);

3. OpenAI Whisper API

Purpose: Transcribe YouTube audio to text with timestamps

Setup Steps:

Create account at OpenAI Platform
Add payment method (Whisper pricing: $0.006/minute)
Go to API Keys
Create new secret key → Copy it

Cost: $0.006 per minute of audio (e.g., 10-minute video = $0.06)

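Add to wrangler.toml (placeholder only; keep the real key out of git by using .dev.vars locally and wrangler secret put in production):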

OPENAI_API_KEY = "sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

Note: This key is already configured for OpenAI TTS in Phase 10C. Same key works for Whisper.

Architecture Flow

User submits YouTube URL
         ↓
[YouTube Data API] → Get metadata (title, duration)
         ↓
Job queued → Queue handler starts
         ↓
[RapidAPI] → Extract audio → Download to R2
         ↓
[OpenAI Whisper] → Transcribe audio → Segments with timestamps
         ↓
[Gemini AI] → Translate segments to target language
         ↓
[Multi-TTS Router] → Generate dubbed audio for each segment
         ↓
Save manifest to R2 → Job completed

Code Implementation

Service Layer (`apps/api/src/services/youtube.ts`)

YouTubeService Class

getVideoMetadata(videoId) - Uses YouTube Data API v3
getAudioStreamUrl(videoId) - Uses RapidAPI
downloadAudioToR2(videoId, r2Bucket, tenantId, userId) - Downloads and stores audio

WhisperService Class

transcribe(audioUrl) - Sends audio to OpenAI Whisper API
formatForDubbing(segments) - Converts Whisper output to dubbing manifest format

Queue Handler (`apps/api/src/queue-handler.ts`)

processYoutubeDubJob() workflow:

Download audio (5-20%) - RapidAPI + R2 storage
Transcribe (20-40%) - OpenAI Whisper
Translate (40-50%) - Gemini AI
Generate TTS (50-90%) - Multi-TTS Router
Save manifest (90-100%) - R2 storage
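
The percentage bands above suggest a simple mapping from a stage and its local completion to an overall progress value. The sketch below shows one way to express that; the stage keys and the function name are assumptions, and the actual handler may compute progress differently.

```typescript
type DubStage = "download" | "transcribe" | "translate" | "tts" | "manifest";

// Overall progress band for each stage, as listed in the workflow above.
const STAGE_RANGES: Record<DubStage, [number, number]> = {
  download: [5, 20],
  transcribe: [20, 40],
  translate: [40, 50],
  tts: [50, 90],
  manifest: [90, 100],
};

// Map a stage-local fraction (0..1) to the overall job percentage.
function overallProgress(stage: DubStage, fraction: number): number {
  const [start, end] = STAGE_RANGES[stage];
  const clamped = Math.min(1, Math.max(0, fraction));
  return Math.round(start + (end - start) * clamped);
}
```

For the TTS stage, passing `completedSegments / totalSegments` as the fraction would move the bar smoothly from 50 to 90.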

API Route (`apps/api/src/routes/voxdub.ts`)

POST /api/voxdub/create

Validates YouTube URL
Fetches metadata (fails if video inaccessible)
Creates job in D1
Queues background processing
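
The URL validation step could look roughly like the sketch below. It is not the route's actual code: the accepted hosts and the 11-character ID pattern are assumptions based on commonly seen YouTube URLs, and `extractVideoId` is a hypothetical name.

```typescript
// Extract a video ID from a watch or youtu.be URL, or return null if invalid.
function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // Not a parseable URL at all.
  }
  const host = url.hostname.replace(/^www\./, "");
  let candidate: string | null = null;
  if (host === "youtube.com" || host === "m.youtube.com") {
    candidate = url.pathname === "/watch" ? url.searchParams.get("v") : null;
  } else if (host === "youtu.be") {
    candidate = url.pathname.slice(1);
  }
  // Assumed ID format: 11 characters from the URL-safe base64 alphabet.
  return candidate !== null && /^[A-Za-z0-9_-]{11}$/.test(candidate) ? candidate : null;
}
```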

Testing

Without API Keys (Fallback Mode)

If API keys are missing, services gracefully degrade:

YouTube Data API: Returns mock metadata
RapidAPI: Returns sample audio file
Whisper: Uses hardcoded transcript

With API Keys (Production Mode)

Test YouTube Data API:

curl "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id=dQw4w9WgXcQ&key=YOUR_KEY"

Test RapidAPI:

curl -X GET \
  'https://youtube-mp3-downloader2.p.rapidapi.com/ytmp3/ytmp3/custom/?url=https://www.youtube.com/watch?v=dQw4w9WgXcQ&quality=320' \
  -H 'X-RapidAPI-Key: YOUR_KEY' \
  -H 'X-RapidAPI-Host: youtube-mp3-downloader2.p.rapidapi.com'

Test OpenAI Whisper:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.mp3" \
  -F model="whisper-1"

Full Integration Test

// POST to your API
const response = await fetch('http://localhost:8787/api/voxdub/create', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_JWT',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    youtubeUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    targetLanguage: 'Spanish'
  })
});

const job = await response.json();
console.log('Job ID:', job.id);

// Check job status (repeat this request to poll until the job finishes)
const status = await fetch(`http://localhost:8787/api/jobs/${job.id}`, {
  headers: { 'Authorization': 'Bearer YOUR_JWT' }
});
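
The status request above runs once; a real client would repeat it until the job settles. Below is a minimal polling sketch under a few assumptions: "completed" appears in this guide, while "failed" as a terminal status, the 3-second interval, and the helper names are all illustrative.

```typescript
interface JobSnapshot {
  id: string;
  status: string;
}

// "completed" is documented; "failed" is an assumed terminal status.
const TERMINAL_STATUSES = new Set(["completed", "failed"]);

function isTerminalStatus(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Call fetchJob repeatedly until the job reaches a terminal status.
async function pollJob(
  fetchJob: () => Promise<JobSnapshot>,
  intervalMs = 3000,
  maxAttempts = 100,
): Promise<JobSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    if (isTerminalStatus(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Job did not finish within the polling budget");
}
```

Here `fetchJob` would wrap the authenticated GET to /api/jobs/:id shown above and return its parsed JSON.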

Deployment

Add secrets to Cloudflare:

cd apps/api

# YouTube Data API
echo "YOUR_KEY" | wrangler secret put YOUTUBE_API_KEY

# RapidAPI
echo "YOUR_KEY" | wrangler secret put RAPIDAPI_KEY

# OpenAI (if not already set)
echo "YOUR_KEY" | wrangler secret put OPENAI_API_KEY

Deploy:

npm run deploy

Verify secrets:

wrangler secret list

Error Handling

Common Issues

YouTube Data API Errors:

403 Forbidden → API key not enabled or quota exceeded
404 Not Found → Video doesn't exist or is private
400 Bad Request → Invalid video ID format

RapidAPI Errors:

403 Forbidden → Invalid API key or subscription expired
429 Too Many Requests → Rate limit exceeded
500 Server Error → Video unavailable or service down

Whisper API Errors:

401 Unauthorized → Invalid API key
400 Bad Request → Audio file too large (max 25 MB)
429 Rate Limit → Exceeded usage limits

Monitoring

Stream live logs with Wrangler (they are also available in the Cloudflare Dashboard):

wrangler tail --format pretty

Look for:

[YouTube] Fetching metadata for...
[YouTube] Getting audio URL for...
[YouTube] Audio stored at...
[Whisper] Transcription failed: (if errors)

yt-dlp / Tunnel Checklist

Before resuming YouTube dubbing in environments that use the self-hosted yt-dlp API, verify:

Docker container is running: docker ps (look for voxdub-audio-yt-dlp-api).
Cloudflare Tunnel is running: confirm the current URL, or restart with cloudflared tunnel --url http://localhost:3034 (or your yt-dlp port).
If the tunnel URL changes, update both YTDLP_API_URL and YTDLP_SERVICE_URL in apps/api/wrangler.toml, then redeploy the API worker.

Cost Estimation

Example: 10-minute YouTube video dubbed to Spanish

Service	Usage	Cost
YouTube Data API	1 metadata request	Free (within quota)
RapidAPI	1 download request	$0.01 (varies by plan)
OpenAI Whisper	10 minutes transcription	$0.06
Multi-TTS	~150 words × 1.5 sec/word = 225 sec	225-450 credits
Total API Cost		~$0.07 + credits

Security Best Practices

Restrict API Keys:
- YouTube: Limit to YouTube Data API v3 only
- OpenAI: Set usage limits in dashboard
Use Cloudflare Secrets (not wrangler.toml [vars]):

wrangler secret put YOUTUBE_API_KEY

Rate Limiting: Add to voxdub.ts:

// Check user job count in last hour
const recentJobs = await c.env.DB.prepare(`
  SELECT COUNT(*) as count FROM jobs 
  WHERE user_id = ? AND created_at > datetime('now', '-1 hour')
`).bind(user.sub).first();

if (recentJobs.count >= 10) {
  return c.json({ error: 'Rate limit exceeded' }, 429);
}

Validate Video Duration:

if (metadata.duration > 3600) { // 1 hour max
  return c.json({ error: 'Video too long (max 1 hour)' }, 400);
}

Next Steps

Add to .github/copilot-instructions.md (already updated)
Configure API keys in Cloudflare Dashboard
Test with sample YouTube videos
Monitor usage and costs in respective dashboards
Consider adding video duration limits for cost control

Support

YouTube API: https://developers.google.com/youtube/v3
RapidAPI: https://rapidapi.com/hub
OpenAI Whisper: https://platform.openai.com/docs/guides/speech-to-text

YouTube Production Summary

Deployment checklist and operational notes.

docs/YOUTUBE_PRODUCTION_SUMMARY.md

YouTube Dubbing - Production Implementation Summary

What Was Implemented

1. Enhanced YouTube Service (`apps/api/src/services/youtube.ts`)

New Features:

✅ Production YouTube Data API v3 integration for metadata
✅ RapidAPI integration for audio extraction
✅ OpenAI Whisper integration for transcription
✅ R2 audio storage with tenant isolation
✅ ISO 8601 duration parsing (YouTube format → seconds)
✅ Graceful fallback to mock data when API keys missing
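
To illustrate the duration parsing mentioned in the list above: the YouTube Data API reports durations as ISO 8601 strings such as PT3M32S. The sketch below converts the day/hour/minute/second forms to seconds; it is not the service's actual implementation, and the function name is hypothetical.

```typescript
// Convert an ISO 8601 duration (days, hours, minutes, seconds) to seconds.
// Returns null for strings that do not match this subset of the format.
function parseIso8601Duration(value: string): number | null {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(value);
  // Reject non-matches and the degenerate "P" / "...T" forms with no components.
  if (!match || value === "P" || value.endsWith("T")) return null;
  const [, days, hours, minutes, seconds] = match;
  return (
    Number(days ?? 0) * 86400 +
    Number(hours ?? 0) * 3600 +
    Number(minutes ?? 0) * 60 +
    Number(seconds ?? 0)
  );
}
```

For example, `parseIso8601Duration("PT3M32S")` gives 212, the same duration shown for the sample video in the quick start response.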

Key Methods:

// YouTubeService
constructor(config?: { youtubeApiKey?: string, rapidApiKey?: string })
getVideoMetadata(videoId) // Real YouTube Data API v3
getAudioStreamUrl(videoId) // Real RapidAPI downloader
downloadAudioToR2(videoId, r2Bucket, tenantId, userId) // Store audio in R2

// WhisperService (new class)
constructor(apiKey: string)
transcribe(audioUrl) // OpenAI Whisper with timestamps
formatForDubbing(segments) // Convert to dubbing manifest format

Metadata Response:

{
  title: string;
  duration: number; // in seconds
  description: string;
  channelTitle?: string;
  thumbnailUrl?: string;
}

Whisper Transcription Response:

{
  text: string; // Full transcript
  segments: Array<{
    start: number; // seconds
    end: number;   // seconds
    text: string;  // segment text
  }>;
}

2. Updated Queue Handler (`apps/api/src/queue-handler.ts`)

Changes in processYoutubeDubJob():

Before (mocked):

const mockTranscript = "Hello, welcome...";
await youtube.getAudioStreamUrl(videoId); // Dummy validation

After (production):

// 1. Download real audio to R2
const audioR2Key = await youtube.downloadAudioToR2(videoId, env.UPLOADS, tenantId, userId);

// 2. Transcribe with Whisper
const audioUrl = await youtube.getAudioStreamUrl(videoId);
const { text, segments } = await whisper.transcribe(audioUrl);

// 3. Format segments with timestamps
const dubbingSegments = whisper.formatForDubbing(segments);

// 4. Translate with Gemini
const translated = await gemini.generateStructuredDubbingScript(text, targetLanguage);

// 5. Merge Whisper timestamps with Gemini translations
// (?? rather than || so a valid timestamp of 0 is not discarded)
const manifest = translated.map((seg, idx) => ({
  ...seg,
  start: segments[idx]?.start ?? seg.start,
  end: segments[idx]?.end ?? seg.end
}));

Progress Tracking:

5-20%: Download audio to R2
20-40%: Transcribe with Whisper
40-50%: Translate with Gemini
50-90%: Generate TTS for each segment
90-100%: Save manifest

3. Enhanced VoxDub Route (`apps/api/src/routes/voxdub.ts`)

Changes:

Added YOUTUBE_API_KEY and RAPIDAPI_KEY to Bindings type
Initialize YouTubeService with config from environment
Better error handling: now fails early if video metadata can't be fetched
Returns detailed error messages for API failures

Before:

const ytService = new YouTubeService();
try {
  metadata = await ytService.getVideoMetadata(videoId);
} catch (e) {
  console.warn("Failed, continuing anyway...", e);
}

After:

const ytService = new YouTubeService({
  youtubeApiKey: c.env.YOUTUBE_API_KEY,
  rapidApiKey: c.env.RAPIDAPI_KEY
});

try {
  metadata = await ytService.getVideoMetadata(videoId);
} catch (e: any) {
  return c.json({ 
    error: 'Unable to access YouTube video. Check if video is public and API keys are configured.',
    details: e.message 
  }, 400);
}

4. Configuration Updates (`apps/api/wrangler.toml`)

Added Environment Variables:

# YouTube Integration (Production)
YOUTUBE_API_KEY = ""      # YouTube Data API v3 key from Google Cloud Console
RAPIDAPI_KEY = ""         # RapidAPI key for YouTube audio extraction

Note: OPENAI_API_KEY already existed from Phase 10C (used for both TTS and Whisper)

5. Documentation (`docs/YOUTUBE_INTEGRATION.md`)

New 300+ line guide covering:

Overview of architecture flow
Step-by-step API key setup for all 3 services
Cost estimation ($0.07 per 10-minute video + credits)
Code implementation details
Testing procedures (with and without API keys)
Deployment instructions using Cloudflare secrets
Error handling for common issues
Security best practices (rate limiting, duration validation)
Monitoring and logging tips

6. Updated Copilot Instructions (`.github/copilot-instructions.md`)

Added Section:

YouTube Integration overview
Service method descriptions
Required API keys
Graceful fallback behavior
Job processing flow breakdown
Reference to detailed docs

API Keys Setup Quick Reference

1. YouTube Data API v3

# Get key from: https://console.cloud.google.com/
wrangler secret put YOUTUBE_API_KEY
# Enter: AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXX

2. RapidAPI

# Get key from: https://rapidapi.com/
wrangler secret put RAPIDAPI_KEY
# Enter: 1234567890abcdefXXXXXXXXXXXX

3. OpenAI (Whisper)

# Get key from: https://platform.openai.com/
wrangler secret put OPENAI_API_KEY
# Enter: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXX

How to Test

Without API Keys (Development Mode)

cd apps/api
npm run dev

Services fall back to mock data
Allows UI/flow testing without costs

With API Keys (Production Mode)

Add keys to local dev:

# Create .dev.vars file
cat > .dev.vars << EOF
YOUTUBE_API_KEY=AIzaSy...
RAPIDAPI_KEY=123456...
OPENAI_API_KEY=sk-proj...
EOF

Start dev server:

npm run dev

Test with real YouTube video:

curl -X POST http://localhost:8787/api/voxdub/create \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "targetLanguage": "Spanish"
  }'

Check job progress:

# Get job ID from previous response
curl http://localhost:8787/api/jobs/job_XXXXX \
  -H "Authorization: Bearer YOUR_JWT"

What Changed from Mock to Production

Feature	Before (Mock)	After (Production)
Video Metadata	Hardcoded title/duration	Real YouTube Data API v3
Audio Extraction	Dummy WAV file URL	RapidAPI audio download + R2 storage
Transcription	`"Hello, welcome to this video..."`	OpenAI Whisper with real timestamps
Error Handling	Warnings, continues anyway	Fails early with detailed errors
API Keys	None required	3 keys (falls back to mock data when a key is missing)
Timestamp Accuracy	Generic 5-second segments	Real segment boundaries from Whisper
Speaker Detection	Single "Host" speaker	Automatic grouping by ~10 segments
Cost	$0	~$0.07 per 10-min video + TTS credits
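The Speaker Detection row mentions grouping by roughly 10 segments. One possible reading of that, sketched under assumptions: consecutive Whisper segments are merged into blocks of 10, and each block keeps the start of its first segment and the end of its last. The `Segment` and `SegmentBlock` shapes and the `groupSegments` function are hypothetical; the actual logic in `queue-handler.ts` may work differently.

```typescript
// Hypothetical segment shape, loosely modelled on timestamped transcription output.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

interface SegmentBlock {
  group: number; // 1-based block index
  start: number;
  end: number;
  text: string;
}

// Merge consecutive segments into fixed-size blocks (assumed size: 10).
function groupSegments(segments: Segment[], groupSize = 10): SegmentBlock[] {
  const blocks: SegmentBlock[] = [];
  for (let i = 0; i < segments.length; i += groupSize) {
    const chunk = segments.slice(i, i + groupSize);
    blocks.push({
      group: blocks.length + 1,
      start: chunk[0].start,
      end: chunk[chunk.length - 1].end,
      text: chunk.map((s) => s.text.trim()).join(" "),
    });
  }
  return blocks;
}
```

Fixed-size grouping does not identify real speakers; it only produces larger, timestamp-aligned units for the translation and TTS steps.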

Files Modified

apps/api/src/services/youtube.ts (200+ lines added)
- YouTubeService constructor with config
- Real YouTube Data API v3 integration
- RapidAPI audio extraction
- R2 download method
- WhisperService class
- Graceful fallback logic
apps/api/src/queue-handler.ts (60 lines modified)
- processYoutubeDubJob() rewritten
- Real audio download
- Whisper transcription
- Timestamp merging
- Progress tracking updated
apps/api/src/routes/voxdub.ts (10 lines modified)
- Added API key bindings
- YouTubeService config initialization
- Better error handling
apps/api/wrangler.toml (3 lines added)
- YOUTUBE_API_KEY var
- RAPIDAPI_KEY var
- Comments with setup info
docs/YOUTUBE_INTEGRATION.md (new file, 300+ lines)
- Complete setup guide
- API key instructions
- Cost breakdown
- Testing procedures
.github/copilot-instructions.md (30 lines added)
- YouTube Integration section
- Service descriptions
- Flow overview

Cost Analysis

Per 10-Minute Video

Service	Usage	Cost
YouTube Data API	1 metadata request	Free (within the 10k units/day quota)
RapidAPI	1 audio download	~$0.01
OpenAI Whisper	10 minutes	$0.06
Gemini Translation	~500 words	Free (under quota)
TTS Generation	~225 seconds	225-450 credits*

Total External Cost: ~$0.07 per video
Total Credits: 225-450 (Fast to Premium tier)

* Credits: Premium=2/sec, Fast=1/sec, Cheap=0.5/sec. At ~225 seconds of audio, the Cheap tier would use about 113 credits, which is below the range shown.

Monthly Estimate (100 videos)

API costs: ~$7
Credits consumed: 22,500-45,000
Average video length: 10 minutes
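The figures above can be reproduced with a short calculation. This is a sketch that uses only the rates stated in this section: about $0.01 per RapidAPI download, $0.06 per 10 minutes of Whisper (that is, $0.006 per minute), and the per-second credit rates from the footnote. The function and constant names are hypothetical.

```typescript
type Tier = "premium" | "fast" | "cheap";

// Rates taken from the cost table and footnote above.
const CREDITS_PER_SECOND: Record<Tier, number> = { premium: 2, fast: 1, cheap: 0.5 };
const WHISPER_USD_PER_MINUTE = 0.006; // $0.06 for 10 minutes
const RAPIDAPI_USD_PER_DOWNLOAD = 0.01; // approximate

function estimateVideoCost(videoMinutes: number, ttsSeconds: number, tier: Tier) {
  const externalUsd = RAPIDAPI_USD_PER_DOWNLOAD + videoMinutes * WHISPER_USD_PER_MINUTE;
  const credits = ttsSeconds * CREDITS_PER_SECOND[tier];
  // Round to cents to hide floating-point noise.
  return { externalUsd: Math.round(externalUsd * 100) / 100, credits };
}

// One 10-minute video with ~225 seconds of generated speech.
const perVideo = estimateVideoCost(10, 225, "premium");
console.log(perVideo); // { externalUsd: 0.07, credits: 450 }

// 100 such videos per month.
console.log({
  monthlyUsd: Math.round(perVideo.externalUsd * 100 * 100) / 100, // 7
  monthlyCredits: perVideo.credits * 100, // 45000
});
```

Switching the tier to `"fast"` yields 225 credits per video, which gives the lower end of the monthly range (22,500).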

Next Steps

✅ Code Implementation - Complete
✅ Documentation - Complete
⏳ API Key Setup - Pending (requires Google Cloud, RapidAPI, OpenAI accounts)
⏳ Testing - Pending (test with real videos once keys configured)
⏳ Deployment - Pending (add secrets to Cloudflare, deploy)
⏳ Monitoring - Pending (watch logs for errors, track costs)

Deployment Checklist

Create Google Cloud project
Enable YouTube Data API v3
Generate YouTube API key
Sign up for RapidAPI
Subscribe to YouTube downloader API
Get RapidAPI key
Create OpenAI account (if not exists)
Add payment method to OpenAI
Generate OpenAI API key

Add all 3 secrets to Cloudflare:

wrangler secret put YOUTUBE_API_KEY
wrangler secret put RAPIDAPI_KEY
wrangler secret put OPENAI_API_KEY

Deploy API: cd apps/api && npm run deploy
Test with real YouTube URL
Monitor logs: wrangler tail --format pretty
Check costs in dashboards (Google Cloud, RapidAPI, OpenAI)

Troubleshooting

"Unable to access YouTube video"

Check if video is public (not private/unlisted)
Verify YouTube API key is valid
Check quota in Google Cloud Console (10k units/day)

"RapidAPI failed: 403"

Verify RapidAPI key is correct
Check subscription status (Free tier = 500 req/month)
Ensure API host header matches your chosen API

"Whisper API failed"

Check OpenAI API key is valid
Verify billing is set up (Whisper requires payment)
Audio file must be < 25 MB (Whisper limit)
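Because an oversized file only fails after upload, a size check before calling Whisper can surface the problem earlier. A minimal sketch follows; `checkWhisperSize` is a hypothetical helper, and treating "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```typescript
// Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
const WHISPER_MAX_BYTES = 25 * 1024 * 1024;

interface SizeCheck {
  ok: boolean;
  message: string;
}

// Validate the audio size before sending it for transcription.
function checkWhisperSize(byteLength: number): SizeCheck {
  const sizeMb = (byteLength / (1024 * 1024)).toFixed(1);
  if (byteLength < WHISPER_MAX_BYTES) {
    return { ok: true, message: `Audio is ${sizeMb} MB, within the limit` };
  }
  return {
    ok: false,
    message: `Audio is ${sizeMb} MB; Whisper accepts files under 25 MB`,
  };
}
```

When the check fails, typical options are to lower the audio bitrate or to split the file into chunks before transcription.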

Mock data still appearing

Ensure .dev.vars file exists for local dev
For production, verify secrets with wrangler secret list
Check logs for "[YouTube] No API key, using mock metadata"
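When mock data keeps appearing, it helps to know which of the three keys is actually missing. A small sketch, assuming the keys are exposed as string bindings with the names used throughout this document; `missingKeys` is a hypothetical helper and not part of the codebase.

```typescript
// The three secret names used in this document.
const REQUIRED_KEYS = ["YOUTUBE_API_KEY", "RAPIDAPI_KEY", "OPENAI_API_KEY"] as const;

type KeyName = (typeof REQUIRED_KEYS)[number];
type Env = Partial<Record<KeyName, string>>;

// Return the names of keys that are unset or blank.
function missingKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((name) => !env[name]?.trim());
}

// Example: only the YouTube key is configured.
console.log(missingKeys({ YOUTUBE_API_KEY: "AIzaSy..." }));
// [ 'RAPIDAPI_KEY', 'OPENAI_API_KEY' ]
```

Logging only the key names, never the values, keeps secrets out of `wrangler tail` output.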

Success Indicators

When working correctly, you should see logs like:

[YouTube] Downloading audio for dQw4w9WgXcQ
[YouTube] Audio stored at youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
[YouTube] Transcribing audio
[Whisper] Transcription complete: 42 segments
[YouTube] Translating to Spanish
[YouTube] Generating dubbed audio
[VoxDub] Video metadata: { title: "Never Gonna Give You Up", duration: 212, ... }

No errors and no warnings in these logs indicate that real data is flowing through the pipeline.
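The storage path in the sample log follows a `youtube/<tenant>/<user>/<videoId>.mp3` pattern. A sketch of building such a key is shown here; `buildAudioKey` is a hypothetical name, and the layout is inferred from the single log line above rather than confirmed from the code.

```typescript
// Build an object key matching the pattern seen in the sample log.
// Layout inferred from: youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
function buildAudioKey(tenantId: string, userId: string, videoId: string): string {
  for (const part of [tenantId, userId, videoId]) {
    // Reject empty parts and path separators so one tenant cannot write into another's prefix.
    if (part.length === 0 || part.includes("/")) {
      throw new Error(`Invalid key segment: "${part}"`);
    }
  }
  return `youtube/${tenantId}/${userId}/${videoId}.mp3`;
}

console.log(buildAudioKey("tenant_123", "user_456", "dQw4w9WgXcQ"));
// youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
```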

Summary

The YouTube dubbing feature is now production-ready with:

Real YouTube Data API v3 for metadata
Real audio extraction via RapidAPI
Real transcription via OpenAI Whisper
Proper error handling and validation
Graceful fallback for development
Comprehensive documentation
Clear deployment path

Status: ✅ Implementation Complete | ⏳ API Key Setup Required
