These docs live in the repo. Open the files below to edit or share with the team.
This is the shortest path to a working local setup. For full details, see SETUP.md and ARCHITECTURE.md.
git clone <your-repo-url>
cd TTS-Premium
npm install
cd apps/api
npm run wrangler d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
# From repo root (API + React editor)
npm run dev:clean
# In another terminal (SvelteKit shell)
cd apps/web-svelte
npm run dev
Expected URLs:
- SvelteKit shell: http://localhost:5175 (or next available port)
- React editor: http://localhost:3001
- Workers API: http://localhost:8787

cd apps/web-svelte
cp .env.example .env
# Add:
# VITE_API_URL=http://localhost:8787
# VITE_EDITOR_URL=http://localhost:3001
cd apps/api
cp .dev.vars.example .dev.vars
# Add required keys for AI providers as needed.
Verify:
- /login loads and signs in.
- /home shows projects (or empty state).
- /editor/:id opens in the React app.

Ports in use:
lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5173 | xargs kill -9
Wrangler login fails:
rm -rf ~/.wrangler
npm run wrangler login
Missing D1 tables:
cd apps/api
npm run wrangler d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
SETUP.md - TTS Megasystem Setup Guide
Prerequisites Checklist
Step-by-Step Setup
git clone <your-repo-url>
cd TTS-Premium
npm install
cd apps/api
npm run wrangler login
# Create D1 database
npm run wrangler d1 create tts-megasystem-db
# Copy database_id from output
# Create R2 bucket
npm run wrangler r2 bucket create tts-megasystem-assets
In apps/api/wrangler.toml, update:
- database_id under [[d1_databases]]
- bucket_name under [[r2_buckets]]

cd apps/api
# Use the shared schema
npm run wrangler d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
cd apps/api
cp .dev.vars.example .dev.vars # if provided
# Otherwise create .dev.vars and add keys from your environment
cd apps/web-svelte
cp .env.example .env # if provided
# Add:
# VITE_API_URL=http://localhost:8787
# VITE_EDITOR_URL=http://localhost:3001
# From repo root (starts API + React editor)
npm run dev:clean
# Start SvelteKit shell
cd apps/web-svelte
npm run dev
Expected ports:
- API (Workers): http://localhost:8787
- React editor: http://localhost:3001
- SvelteKit shell: http://localhost:5175 (or next available)

You should land on /home after login.

Common Issues
Port already in use
lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5173 | xargs kill -9
Wrangler login fails
rm -rf ~/.wrangler
npm run wrangler login
D1 database not found
cd apps/api
npm run wrangler d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
Next Steps
Setup Time: ~30 minutes • Difficulty: Intermediate
ARCHITECTURE.md - TTS Megasystem Architecture
Overview

The platform uses a hybrid frontend (SvelteKit shell + React editor) with a Cloudflare Workers API. This keeps global browsing fast while preserving a rich editor experience.
Why Hybrid
Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                    USER EXPERIENCE                      │
├─────────────────────────────────────────────────────────┤
│  Shell (SvelteKit)                                      │
│  /home  /chat  /settings  /library                      │
│          │ Open project                                 │
│          ▼                                              │
│  Editor (React)                                         │
│  /editor/:projectId                                     │
└────────────────────┬────────────────────────────────────┘
                     │ API calls
                     ▼
┌─────────────────────────────────────────────────────────┐
│            CLOUDFLARE WORKERS API (Hono)                │
├─────────────────────────────────────────────────────────┤
│  /api/auth     /api/projects   /api/files               │
│  /api/tts      /api/voxdub     /api/asr                 │
│  /api/preview  /api/translate  /api/jobs                │
└──────┬───────────────┬───────────────┬──────────────────┘
       │               │               │
       ▼               ▼               ▼
 ┌──────────┐    ┌──────────┐    ┌──────────┐
 │    D1    │    │    R2    │    │  Queues  │
 │ (SQL DB) │    │ (Assets) │    │  (Jobs)  │
 └──────────┘    └──────────┘    └──────────┘
Core Request Flows
- Project list (Shell): Shell → /api/projects → D1 → JSON → Shell renders
- Open editor: Shell redirects to /editor/:id → React loads → /api/projects/:id
- TTS: Editor → /api/tts/generate → Worker → Provider → R2 → response
- Video dub: Editor → /api/voxdub/create → Queue → Workers → R2 → preview
Storage Strategy
API Contract
The authoritative endpoint and schema reference lives in API_CONTRACT.md. Keep SvelteKit shell and React editor aligned with it to avoid drift.
Auth

Token-based auth via the Workers API. The client stores the token in localStorage and sends it as a Bearer token. (Can be upgraded to cookie-based auth later.)
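A minimal sketch of what the client side of this looks like; the apiFetch helper, the auth_token storage key, and the error handling are illustrative assumptions, not the apps' actual code.

```typescript
// Hypothetical client helper; the real apps may organize this differently.
const API_URL = import.meta.env.VITE_API_URL ?? "http://localhost:8787";

async function apiFetch<T>(
  path: string,
  init: { method?: string; body?: string; headers?: Record<string, string> } = {},
): Promise<T> {
  const token = localStorage.getItem("auth_token"); // storage key name is an assumption
  const res = await fetch(`${API_URL}${path}`, {
    ...init,
    headers: {
      "Content-Type": "application/json",
      ...(token ? { Authorization: `Bearer ${token}` } : {}),
      ...init.headers,
    },
  });
  if (res.status === 401) throw new Error("Unauthorized: token missing or expired");
  if (!res.ok) throw new Error(`API error ${res.status}`);
  return res.json() as Promise<T>;
}

// Usage: const projects = await apiFetch<unknown[]>("/api/projects");
```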
Routing & Apps
- Shell (SvelteKit): /home, /chat, /login, /signup, /settings.
- Editor (React): /editor/:projectId (same domain in prod, via routing or redirect).
- The shell links to the editor via VITE_EDITOR_URL in dev.

Services (Local Dev)
- apps/api (Workers) → http://localhost:8787
- apps/web-svelte → http://localhost:517x
- apps/web (React) → http://localhost:3001

Data Model (Current)
projects table (D1):
Project metadata lives in D1. Heavy assets (audio/video/image) live in R2.
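The authoritative schema is packages/db/schema.sql; the shape below is only a hedged sketch of what a projects row might carry, with column names assumed for illustration.

```typescript
// Illustrative only - check packages/db/schema.sql for the real columns.
interface ProjectRow {
  id: string;
  tenant_id: string;     // assumed: tenants appear in R2 key paths (e.g. youtube/tenant_123/...)
  user_id: string;
  title: string;
  created_at: string;    // ISO timestamp
  updated_at: string;
  // Heavy assets are not stored here; D1 keeps metadata plus R2 object keys.
  asset_keys?: string[]; // assumed: R2 keys for audio/video/image assets
}
```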
Background Jobs

Queue workers handle the following job types (a minimal consumer sketch follows the list):
- youtube_dub, file_dub, url_dub (segment TTS + manifest)
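A sketch of how such a consumer might dispatch on job type in a Worker, assuming the Cloudflare Queues queue() handler shape; the helper names are assumptions, and the real logic lives in apps/api/src/queue-handler.ts.

```typescript
// Sketch only - see apps/api/src/queue-handler.ts for the real implementation.
// MessageBatch comes from @cloudflare/workers-types.
interface Env { /* D1, R2, queue, and secret bindings from wrangler.toml */ }

interface DubJobMessage {
  jobId: string;
  type: "youtube_dub" | "file_dub" | "url_dub";
  payload: Record<string, unknown>;
}

declare function processYoutubeDubJob(job: DubJobMessage, env: Env): Promise<void>;
declare function processGenericDubJob(job: DubJobMessage, env: Env): Promise<void>; // hypothetical name

export default {
  async queue(batch: MessageBatch<DubJobMessage>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      try {
        if (msg.body.type === "youtube_dub") {
          await processYoutubeDubJob(msg.body, env);
        } else {
          await processGenericDubJob(msg.body, env); // file_dub / url_dub
        }
        msg.ack();   // mark the message as handled
      } catch (err) {
        console.error(`[Queue] Job ${msg.body.jobId} failed`, err);
        msg.retry(); // requeue for another attempt
      }
    }
  },
};
```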
Security Notes

- Auth via Authorization: Bearer <token>.
- CORS restricted to ALLOWED_ORIGINS.

Deployment
Architecture Version: 1.0 • Last Updated: December 2025
Short, common commands for local development. See SETUP.md for full details.
npm install
# Start API + React editor
npm run dev:clean
# Start SvelteKit shell (separate terminal)
cd apps/web-svelte
npm run dev
cd apps/api
npm run dev
cd apps/web
npm run dev
cd apps/web-svelte
npm run dev
cd apps/api
npm run wrangler d1 execute tts-megasystem-db --local --file=../../packages/db/schema.sql
cd apps/api
npm run wrangler login
- API: http://localhost:8787
- React editor: http://localhost:3001
- SvelteKit shell: http://localhost:5175 (or next available)

lsof -ti:3001 | xargs kill -9
lsof -ti:8787 | xargs kill -9
lsof -ti:5173 | xargs kill -9
docker-compose up -d --build extractor
docker-compose restart extractor
KEEPS SVELTEKIT SHELL + REACT EDITOR IN SYNC. NO DRIFT ALLOWED.
POST /api/chat-script
Content-Type: application/json
REQUEST:
{
"prompt": "Create 5-slide faceless video: AI productivity tips"
}
RESPONSE: 200
{
"slides": [
{
"id": "slide-1",
"title": "Hook",
"text": "Struggling with focus?",
"image_prompt": "distracted worker at messy desk, bold white text 'Productivity Crisis' overlay, 16:9"
},
{
"id": "slide-2",
"title": "Solution 1",
"text": "Pomodoro: 25min work, 5min break",
"image_prompt": "clean desk with timer, bold text 'POMODORO TECHNIQUE' overlay"
}
]
}
POST /api/generate-slides
REQUEST:
{
"slide_prompts": [
"distracted worker at messy desk, bold white text 'Productivity Crisis' overlay, 16:9",
"clean desk with timer, bold text 'POMODORO TECHNIQUE' overlay"
]
}
RESPONSE: 200
{
"images": [
"https://r2.dev/slide-1.jpg",
"https://r2.dev/slide-2.jpg"
]
}
POST /api/render-video
REQUEST:
{
"projectId": "proj_123",
"slides": [...],
"voice_sample_url": "https://r2.dev/voice.wav",
"image_urls": [...]
}
RESPONSE: 202
{
"render_id": "render_456",
"status": "queued",
"progress_url": "/api/render/456"
}
400: { "error": "Invalid prompt", "code": "INVALID_INPUT" }
401: { "error": "Unauthorized", "code": "UNAUTH" }
429: { "error": "Rate limited", "code": "RATE_LIMIT" }
interface Slide {
id: string;
title: string;
text: string;
image_prompt: string;
image_url?: string;
}
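To make the contract concrete, here is a hedged client sketch that chains the three endpoints in order; the base URL, token handling, and the hardcoded voice sample are assumptions layered on top of the request/response shapes above.

```typescript
// Sketch of a faceless-video client flow; shapes follow the contract above,
// but the base URL and token handling are illustrative.
interface Slide {
  id: string;
  title: string;
  text: string;
  image_prompt: string;
  image_url?: string;
}

const BASE = "http://localhost:8787"; // assumed dev API URL
const headers = { "Content-Type": "application/json", Authorization: "Bearer YOUR_JWT" };

async function createFacelessVideo(prompt: string, projectId: string) {
  // 1. Script: POST /api/chat-script -> { slides: Slide[] }
  const scriptRes = await fetch(`${BASE}/api/chat-script`, {
    method: "POST",
    headers,
    body: JSON.stringify({ prompt }),
  });
  const { slides } = (await scriptRes.json()) as { slides: Slide[] };

  // 2. Images: POST /api/generate-slides -> { images: string[] }
  const imgRes = await fetch(`${BASE}/api/generate-slides`, {
    method: "POST",
    headers,
    body: JSON.stringify({ slide_prompts: slides.map((s) => s.image_prompt) }),
  });
  const { images } = (await imgRes.json()) as { images: string[] };

  // 3. Render: POST /api/render-video -> 202 { render_id, status, progress_url }
  const renderRes = await fetch(`${BASE}/api/render-video`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      projectId,
      slides,
      voice_sample_url: "https://r2.dev/voice.wav", // from the contract example
      image_urls: images,
    }),
  });
  return renderRes.json();
}
```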
EXACT PROMPTS FOR CONSISTENT QUALITY. NO DEVIATIONS.
SYSTEM PROMPT:
You are a faceless YouTube script generator. ALWAYS return valid JSON.
Generate EXACTLY 5 slides for a 60-second video. Each slide: 8-12 seconds max.
Format: { "slides": [ { "id": "slide-1", "title": "Hook", "text": "Attention-grabbing question or stat (10 words max)", "image_prompt": "professional 16:9 slide description with EXACT TEXT overlay" } ] }
Rules:
USER PROMPT TEMPLATE:
Create 5-slide faceless video: ${user_input}
EXPECTED OUTPUT:
```json
{
"slides": [
{
"id": "slide-1",
"title": "Hook",
"text": "Struggling with focus?",
"image_prompt": "distracted worker desk, bold white text 'FOCUS CRISIS' overlay, 16:9"
}
]
}
```
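Even though the system prompt demands valid JSON with exactly 5 slides, a thin runtime check before using the output is still worthwhile; this validator sketch is illustrative, not code from the repo.

```typescript
// Validates the LLM output against the rules above (exactly 5 slides, required fields).
interface ScriptSlide { id: string; title: string; text: string; image_prompt: string; }

function parseScript(raw: string): ScriptSlide[] {
  const data = JSON.parse(raw) as { slides?: ScriptSlide[] };
  if (!Array.isArray(data.slides) || data.slides.length !== 5) {
    throw new Error(`Expected exactly 5 slides, got ${data.slides?.length ?? 0}`);
  }
  for (const s of data.slides) {
    if (!s.id || !s.title || !s.text || !s.image_prompt) {
      throw new Error(`Slide ${s.id ?? "?"} is missing required fields`);
    }
  }
  return data.slides;
}
```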
PROMPT TEMPLATE:
"${slide_description}, professional presentation slide, EXACT TEXT '${slide.text}' overlay, bold modern sans-serif font, high contrast white text on dark background, centered composition, 16:9 aspect ratio, clean minimalist design"
EXAMPLE:
"clean desk with pomodoro timer, professional presentation slide, EXACT TEXT '25min FOCUS' overlay, bold modern sans-serif font, high contrast white text on dark background, centered composition, 16:9 aspect ratio, clean minimalist design"
Enable production YouTube dubbing with real APIs in 15 minutes.
YouTube Data API key: copy the key (starts with AIzaSy...). Cost: FREE (10,000 quota units/day).
RapidAPI key: copy your X-RapidAPI-Key from the code snippet (right side). Cost: FREE tier (500 req/month), then $0.01/request.
Alternative audio-extraction APIs can be used if preferred.
OpenAI API key: copy the key (starts with sk-proj-...). Cost: $0.006 per minute of audio (10-min video = $0.06).
Note: If you already set up OpenAI for Phase 10C TTS, use the same key.
cd apps/api
# YouTube
echo "AIzaSy_YOUR_KEY_HERE" | wrangler secret put YOUTUBE_API_KEY
# RapidAPI
echo "YOUR_RAPIDAPI_KEY_HERE" | wrangler secret put RAPIDAPI_KEY
# OpenAI (skip if already set in Phase 10C)
echo "sk-proj-YOUR_KEY_HERE" | wrangler secret put OPENAI_API_KEY
Verify secrets:
wrangler secret list
npm run deploy
Example test video: https://www.youtube.com/watch?v=dQw4w9WgXcQ

# Get JWT token first (login via frontend and copy from localStorage)
export JWT_TOKEN="your_jwt_token_here"
# Create dubbing job
curl -X POST https://your-api.workers.dev/api/voxdub/create \
-H "Authorization: Bearer $JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"targetLanguage": "Spanish"
}'
# Response:
# {
# "id": "job_1234567890_abc123",
# "status": "queued",
# "message": "YouTube dubbing job started",
# "metadata": {
# "title": "Never Gonna Give You Up",
# "duration": 212,
# "channelTitle": "Rick Astley",
# ...
# }
# }
# Check job status
curl https://your-api.workers.dev/api/jobs/job_1234567890_abc123 \
-H "Authorization: Bearer $JWT_TOKEN"
Watch real-time processing:
wrangler tail --format pretty
Look for:
[YouTube] Downloading audio for dQw4w9WgXcQ
[YouTube] Audio stored at youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
[Whisper] Transcription complete: 42 segments
[Gemini] Translation complete
[TTS] Generated segment 1/42
...
[VoxDub] Job completed
User submits YouTube URL
        ↓
[1] YouTube Data API → Get video metadata (title, duration)
        ↓
[2] RapidAPI → Download audio → Store in R2
        ↓
[3] OpenAI Whisper → Transcribe → 42 segments with timestamps
        ↓
[4] Gemini AI → Translate each segment to target language
        ↓
[5] Multi-TTS Router → Generate audio for each segment
        ↓
[6] Save manifest → Job complete → User downloads
Total Time: 2-5 minutes for 3-minute video
| Service | Cost |
|---|---|
| YouTube Data API | FREE |
| RapidAPI audio download | ~$0.01 |
| Whisper transcription | $0.06 |
| Gemini translation | FREE |
| TTS generation | 225-450 credits |
Total per video: ~$0.07 + credits
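As a rough worked example, the external cost scales mostly with audio length (Whisper at $0.006/min) plus a roughly flat download fee; the helper below simply encodes the table above and is not code from the repo.

```typescript
// Back-of-envelope external cost per dubbed video, using the rates listed above.
function estimateDubCostUSD(audioMinutes: number): number {
  const rapidApiDownload = 0.01;  // ~$0.01 per audio download (varies by plan)
  const whisperPerMinute = 0.006; // $0.006 per minute of transcription
  // YouTube Data API and Gemini translation are free within quota;
  // TTS is billed separately in credits.
  return rapidApiDownload + whisperPerMinute * audioMinutes;
}

// estimateDubCostUSD(10) -> 0.07, matching the "~$0.07 per video" figure.
```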
For testing UI/flows without costs:
cd apps/api
npm run dev
# Don't add .dev.vars file
# Services will use mock data
# Verify secrets exist
wrangler secret list
# Re-add if missing
wrangler secret put YOUTUBE_API_KEY
wrangler secret put RAPIDAPI_KEY
wrangler secret put OPENAI_API_KEY
# Re-deploy
npm run deploy
Further reading:
- docs/YOUTUBE_INTEGRATION.md (300+ lines)
- docs/YOUTUBE_PRODUCTION_SUMMARY.md
- .github/copilot-instructions.md

Security reminders:
- Use wrangler secret (not wrangler.toml [vars]) for production.
- Keep local keys in .dev.vars (already in .gitignore).
- See the docs/YOUTUBE_INTEGRATION.md security section.

Estimated Setup Time: 15 minutes
First Video Processing Time: 2-5 minutes
Cost for Testing (5 videos): ~$0.35 + credits
You're ready to go! Start with a short YouTube video to test the full pipeline.
The YouTube dubbing feature (/api/voxdub) now uses real production APIs for video metadata (YouTube Data API v3), audio extraction (RapidAPI), and transcription (OpenAI Whisper).
YouTube Data API v3

Purpose: Fetch video metadata (title, duration, channel info, thumbnails)
Setup Steps:
Cost: Free tier includes 10,000 quota units/day (each metadata request = 1 unit)
Add to wrangler.toml:
YOUTUBE_API_KEY = "AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
RapidAPI (YouTube audio extraction)

Purpose: Extract audio stream URLs from YouTube videos
Setup Steps:
Alternative APIs can be used if preferred (see the note below about changing the endpoint).
Add to wrangler.toml:
RAPIDAPI_KEY = "1234567890abcdefXXXXXXXXXXXXXXXXX"
Note: If using a different RapidAPI endpoint, update the API URL in apps/api/src/services/youtube.ts:
const response = await fetch(
`https://YOUR-API-HOST.p.rapidapi.com/endpoint?url=...`,
{
headers: {
'X-RapidAPI-Key': this.rapidApiKey,
'X-RapidAPI-Host': 'YOUR-API-HOST.p.rapidapi.com'
}
}
);
OpenAI Whisper

Purpose: Transcribe YouTube audio to text with timestamps
Setup Steps:
Cost: $0.006 per minute of audio (e.g., 10-minute video = $0.06)
Add to wrangler.toml:
OPENAI_API_KEY = "sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
Note: This key is already configured for OpenAI TTS in Phase 10C. Same key works for Whisper.
User uploads YouTube URL
        ↓
[YouTube Data API] → Get metadata (title, duration)
        ↓
Job queued → Queue handler starts
        ↓
[RapidAPI] → Extract audio → Download to R2
        ↓
[OpenAI Whisper] → Transcribe audio → Segments with timestamps
        ↓
[Gemini AI] → Translate segments to target language
        ↓
[Multi-TTS Router] → Generate dubbed audio for each segment
        ↓
Save manifest to R2 → Job completed
YouTube service (apps/api/src/services/youtube.ts):
- getVideoMetadata(videoId) - Uses YouTube Data API v3
- getAudioStreamUrl(videoId) - Uses RapidAPI
- downloadAudioToR2(videoId, r2Bucket, tenantId, userId) - Downloads and stores audio
- transcribe(audioUrl) - Sends audio to OpenAI Whisper API
- formatForDubbing(segments) - Converts Whisper output to dubbing manifest format

Queue handler (apps/api/src/queue-handler.ts):
processYoutubeDubJob() implements the workflow shown above.
Route (apps/api/src/routes/voxdub.ts):
POST /api/voxdub/create
If API keys are missing, services gracefully degrade:
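The exact fallback list isn't spelled out here; the sketch below shows the general pattern this implies (the service checks for a key and returns mock data when it's absent) and is an assumption, not the repo's exact code. The curl commands that follow hit each provider directly, which is the quickest way to confirm a key actually works.

```typescript
// Illustrative pattern only - see apps/api/src/services/youtube.ts for the real behavior.
class YouTubeServiceSketch {
  constructor(private config: { youtubeApiKey?: string } = {}) {}

  async getVideoMetadata(videoId: string): Promise<{ title: string; duration: number }> {
    if (!this.config.youtubeApiKey) {
      // No key configured -> return mock data so local UI/flows still work.
      console.warn("[YouTube] No API key set, returning mock metadata");
      return { title: `Mock video ${videoId}`, duration: 60 };
    }
    const url =
      "https://www.googleapis.com/youtube/v3/videos" +
      `?part=snippet,contentDetails&id=${videoId}&key=${this.config.youtubeApiKey}`;
    const res = await fetch(url);
    if (!res.ok) throw new Error(`YouTube Data API error ${res.status}`);
    const data = (await res.json()) as { items?: Array<{ snippet?: { title?: string } }> };
    // A real implementation would also parse the ISO 8601 duration (e.g. "PT3M32S") into seconds.
    return { title: data.items?.[0]?.snippet?.title ?? "", duration: 0 };
  }
}
```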
curl "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id=dQw4w9WgXcQ&key=YOUR_KEY"
curl -X GET \
'https://youtube-mp3-downloader2.p.rapidapi.com/ytmp3/ytmp3/custom/?url=https://www.youtube.com/watch?v=dQw4w9WgXcQ&quality=320' \
-H 'X-RapidAPI-Key: YOUR_KEY' \
-H 'X-RapidAPI-Host: youtube-mp3-downloader2.p.rapidapi.com'
curl https://api.openai.com/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/audio.mp3" \
-F model="whisper-1"
// POST to your API
const response = await fetch('http://localhost:8787/api/voxdub/create', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_JWT',
'Content-Type': 'application/json'
},
body: JSON.stringify({
youtubeUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
targetLanguage: 'Spanish'
})
});
const job = await response.json();
console.log('Job ID:', job.id);
// Poll job status
const status = await fetch(`http://localhost:8787/api/jobs/${job.id}`, {
headers: { 'Authorization': 'Bearer YOUR_JWT' }
});
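If you want to wait for completion programmatically rather than re-running the request by hand, a simple polling loop works; the status values checked here ("completed" / "failed") are assumptions about the job record.

```typescript
// Poll the jobs endpoint until the dubbing job finishes (status values assumed).
async function waitForJob(jobId: string, jwt: string, intervalMs = 5000) {
  for (;;) {
    const res = await fetch(`http://localhost:8787/api/jobs/${jobId}`, {
      headers: { Authorization: `Bearer ${jwt}` },
    });
    const job = (await res.json()) as { status: string; [k: string]: unknown };
    console.log(`[Poll] ${jobId} -> ${job.status}`);
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```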
cd apps/api
# YouTube Data API
echo "YOUR_KEY" | wrangler secret put YOUTUBE_API_KEY
# RapidAPI
echo "YOUR_KEY" | wrangler secret put RAPIDAPI_KEY
# OpenAI (if not already set)
echo "YOUR_KEY" | wrangler secret put OPENAI_API_KEY
npm run deploy
wrangler secret list
YouTube Data API Errors:
- 403 Forbidden → API key not enabled or quota exceeded
- 404 Not Found → Video doesn't exist or is private
- 400 Bad Request → Invalid video ID format

RapidAPI Errors:
- 403 Forbidden → Invalid API key or subscription expired
- 429 Too Many Requests → Rate limit exceeded
- 500 Server Error → Video unavailable or service down

Whisper API Errors:
- 401 Unauthorized → Invalid API key
- 400 Bad Request → Audio file too large (max 25 MB)
- 429 Rate Limit → Exceeded usage limits

Check logs in Cloudflare Dashboard:
wrangler tail --format pretty
Look for:
- [YouTube] Fetching metadata for...
- [YouTube] Getting audio URL for...
- [YouTube] Audio stored at...
- [Whisper] Transcription failed: (if errors)

Before resuming YouTube dubbing in environments that use the self-hosted yt-dlp API, verify:
- The yt-dlp container is running: docker ps (look for voxdub-audio-yt-dlp-api).
- A tunnel exposes it: cloudflared tunnel --url http://localhost:3034 (or your yt-dlp port).
- YTDLP_API_URL and YTDLP_SERVICE_URL are updated in apps/api/wrangler.toml, then redeploy the API worker.

Example: 10-minute YouTube video dubbed to Spanish
| Service | Usage | Cost |
|---|---|---|
| YouTube Data API | 1 metadata request | Free (within quota) |
| RapidAPI | 1 download request | $0.01 (varies by plan) |
| OpenAI Whisper | 10 minutes transcription | $0.06 |
| Multi-TTS | ~150 words × 1.5 sec/word = 225 sec | 225-450 credits |
| **Total API Cost** | | ~$0.07 + credits |
Restrict API Keys:
Use Cloudflare Secrets (not wrangler.toml [vars]):
wrangler secret put YOUTUBE_API_KEY
Add per-user rate limiting in voxdub.ts:

// Check user job count in last hour
const recentJobs = await c.env.DB.prepare(`
SELECT COUNT(*) as count FROM jobs
WHERE user_id = ? AND created_at > datetime('now', '-1 hour')
`).bind(user.sub).first();
if (recentJobs.count >= 10) {
return c.json({ error: 'Rate limit exceeded' }, 429);
}
if (metadata.duration > 3600) { // 1 hour max
return c.json({ error: 'Video too long (max 1 hour)' }, 400);
}
- .github/copilot-instructions.md (already updated)

YouTube service (apps/api/src/services/youtube.ts)

New features: real metadata via YouTube Data API v3, audio download to R2 via RapidAPI, and Whisper transcription (see Key Methods below).
Key Methods:
// YouTubeService
constructor(config?: { youtubeApiKey?: string, rapidApiKey?: string })
getVideoMetadata(videoId) // Real YouTube Data API v3
getAudioStreamUrl(videoId) // Real RapidAPI downloader
downloadAudioToR2(videoId, r2Bucket, tenantId, userId) // Store audio in R2
// WhisperService (new class)
constructor(apiKey: string)
transcribe(audioUrl) // OpenAI Whisper with timestamps
formatForDubbing(segments) // Convert to dubbing manifest format
Metadata Response:
{
title: string;
duration: number; // in seconds
description: string;
channelTitle?: string;
thumbnailUrl?: string;
}
Whisper Transcription Response:
{
text: string; // Full transcript
segments: Array<{
start: number; // seconds
end: number; // seconds
text: string; // segment text
}>;
}
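For orientation, here is a hedged sketch of what formatForDubbing(segments) might produce from the Whisper response above; the manifest field names and the ~10-segments-per-speaker grouping (mentioned in the comparison table below) are assumptions, not the repo's exact shape.

```typescript
// Illustrative conversion of Whisper segments into dubbing-manifest entries.
interface WhisperSegment { start: number; end: number; text: string; }
interface DubbingSegment { id: string; start: number; end: number; text: string; speaker: string; }

function formatForDubbingSketch(segments: WhisperSegment[]): DubbingSegment[] {
  return segments.map((seg, i) => ({
    id: `seg-${i + 1}`,
    start: seg.start,
    end: seg.end,
    text: seg.text,
    // Group roughly every 10 segments under one speaker label (assumed heuristic).
    speaker: `Speaker ${Math.floor(i / 10) + 1}`,
  }));
}
```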
Queue handler (apps/api/src/queue-handler.ts)

Changes in processYoutubeDubJob():
Before (mocked):
const mockTranscript = "Hello, welcome...";
await youtube.getAudioStreamUrl(videoId); // Dummy validation
After (production):
// 1. Download real audio to R2
const audioR2Key = await youtube.downloadAudioToR2(videoId, env.UPLOADS, tenantId, userId);
// 2. Transcribe with Whisper
const audioUrl = await youtube.getAudioStreamUrl(videoId);
const { text, segments } = await whisper.transcribe(audioUrl);
// 3. Format segments with timestamps
const dubbingSegments = whisper.formatForDubbing(segments);
// 4. Translate with Gemini
let manifest = await gemini.generateStructuredDubbingScript(text, targetLanguage);
// 5. Merge Whisper timestamps with Gemini translations
manifest = manifest.map((seg, idx) => ({
...seg,
start: segments[idx]?.start || seg.start,
end: segments[idx]?.end || seg.end
}));
Progress Tracking:
Route (apps/api/src/routes/voxdub.ts)

Changes:
- Added YOUTUBE_API_KEY and RAPIDAPI_KEY to the Bindings type
- Instantiates YouTubeService with config from the environment

Before:
const ytService = new YouTubeService();
try {
metadata = await ytService.getVideoMetadata(videoId);
} catch (e) {
console.warn("Failed, continuing anyway...", e);
}
After:
const ytService = new YouTubeService({
youtubeApiKey: c.env.YOUTUBE_API_KEY,
rapidApiKey: c.env.RAPIDAPI_KEY
});
try {
metadata = await ytService.getVideoMetadata(videoId);
} catch (e: any) {
return c.json({
error: 'Unable to access YouTube video. Check if video is public and API keys are configured.',
details: e.message
}, 400);
}
Config (apps/api/wrangler.toml)

Added Environment Variables:
# YouTube Integration (Production)
YOUTUBE_API_KEY = "" # YouTube Data API v3 key from Google Cloud Console
RAPIDAPI_KEY = "" # RapidAPI key for YouTube audio extraction
Note: OPENAI_API_KEY already existed from Phase 10C (used for both TTS and Whisper)
Documentation (docs/YOUTUBE_INTEGRATION.md)

New 300+ line guide covering setup, testing, troubleshooting, costs, and security.
Copilot instructions (.github/copilot-instructions.md)

Added Section:
# Get key from: https://console.cloud.google.com/
wrangler secret put YOUTUBE_API_KEY
# Enter: AIzaSyXXXXXXXXXXXXXXXXXXXXXXXXXXX
# Get key from: https://rapidapi.com/
wrangler secret put RAPIDAPI_KEY
# Enter: 1234567890abcdefXXXXXXXXXXXX
# Get key from: https://platform.openai.com/
wrangler secret put OPENAI_API_KEY
# Enter: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXX
cd apps/api
npm run dev
# Create .dev.vars file
cat > .dev.vars << EOF
YOUTUBE_API_KEY=AIzaSy...
RAPIDAPI_KEY=123456...
OPENAI_API_KEY=sk-proj...
EOF
npm run dev
curl -X POST http://localhost:8787/api/voxdub/create \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
-d '{
"youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"targetLanguage": "Spanish"
}'
# Get job ID from previous response
curl http://localhost:8787/api/jobs/job_XXXXX \
-H "Authorization: Bearer YOUR_JWT"
| Feature | Before (Mock) | After (Production) |
|---|---|---|
| Video Metadata | Hardcoded title/duration | Real YouTube Data API v3 |
| Audio Extraction | Dummy WAV file URL | RapidAPI audio download + R2 storage |
| Transcription | "Hello, welcome to this video..." | OpenAI Whisper with real timestamps |
| Error Handling | Warnings, continues anyway | Fails early with detailed errors |
| API Keys | None required | 3 keys required (graceful fallback) |
| Timestamp Accuracy | Generic 5-second segments | Real segment boundaries from Whisper |
| Speaker Detection | Single "Host" speaker | Automatic grouping by ~10 segments |
| Cost | $0 | ~$0.07 per 10-min video + TTS credits |
apps/api/src/services/youtube.ts (200+ lines added)
apps/api/src/queue-handler.ts (60 lines modified)
apps/api/src/routes/voxdub.ts (10 lines modified)
apps/api/wrangler.toml (3 lines added)
docs/YOUTUBE_INTEGRATION.md (new file, 300+ lines)
.github/copilot-instructions.md (30 lines added)
| Service | Usage | Cost |
|---|---|---|
| YouTube Data API | 1 metadata request | Free (10k/day quota) |
| RapidAPI | 1 audio download | ~$0.01 |
| OpenAI Whisper | 10 minutes | $0.06 |
| Gemini Translation | ~500 words | Free (under quota) |
| TTS Generation | ~225 seconds | 225-450 credits* |
Total External Cost: ~$0.07 per video
Total Credits: 225-450 (depending on quality tier)
* Credits: Premium=2/sec, Fast=1/sec, Cheap=0.5/sec
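To make the footnote concrete, this tiny helper converts seconds of generated audio into credits per tier; the tier names and rates come straight from the footnote above.

```typescript
// Credit cost per second of generated TTS audio, per quality tier (rates from the footnote).
const CREDITS_PER_SECOND = { premium: 2, fast: 1, cheap: 0.5 } as const;

function estimateCredits(audioSeconds: number, tier: keyof typeof CREDITS_PER_SECOND): number {
  return Math.ceil(audioSeconds * CREDITS_PER_SECOND[tier]);
}

// For the ~225 seconds of dubbed audio in the example:
// estimateCredits(225, "fast")    -> 225
// estimateCredits(225, "premium") -> 450  (hence the 225-450 range above)
```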
wrangler secret put YOUTUBE_API_KEY
wrangler secret put RAPIDAPI_KEY
wrangler secret put OPENAI_API_KEY
- Deploy: cd apps/api && npm run deploy
- Watch logs: wrangler tail --format pretty
- Keep a .dev.vars file for local dev keys
- Verify secrets: wrangler secret list

When working correctly, you should see logs like:
[YouTube] Downloading audio for dQw4w9WgXcQ
[YouTube] Audio stored at youtube/tenant_123/user_456/dQw4w9WgXcQ.mp3
[YouTube] Transcribing audio
[Whisper] Transcription complete: 42 segments
[YouTube] Translating to Spanish
[YouTube] Generating dubbed audio
[VoxDub] Video metadata: { title: "Never Gonna Give You Up", duration: 212, ... }
No errors, no warnings, real data flowing through pipeline.
The YouTube dubbing feature is now production-ready: real video metadata, real audio extraction with R2 storage, Whisper transcription with timestamps, Gemini translation, and multi-TTS dubbed audio.
Status: Implementation Complete | API Key Setup Required