YouTube Video Automation
An end-to-end pipeline that automatically converts kids' news articles into narrated YouTube videos with AI-generated tags and a human review workflow.
March 2026
Overview
YouTube Video Automation is a full-stack platform that turns kids' news articles into ready-to-publish YouTube videos with minimal manual effort. It fetches articles from a news API, generates narration with ElevenLabs TTS, composes videos with full-screen background images and timed text overlays, generates tags with a local AI model, and uploads to YouTube with thumbnails and playlist support.
The motivation was simple: producing educational video content for kids is repetitive and time-consuming. This system automates the entire pipeline while keeping a human-in-the-loop review step so nothing goes live without approval. It runs on a daily schedule but can also be triggered manually from a web dashboard.
Features
- Automated content pipeline — Fetches articles, generates TTS audio, composes video, generates tags, and uploads to YouTube on a configurable daily schedule
- Full-screen video composition — Article image scaled to cover 1920x1080, title at top with text shadow, paragraphs shown one at a time at the bottom with a semi-transparent dark backdrop
- Human review workflow — Preview, approve, reject, or regenerate videos before they go to YouTube
- AI-powered tag generation — Local Ollama LLM generates topic-specific tags, merged with default tags, respecting YouTube's 500-character limit
- YouTube integration — OAuth 2.0 auth, automatic upload with metadata, custom thumbnails from article images, auto-add to playlist, and deletion support
- Background video regeneration — Non-blocking regeneration with real-time status polling and spinner UI
- Bulk operations — Dropdown menu for bulk deleting generated, failed, or rejected videos
- Retry failed uploads — One-click retry for videos that failed during upload
- Configurable settings — Daily article limit, age group targeting (3-6 or 7-10), scheduler timing, ElevenLabs voice selection
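The tag-merging step above can be sketched as a small function. This is an illustrative sketch, not the project's actual code: the function name and the separator accounting are assumptions, but the core idea (defaults first, case-insensitive dedupe, stay under YouTube's 500-character total tag budget) matches the behavior described.

```python
def merge_tags(ai_tags, default_tags, max_chars=500):
    """Merge AI-generated tags with a fixed default set, deduplicating
    case-insensitively and respecting YouTube's ~500-character limit
    across all tags combined. (Hypothetical sketch, not the real module.)"""
    merged, seen, used = [], set(), 0
    for raw in default_tags + ai_tags:  # defaults take priority
        tag = raw.strip()
        key = tag.lower()
        if not tag or key in seen:
            continue
        cost = len(tag) + (1 if merged else 0)  # +1 for a separator character
        if used + cost > max_chars:
            continue  # skip tags that would blow the budget
        merged.append(tag)
        seen.add(key)
        used += cost
    return merged
```

Putting defaults first guarantees the baseline tags always survive the character budget, so an overly verbose AI suggestion can never crowd them out.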
Architecture
The system follows a service-oriented architecture with a FastAPI backend orchestrating multiple specialized services:
Articles API → Article Fetcher → TTS Service (ElevenLabs) → Video Composer (MoviePy + PIL)
→ Tag Generator (Ollama) → YouTube Uploader (Google API) → Playlist
Backend (Python/FastAPI): Central REST API handling all business logic. Uses aiosqlite for async database operations and APScheduler for daily cron jobs. Each pipeline stage is a separate service module with its own error handling and status tracking.
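The project uses APScheduler's cron trigger for the daily run; the underlying scheduling math can be sketched with the standard library alone (a hypothetical helper, not taken from the codebase):

```python
from datetime import datetime, timedelta

def seconds_until_next_run(now: datetime, hour: int, minute: int) -> float:
    """Seconds from `now` until the next daily run at hour:minute.
    Stdlib sketch of what a daily cron trigger computes; the real
    backend delegates this to APScheduler."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:  # today's slot already passed: schedule for tomorrow
        target += timedelta(days=1)
    return (target - now).total_seconds()
```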
Video Composition: PIL renders text to images (for precise control over alignment and shadows), which MoviePy composites over the full-screen background. Each paragraph is timed proportionally to its character count relative to the audio duration, shown only during its time slot.
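The proportional timing described above reduces to a few lines. A minimal sketch, assuming the composer wants (start, end) pairs to feed into MoviePy clip timing (the function name is hypothetical):

```python
def paragraph_time_slots(paragraphs, audio_duration):
    """Split audio_duration across paragraphs proportionally to their
    character counts, returning (start, end) second pairs. Each paragraph's
    overlay is shown only during its slot."""
    total_chars = sum(len(p) for p in paragraphs) or 1  # avoid divide-by-zero
    slots, start = [], 0.0
    for p in paragraphs:
        duration = audio_duration * len(p) / total_chars
        slots.append((start, start + duration))
        start += duration
    return slots
```

Because the slots are derived from the same character counts the TTS narration roughly follows, each paragraph stays on screen while its sentences are being read, without needing word-level timestamps from the TTS API.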
Status Machine: Videos flow through 8 states — pending_tts → pending_video → generated → approved/rejected → uploading → uploaded/failed. Each state is independently recoverable, so a failure at any stage doesn't lose prior work.
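The state machine can be captured as a transition table plus a guard. This is a sketch reconstructed from the flow above; the actual app may name or connect states slightly differently (e.g. which states allow regeneration):

```python
# Allowed transitions between pipeline states (assumed from the described flow).
TRANSITIONS = {
    "pending_tts":   {"pending_video", "failed"},
    "pending_video": {"generated", "failed"},
    "generated":     {"approved", "rejected", "pending_video"},  # regenerate
    "approved":      {"uploading"},
    "rejected":      {"pending_video"},   # regenerate after rejection
    "uploading":     {"uploaded", "failed"},
    "failed":        {"uploading"},       # one-click retry
    "uploaded":      set(),               # terminal (aside from deletion)
}

def can_transition(current: str, target: str) -> bool:
    """Guard used before updating a video's status row."""
    return target in TRANSITIONS.get(current, set())
```

Encoding the flow as data rather than scattered `if` checks is what makes each state independently recoverable: any handler can validate a requested transition against one table.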
Frontend (React + Vite): Single-page app with Axios API client, Tailwind CSS styling, and polling-based real-time updates for background operations.
Learnings
- Understanding the YouTube Data API — working with quota limits, OAuth token lifecycle, and undocumented upload restrictions required careful handling of auth refresh, retry logic, and user-friendly error messages.
- TTS cost management — ElevenLabs charges per character, so generated audio is saved separately from the video. This way, adjusting the video layout only requires recomposing the video without regenerating (and repaying for) the audio.
- Video composition is an iterative visual process — getting the layout right (image scaling, text positioning, font sizes, overlay opacity) required constant regeneration and review. Building a one-click regenerate button that works in the background was essential — without it, every layout tweak meant waiting for the full compose cycle.
- The balance between automation and manual control is the real design challenge — full automation sounds ideal but produces mistakes that are expensive to fix once they're on YouTube. The sweet spot is automating the repetitive work (fetching, TTS, composing, tagging) while keeping humans in the loop for quality decisions (approve/reject, tag editing, review before upload). This means every automated step needs a manual override: regenerate the video, edit the tags, retry the upload, delete from YouTube.
- Automated workflows need manual escape hatches at every stage — the pipeline runs daily on a schedule, but things go wrong: bad articles, weird TTS output, quota limits, auth expiry. The status-based pipeline design means each video's state is tracked independently, so a failure at one stage doesn't block others or lose prior work. Every action is reversible — you can regenerate without re-doing TTS, retry uploads without regenerating, delete from YouTube and re-upload. This saves time in the normal case while making it easy to correct mistakes when they happen.
- AI-generated content needs human curation, not replacement — Ollama generates useful tags most of the time, but occasionally produces irrelevant or overly generic suggestions. AI-generated tags are merged with a fixed set of predefined tags to ensure full coverage. The editable tag UI lets you review and adjust before publishing.
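The retry behavior mentioned in these learnings follows a standard pattern. A generic sketch with exponential backoff, assuming the uploader wraps its Google API calls in something like this (the helper and its defaults are illustrative, not the project's actual code):

```python
import time

def retry_with_backoff(operation, max_attempts=3, base_delay=1.0,
                       retryable=(ConnectionError, TimeoutError)):
    """Retry a flaky operation (e.g. a YouTube upload) with exponential
    backoff. On final failure the caller marks the video 'failed' so the
    one-click retry button remains available as a manual escape hatch."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the review UI
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Keeping the automatic attempt count low matters here: quota-limited APIs punish aggressive retries, and the status machine already guarantees a failed upload can be retried manually without regenerating anything.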