VoiceLab

A local-first web app for experimenting with ElevenLabs text-to-speech and sound effects generation, with persistent history and audio playback.

React · TypeScript · Vite · FastAPI · Python · ElevenLabs API · Web Audio API
GitHub

February 2026

Overview

VoiceLab is a developer-facing tool for rapidly experimenting with ElevenLabs' audio generation capabilities. It pairs a FastAPI backend with a React + TypeScript frontend to let you generate speech and sound effects from text, audition different voices, and keep a persistent history of every experiment, all running locally. Generated audio can also be downloaded as MP3.

Features

  • Text-to-Speech generation with access to several ElevenLabs voices
  • Sound effects generation from freeform text descriptions with optional duration control
  • Voice browser to preview and select from the full ElevenLabs voice catalog
  • Persistent experiment history stored in localStorage with rename, delete, and batch-download
  • Web Audio API playback for reliable, low-latency audio preview
  • MP3 download for individual clips or the entire history
  • "Try Another Voice" — instantly re-run a TTS prompt with a different voice

Architecture

The app follows a clean client/server split:

  • Backend (FastAPI + Uvicorn): Exposes four endpoints — health check, voice listing, TTS generation, and SFX generation. Pydantic models enforce request validation. Audio is returned as MP3.
  • Frontend (React 19 + Vite): A tabbed UI with dedicated panels for TTS and SFX.
  • Storage: Generated audio is immediately converted from blobs to base64 data URLs so it can be serialized into localStorage alongside metadata (voice, prompt text, timestamps).
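
The storage step above can be sketched roughly as follows. This is a minimal illustration, not VoiceLab's actual code: the HistoryEntry fields, the HISTORY_KEY name, and the helper functions are all assumptions made for the example. In the browser, FileReader.readAsDataURL would produce the data URL directly from a Blob; the sketch encodes raw bytes with btoa instead so that it also runs outside the browser.

```typescript
// Hypothetical shape of one history entry; the field names are assumptions.
interface HistoryEntry {
  id: string;
  kind: "tts" | "sfx";
  prompt: string;
  voice?: string; // only meaningful for TTS entries
  createdAt: string; // ISO 8601 timestamp
  audioDataUrl: string;
}

// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "voicelab.history"; // hypothetical key name

// Encode raw audio bytes as a base64 data URL (MP3 by default).
function bytesToDataUrl(bytes: Uint8Array, mimeType: string = "audio/mpeg"): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]); // one char per byte, as btoa expects
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Read the stored history, or an empty list if nothing has been saved yet.
function loadHistory(store: KeyValueStore): HistoryEntry[] {
  const raw = store.getItem(HISTORY_KEY);
  return raw ? (JSON.parse(raw) as HistoryEntry[]) : [];
}

// Prepend an entry (newest first) and write the whole list back as JSON.
function appendEntry(store: KeyValueStore, entry: HistoryEntry): HistoryEntry[] {
  const history = [entry, ...loadHistory(store)];
  store.setItem(HISTORY_KEY, JSON.stringify(history));
  return history;
}
```

In a browser this would be called as appendEntry(window.localStorage, entry). Passing the store as a parameter is a design choice for the sketch only; it keeps the functions testable with an in-memory stand-in.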

Learnings

  • Understanding the ElevenLabs API: Working with voice selection, model variants, and audio output formats. Learned how to integrate TTS and sound effects endpoints, handle streaming audio responses, and manage API usage effectively.
  • TTS is an API call, but the UX around it isn't: The actual speech generation is straightforward — send text, get audio back. The real work is in building a smooth experience around it: letting users quickly switch voices, replay clips, and keep a history of experiments without friction.
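  • Client-side persistence trade-offs: Storing generated audio in localStorage is a simple way to keep data between page refreshes without a database, but it fills up fast: base64 encoding adds roughly a third to each clip's size, and browsers cap localStorage at a few megabytes per origin. Fine for a local tool, not for production.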
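
To make the localStorage trade-off concrete, here is a back-of-the-envelope estimate. All numbers are assumptions chosen for illustration: a constant 128 kbps MP3 bitrate, 10-second clips, and a budget of about 5 million characters, which is a common order of magnitude for localStorage quotas but varies by browser. Metadata and JSON overhead are ignored.

```typescript
// Characters in the base64 encoding of `byteCount` raw bytes (padding included).
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Approximate raw MP3 size for a clip at a constant bitrate.
function mp3Bytes(seconds: number, kbps: number): number {
  return Math.round((seconds * kbps * 1000) / 8);
}

// How many equally sized clips fit in a character budget.
function clipsThatFit(clipBytes: number, budgetChars: number): number {
  return Math.floor(budgetChars / base64Length(clipBytes));
}

const tenSecondClip = mp3Bytes(10, 128); // 160000 bytes
const encodedChars = base64Length(tenSecondClip); // 213336 characters
const capacity = clipsThatFit(tenSecondClip, 5 * 1000 * 1000); // 23 clips
```

Under these assumptions the history holds only a couple of dozen short clips before writes start failing, which is why this approach suits a local experiment log better than a production store.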

Demo