TTS Compare

Personal Project

2026

Model comparison matrix

Vertical

AI/Machine Learning

Geography

Global

Media Type(s)

CLI Tool

Tags

Python, TTS, Textual, Benchmarking, Generative AI, Open Source
A side-by-side comparison framework for eight open-source text-to-speech models, with an interactive Textual TUI for evaluating quality, latency, and feature coverage.

Overview

TTS Compare runs the same input text through multiple open-source text-to-speech models in a single run, then lets you compare quality, latency, and per-model feature support side by side. The project addresses a recurring practical question, "which TTS should we ship?", by replacing ad-hoc one-off testing with a reproducible harness.

Models covered

The harness currently runs eight models spanning the modern open-source TTS landscape:

Model | Params | Notable feature
Maya1 | 3B | 17 emotion tags, voice description
Kokoro-82M | 82M | 11 voice presets, very small
Chatterbox | 500M | 5 emotion tags, reference-audio cloning
Orpheus-3B | 3B | vLLM backend (Linux only)
Qwen3-TTS | 1.7B | Voice description
Fish Speech 1.5 | 500M | Reference-audio cloning
CosyVoice 2 | 500M | Voice description + reference audio
XTTS v2 | 500M | Reference-audio cloning

Architecture decisions

  • Per-model isolated .venv. Each model gets its own environment under models/<name>/ to avoid CUDA/torch version conflicts between models with incompatible requirements.
  • JSON-over-stdin worker protocol. Each model is invoked as a subprocess that accepts a JSON request on stdin and writes a WAV file. Decoupling the harness from the model runtimes keeps the Textual TUI snappy and lets a model crash without taking the rest of the run down.
  • Hardware adaptive. Detects CUDA, MPS, or CPU and configures models accordingly.
  • Three-screen TUI. Input → model selection → execution, with real-time logs streaming as each model generates speech.
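
The per-model environment and the JSON-over-stdin protocol can be sketched together as follows. This is a minimal illustration, not the project's actual code: the request fields (text, output_path, sample_rate), the helper names venv_python and run_worker, and the stand-in worker that writes one second of silence are all assumptions made for the example.

```python
import json
import subprocess
import sys
import tempfile
import wave
from pathlib import Path

# Hypothetical stand-in worker. It reads one JSON request from stdin and
# writes one second of silence to the requested path. A real worker would
# load its model and synthesize request["text"] instead.
WORKER_SOURCE = '''
import json, sys, wave
request = json.load(sys.stdin)
with wave.open(request["output_path"], "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(request["sample_rate"])
    wav.writeframes(bytes(2 * request["sample_rate"]))
'''


def venv_python(model_dir):
    """Return the interpreter inside a model's own .venv (standard venv layout)."""
    venv = Path(model_dir) / ".venv"
    if sys.platform == "win32":
        return venv / "Scripts" / "python.exe"
    return venv / "bin" / "python"


def run_worker(python_exe, worker_script, text, output_path,
               sample_rate=24000, timeout=300):
    """Send one JSON request to a worker subprocess; return (ok, stderr)."""
    request = {
        "text": text,
        "output_path": str(output_path),
        "sample_rate": sample_rate,
    }
    result = subprocess.run(
        [str(python_exe), str(worker_script)],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # A non-zero exit code marks this model as failed without affecting others.
    ok = result.returncode == 0 and Path(output_path).exists()
    return ok, result.stderr


def demo():
    """Run the stand-in worker with the current interpreter and inspect the WAV."""
    with tempfile.TemporaryDirectory() as tmp:
        worker = Path(tmp) / "worker.py"
        worker.write_text(WORKER_SOURCE)
        out = Path(tmp) / "out.wav"
        # In the harness this would be venv_python("models/<name>") instead.
        ok, _ = run_worker(sys.executable, worker, "Hello", out)
        with wave.open(str(out), "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        return ok, duration


if __name__ == "__main__":
    print(demo())
```

Because each worker is a separate process, a crash surfaces only as a non-zero exit code and captured stderr, which the caller can log while the remaining models continue.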

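Hardware detection along the lines described above might look like the sketch below. The preference order (CUDA, then MPS, then CPU), the function names, and the fallback to CPU when torch is not installed are assumptions for illustration; the project's own selection logic may differ.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name from availability flags (assumed order: CUDA > MPS > CPU)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the current machine; fall back to CPU if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda = torch.cuda.is_available()
    # Older torch builds may lack the MPS backend module entirely.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda, mps)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function makes the selection order easy to test without a GPU, and the resulting device string could be passed to each worker as one more field in its JSON request.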
References