Turn Your Voice Into Text — And Text Into Voice

WhisperStream is your complete audio solution, combining speech-to-text transcription with Whisper AI models and text-to-speech synthesis with Chatterbox voice mimic technology.

WhisperStream is a download available to all Story Runner members.

Join today!


Become a Story Runner

Why WhisperStream?

Speech-to-Text

Convert spoken audio into accurate text transcriptions using advanced Whisper AI models. Supports both single files and batch processing.

Text-to-Speech

Transform text into natural-sounding speech using Chatterbox voice mimic technology. Create custom voices and process text in batches.

Batch Processing

Process dozens of audio files for transcription or text files for speech synthesis at once. Monitor progress and export all results efficiently.

Model Manager

Choose the right Whisper models for transcription and Chatterbox voices for synthesis. Download, update, or remove models with a single click.

Built-in Diagnostics

Trouble installing? Run diagnostics to check Python, Whisper, Chatterbox, FFmpeg, and network status. One-click fixes for common issues.

Private & Local

All processing happens locally on your PC. No cloud processing, no additional AI charges, complete privacy for your audio and text.

How It Works

1. Choose Your Model

Select from multiple Whisper AI models for transcription or Chatterbox voices for speech synthesis. Download new models and voices as needed.

2. Add Your Files

For transcription: Add audio files. For synthesis: Add text files. Process single files or add a whole batch. Preview audio with the built-in media player.

3. Process & Review

Start transcription or synthesis. Watch progress in real time and review results as they complete.

4. Export & Use

Save transcriptions as text or export synthesized speech as audio files. Batch export supported for both workflows.
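
For readers curious what the four steps look like outside the app, here is a minimal Python sketch of a batch transcription loop. It is an illustration only, not WhisperStream's own code: the function name batch_transcribe, the list of audio extensions, and the one-.txt-per-audio-file output layout are all assumptions made for this example. The transcription step is passed in as a function so the loop stays independent of any particular model.

```python
from pathlib import Path

# Assumed set of audio extensions for this sketch; adjust as needed.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def batch_transcribe(input_dir, output_dir, transcribe):
    """Transcribe every audio file in input_dir and save one .txt per file.

    transcribe is any callable that takes a Path and returns the text.
    Returns the list of text files that were written.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: collect the audio files, skipping anything that is not audio.
    files = sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

    written = []
    for index, audio in enumerate(files, start=1):
        # Step 3: process one file and report progress.
        text = transcribe(audio)
        # Step 4: export the result next to the others as plain text.
        target = output_dir / (audio.stem + ".txt")
        target.write_text(text.strip() + "\n", encoding="utf-8")
        print(f"[{index}/{len(files)}] {audio.name} -> {target.name}")
        written.append(target)
    return written
```

With the open-source openai-whisper package installed, step 1 could be handled by loading a model with whisper.load_model("base") and passing lambda p: model.transcribe(str(p))["text"] as the transcribe argument. The model name "base" is just one choice among the available Whisper model sizes.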

Ready to transform speech to text and text to speech?

Frequently Asked Questions

Does WhisperStream need the internet?

No. All speech recognition and synthesis run locally on your PC. An internet connection is required only for downloading models, voices, or updates, and is never used to process your audio or text.

Can I process multiple files at once?

Yes! WhisperStream's batch mode lets you add and process dozens of audio files for transcription or text files for speech synthesis in one go, with progress tracking and easy export of all results.

What is Chatterbox TTS?

Chatterbox is an advanced voice mimic technology that creates natural-sounding speech from text. It can clone voices and generate speech that sounds remarkably human.

Is my data private?

Absolutely. All transcription and synthesis are done locally on your PC. No audio or text data is ever sent to external servers.

What if something doesn't work?

Use the built-in diagnostics to check your setup and fix common issues with Python, Whisper, Chatterbox, or FFmpeg. The StreamTeem Discord community is also here to help.
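
If you would like to check the same things by hand, the sketch below shows roughly what such a setup check can look like in Python. It is not WhisperStream's actual diagnostics. The function name run_diagnostics, the module names "whisper" and "chatterbox", and the minimum Python version of 3.9 are assumptions for illustration, and the network check is left out.

```python
import importlib.util
import shutil
import sys


def run_diagnostics(modules=("whisper", "chatterbox"),
                    tools=("ffmpeg",),
                    min_python=(3, 9)):
    """Return a dict mapping each check name to True (found) or False (missing).

    The default module names and minimum Python version are assumptions.
    """
    results = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        results[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for check, ok in run_diagnostics().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING points to a component that needs to be installed or added to your PATH before transcription or synthesis can run.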

Can I create custom voices?

Yes! With Chatterbox TTS, you can train custom voice models or use pre-trained voices to generate speech in different styles and accents.