TTSBox

Free Speech to Text Online

Upload an audio file or record speech, then convert it into editable text with browser-based AI transcription. Copy the transcript or download TXT and SRT files with no signup.

Quick facts

Speech to Text at a Glance

  • Price: Free
  • Signup: Not required
  • Input: Audio upload or browser microphone recording
  • Supported formats: MP3, WAV, M4A, WebM, OGG
  • File size: Up to 100 MB
  • Best length: Under 15 minutes
  • Languages: Auto-detect or manual language selection
  • Processing: Runs locally in supported browsers with WebAssembly
  • Output: Reviewable transcript, copy, TXT download, SRT export with timestamps
  • First use: AI model download required
  • Recommended use: Notes, captions, interviews, meetings, podcasts

Workflow

How Audio Becomes Text

Steps

From audio to downloadable transcript

Five steps to transcribe audio into text in your browser.

  1. Upload audio or record speech: Drag and drop an audio file or use your microphone to record.
  2. Choose language or auto-detect: Select the spoken language or let the model detect it automatically.
  3. Start transcription: The AI model processes your audio locally and generates a text transcript.
  4. Review transcript: Read and verify the generated transcript in the browser.
  5. Copy or download text: Copy the transcript, download as .txt, or export as .srt with timestamps.

Audio processing runs in your browser after the model is loaded. No audio is uploaded to a server.
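
Step 5 mentions .srt export with timestamps. As a rough sketch of what that format looks like, the Python below renders timed segments as SubRip text. The segment tuples and the names `format_timestamp` and `to_srt` are hypothetical and only illustrate the file format; this is not TTSBox's own code, which runs in the browser.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) segments."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: a 1-based counter, a time range, then the cue text.
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical segments, shaped like what a transcription model might return.
example = [(0.0, 2.5, "Hello and welcome."), (2.5, 6.0, "Let's get started.")]
print(to_srt(example))
```

Each cue in the output is a counter, a `start --> end` line with comma-separated milliseconds, and the spoken text, which is the structure subtitle players expect.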

Input

Upload, Record, and Transcribe

Two ways to provide your audio for transcription.

  • Upload: Drag and drop or click to upload MP3, WAV, M4A, WebM, or OGG files up to 100 MB.
  • Record: Use your microphone to record speech directly. The browser captures audio and processes it locally.

For best results, use clear audio with minimal background noise. Recordings under 15 minutes transcribe fastest.
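
The upload limits above (five formats, up to 100 MB) can be checked before a file is submitted. The sketch below is one way to do that in Python; `check_upload` is a hypothetical helper, and reading "100 MB" as 100 × 1024 × 1024 bytes is an assumption, since the page does not say which definition applies.

```python
from pathlib import PurePath

# Formats listed on this page.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm", ".ogg"}
# Assumption: "100 MB" means 100 * 1024 * 1024 bytes; the page does not specify.
MAX_BYTES = 100 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 100 MB")
    return problems
```

For example, `check_upload("interview.MP3", 5_000_000)` returns an empty list, while a `.flac` file is reported as an unsupported format. The check looks only at the file name and size, not at the audio data itself.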

Privacy

Browser Transcription Privacy

Your audio stays in your browser during transcription.

  • Your audio file is decoded and transcribed in the browser.
  • The AI model is cached locally after the first download.
  • No signup, no account, no server-side processing of your audio.
  • Transcripts are generated entirely on your device.

Note: The AI model is downloaded from the internet on first use. After caching, the model loads from your browser storage.

Use Cases

Best Uses for Speech to Text

Great for

  • Meeting notes and discussion transcripts
  • Podcast drafts from recordings
  • Interview transcripts for research
  • Video captions and subtitles
  • Voice memo cleanup into text

Limitations

  • Best results under 15 minutes of audio
  • Cannot distinguish multiple speakers (no diarization)
  • Accuracy varies by language and audio quality

Comparison

Speech to Text vs Text to Speech

Workflow | Input | Output | Best for
Speech to Text | Audio or microphone recording | Editable transcript | Notes, captions, interviews
Text to Speech | Written text | Voice audio | Voiceovers, narration, localization

TTSBox offers both tools. The Text to Speech tool generates voice from text, with optional voice cloning.

FAQ

Frequently asked questions

What is speech to text?
Speech to text (STT) is technology that converts spoken audio into written text. TTSBox uses AI models running in your browser to transcribe audio files or microphone recordings into editable text. Transcription runs locally after the model is loaded.

What audio formats are supported?
TTSBox supports MP3, WAV, M4A, WebM, and OGG audio files up to 100 MB. For best results, use clear audio with minimal background noise. Audio longer than 15 minutes should be split into smaller segments for reliable results.

Can I record audio in the browser?
Yes. You can record speech directly using your microphone within TTSBox. The browser captures your audio and processes it locally for transcription. No additional software, plugins, or account signup is needed to start recording and transcribing.

Does TTSBox upload my audio?
No. Your audio is decoded and transcribed entirely in your browser. The AI model is downloaded on first use and cached locally. After the initial model download, transcription runs without uploading your audio to any server.

Can I download the transcript?
Yes. You can copy the transcript to your clipboard, download it as a plain text file (.txt), or export it as a SubRip subtitle file (.srt) with timestamps. All export options work directly in the browser without signup.

Can I transcribe long audio files?
TTSBox works best with audio under 15 minutes. Longer files can be processed but may take significant time depending on your hardware. For very long recordings, split them into smaller segments for faster and more reliable transcription results.

What is the difference between speech to text and text to speech?
Speech to text converts audio into written text. Text to speech does the opposite: it converts written text into spoken audio. TTSBox offers both as separate tools. Text to speech also includes optional voice cloning for custom voice sources.
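
The FAQ recommends splitting recordings longer than 15 minutes. As a minimal sketch of how the cut points could be planned, the Python below divides a recording's duration into chunks of at most 15 minutes. `plan_segments` is a hypothetical helper; it only computes time ranges, and the cutting itself would be done in an audio editor or another tool.

```python
MAX_SEGMENT_SECONDS = 15 * 60  # the 15-minute guideline from the FAQ


def plan_segments(
    duration_seconds: float, max_seconds: float = MAX_SEGMENT_SECONDS
) -> list[tuple[float, float]]:
    """Return (start, end) pairs covering the recording in chunks of at most max_seconds."""
    if duration_seconds <= 0 or max_seconds <= 0:
        return []
    segments = []
    start = 0.0
    while start < duration_seconds:
        # The last chunk is shortened so it ends exactly at the recording's end.
        end = min(start + max_seconds, duration_seconds)
        segments.append((start, end))
        start = end
    return segments
```

A 40-minute recording (2400 seconds) yields three ranges: 0 to 900, 900 to 1800, and 1800 to 2400 seconds. Fixed cut points can land in the middle of a word, so moving each cut to a nearby pause is likely to give cleaner transcripts.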

Related

Related Transcription Tools

Need to generate audio from text? Try Text to Speech.