Free Speech to Text Online

Upload an audio file or record speech, then convert it into editable text with browser-based AI transcription. Copy the transcript or download TXT and SRT files with no signup.

Quick facts

Speech to Text at a Glance

Price: Free
Signup: Not required
Input: Audio upload or browser microphone recording
Supported formats: MP3, WAV, M4A, WebM, OGG
File size: Up to 100 MB
Best length: Under 15 minutes
Languages: Auto-detect or manual language selection
Processing: Runs locally in supported browsers with WebAssembly
Output: Reviewable transcript; copy to clipboard, TXT download, or SRT export with timestamps
First use: AI model download required
Recommended use: Notes, captions, interviews, meetings, podcasts

Workflow

How Audio Becomes Text

Steps

From audio to downloadable transcript

Five steps to transcribe audio into text in your browser.

Upload audio or record speech: Drag and drop an audio file or use your microphone to record.
Choose language or auto-detect: Select the spoken language or let the model detect it automatically.
Start transcription: The AI model processes your audio locally and generates a text transcript.
Review transcript: Read and verify the generated transcript in the browser.
Copy or download text: Copy the transcript, download it as a .txt file, or export it as an .srt file with timestamps.

Audio processing runs in your browser after the model is loaded. No audio is uploaded to a server.

Input

Upload, Record, and Transcribe

Two ways to provide your audio for transcription.

Upload: Drag and drop or click to upload MP3, WAV, M4A, WebM, or OGG files up to 100 MB.
Record: Use your microphone to record speech directly. The browser captures audio and processes it locally.

For best results, use clear audio with minimal background noise. Files under 15 minutes produce the fastest results.
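The upload limits listed on this page (MP3, WAV, M4A, WebM, or OGG, up to 100 MB) lend themselves to a quick client-side check before any decoding starts. The sketch below uses a hypothetical helper, `validateAudioFile`, and is not TTSBox's actual code. It checks only the file extension, not the file's real contents. It also assumes "100 MB" means 100 × 1024 × 1024 bytes, since the page does not specify the unit.

```typescript
// Hypothetical helper, not TTSBox's code: mirrors the limits stated on this page.
const ALLOWED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "webm", "ogg"]);
// Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
const MAX_BYTES = 100 * 1024 * 1024;

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateAudioFile(name: string, sizeBytes: number): ValidationResult {
  // Take the text after the last dot as the extension (case-insensitive).
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported format: .${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "File is larger than 100 MB" };
  }
  return { ok: true };
}
```

In a browser, a `File` object exposes `name` and `size`, so the check could run as `validateAudioFile(file.name, file.size)` right after the user picks or drops a file.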

Privacy

Browser Transcription Privacy

Your audio stays in your browser during transcription.

Your audio file is decoded and transcribed in the browser.
The AI model is cached locally after the first download.
No signup, no account, no server-side processing of your audio.
Transcripts are generated entirely on your device.

Note: The AI model is downloaded from the internet on first use. After caching, the model loads from your browser storage.

Use Cases

Best Uses for Speech to Text

Great for

• Meeting notes and discussion transcripts
• Podcast drafts from recordings
• Interview transcripts for research
• Video captions and subtitles
• Voice memo cleanup into text

Limitations

• Best results with audio under 15 minutes
• Does not identify or label individual speakers (no speaker diarization)
• Accuracy varies with language and audio quality

Comparison

Speech to Text vs Text to Speech

Workflow	Input	Output	Best For
Speech to Text	Audio or microphone recording	Editable transcript	Notes, captions, interviews
Text to Speech	Written text	Voice audio	Voiceovers, narration, localization

TTSBox offers both tools. Its Text to Speech tool generates voice audio from written text, with optional voice cloning.

FAQ

Frequently asked questions

What is speech to text?

Speech to text (STT) is technology that converts spoken audio into written text. TTSBox uses AI models running in your browser to transcribe audio files or microphone recordings into editable text. Transcription runs locally after the model is loaded.

What audio formats are supported?

TTSBox supports MP3, WAV, M4A, WebM, and OGG audio files up to 100 MB. For best results, use clear audio with minimal background noise. Audio longer than 15 minutes should be split into smaller segments for reliable results.

Can I record audio in the browser?

Yes. You can record speech directly using your microphone within TTSBox. The browser captures your audio and processes it locally for transcription. No additional software, plugins, or account signup is needed to start recording and transcribing.

Does TTSBox upload my audio?

No. Your audio is decoded and transcribed entirely in your browser and is not uploaded to any server. The AI model itself is downloaded on first use and cached locally, so later sessions load it from browser storage.

Can I download the transcript?

Yes. You can copy the transcript to your clipboard, download it as a plain text file (.txt), or export it as a SubRip subtitle file (.srt) with timestamps. All export options work directly in the browser without signup.
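An SRT file is plain text made of numbered cues. Each cue has a sequence number, a `start --> end` line with timestamps in `HH:MM:SS,mmm` form (note the comma before the milliseconds), the caption text, and a blank line. The following minimal sketch shows how timestamped segments could become TXT and SRT output. It assumes the transcriber returns segments with start and end times in seconds. The `TranscriptSegment` shape and the function names are hypothetical, since the page does not document TTSBox's internal format.

```typescript
// Hypothetical segment shape; TTSBox's internal data structures are not documented here.
interface TranscriptSegment {
  start: number; // seconds from the beginning of the audio
  end: number;   // seconds from the beginning of the audio
  text: string;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm.
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build SRT text: cue number, time range, caption, then a blank line between cues.
function segmentsToSrt(segments: TranscriptSegment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text.trim()}\n`,
    )
    .join("\n");
}

// Plain-text export simply joins the segment text with spaces.
function segmentsToTxt(segments: TranscriptSegment[]): string {
  return segments.map((seg) => seg.text.trim()).join(" ");
}
```

In a browser, either string can be wrapped in a `Blob` (for example `new Blob([srt], { type: "text/plain" })`) and offered as a download through `URL.createObjectURL`, so no server is involved in the export.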

Can I transcribe long audio files?

TTSBox works best with audio under 15 minutes. Longer files can be processed but may take significant time depending on your hardware. For very long recordings, split them into smaller segments for faster and more reliable transcription results.
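Splitting a long recording amounts to cutting the decoded samples into consecutive pieces of a fixed maximum duration and remembering where each piece starts. The sketch below is a hypothetical helper, not part of TTSBox. It assumes the audio has already been decoded to mono PCM samples at a known sample rate.

```typescript
// Hypothetical chunk shape; TTSBox's actual internals are not documented on this page.
interface AudioChunk {
  startSeconds: number;  // where this chunk begins in the original recording
  samples: Float32Array; // mono PCM samples for this chunk
}

// Split mono PCM samples into consecutive chunks of at most `maxSeconds` each.
function splitIntoChunks(
  samples: Float32Array,
  sampleRate: number,
  maxSeconds: number,
): AudioChunk[] {
  const chunkLength = Math.floor(sampleRate * maxSeconds);
  if (sampleRate <= 0 || maxSeconds <= 0 || chunkLength < 1) {
    throw new RangeError("sampleRate and maxSeconds must give at least one sample per chunk");
  }
  const chunks: AudioChunk[] = [];
  for (let offset = 0; offset < samples.length; offset += chunkLength) {
    chunks.push({
      startSeconds: offset / sampleRate,
      // subarray returns a view into the same buffer, so no samples are copied
      samples: samples.subarray(offset, Math.min(offset + chunkLength, samples.length)),
    });
  }
  return chunks;
}
```

For example, a 40-minute recording split with `maxSeconds = 15 * 60` yields three chunks starting at 0, 900, and 1800 seconds, the last one 10 minutes long. When the per-chunk transcripts are stitched back together, adding each chunk's `startSeconds` to its segment timestamps keeps SRT timings aligned with the original recording. Fixed-length cuts can land mid-word; cutting at a short pause or overlapping chunks slightly are common alternatives, at the cost of extra bookkeeping.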

What is the difference between speech to text and text to speech?

Speech to text converts audio into written text. Text to speech does the opposite: it converts written text into spoken audio. TTSBox offers both as separate tools. Text to speech also includes optional voice cloning for custom voice sources.

Need to generate audio from text? Try Text to Speech.