Transcribr.net
30 min free · No card · Cancel anytime

Audio in.
Transcript out. That's it.

Drop a podcast, interview, or meeting. We diarize the speakers, time-align every word, and hand it back as a clean transcript with chapters and AI show notes, ready to ship in minutes, not hours.

  • 94% accurate with proper speaker labels, even in crosstalk
  • Faster than realtime: a 60-min file finishes in ~18 min
  • One-click show notes, exports in SRT · VTT · DOCX · JSON
  • Same per-minute pricing on the API. No seats, no lock-in.

Drop your files here

Audio or video, up to 2 GB each. Drop one or many — anywhere on the page.

0.3× realtime · 147+ languages · 2 GB max
Trusted by teams at
Linear
Substack
Vercel
Replit
Atavist
Loom
Pitch
What you get

Transcripts that actually read like transcripts.

Words, timing, speakers, and structure — accurate enough to publish, fast enough that the file finishes before you make coffee.

94% accurate, by default

Our default model handles accents, technical vocabulary, and crosstalk without tuning a thing. Switch to the Premium model from the menu when you need extra precision.

Speakers, not guesses

Real diarization: we cluster voices, label them, and let you rename a speaker once so every line updates instantly.

Faster than realtime

A 60-minute episode finishes in ~18 minutes. No queue dance. We allocate an inference slot the moment you drop the file.

Search across everything

Word-level search across every transcript you've ever uploaded. Jump to the moment a phrase was said in two clicks.

Exports that fit your workflow

SRT, VTT, DOCX, plain text, JSON with word-level timestamps. Pick a preset or roll your own template.
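To make the link between word-level JSON and subtitle formats concrete, here is a minimal sketch that groups words into SRT cues. The input shape (a list of objects with `word`, `start`, and `end` in seconds) and the `max_words` grouping rule are assumptions for illustration, not the documented export schema.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render SRT text.

    `words` is an assumed shape: [{"word": str, "start": float, "end": float}, ...]
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["word"] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)
```

A real template would likely also break cues on pauses or speaker changes; fixed-size grouping just keeps the sketch short.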

Private by default

Files are processed in your region, and audio is deleted within 7 days. We never train on your data, and there's no fine print in the ToS that says otherwise.

How it works

Four steps. The first one is the only one you do.

1

Drop the file

Audio or video, any common format, up to 2 GB. Or paste a URL — Zoom, Google Drive, Dropbox.

2

We decode & segment

Voice-activity detection finds speech. Change-point detection slices it into single-speaker chunks.

3

Recognize & diarize

Our model transcribes; speaker embeddings cluster the segments back together so each line gets a label.

4

Edit, share, export

Rename speakers, fix the rare misheard word, and download in your format. Or share a private link.
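For readers curious about step 2, voice-activity detection can be illustrated with a simple energy threshold. This is a toy stand-in, not the model described above: the frame size, threshold, and function names are assumptions, and change-point detection is not covered.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Return (start_sec, end_sec) spans whose frame energy meets the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    spans, start = [], None
    for idx, energy in enumerate(energies):
        t = idx * frame_len / sample_rate
        if energy >= threshold and start is None:
            start = t                  # speech begins at this frame
        elif energy < threshold and start is not None:
            spans.append((start, t))   # speech ended at the previous frame
            start = None
    if start is not None:              # audio ended while still in speech
        spans.append((start, len(energies) * frame_len / sample_rate))
    return spans
```

Production systems generally use trained models rather than a fixed threshold, since raw energy cannot tell speech from music or loud background noise.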

It replaced two tools, a Google Doc, and the meeting where we argued about who said what. The diarization is uncanny.

RG
Rachel Gomez
Senior Producer, Field Notes
Pricing

Pay for minutes you transcribe. Nothing else.

No seats, no per-export fees, no surprise overage. Stop paying when you stop uploading.

Hobby
Free

First 30 minutes — once, on us. No card needed.

  • 30 min lifetime grant
  • Diarization, exports, sharing
  • 1 user
Enterprise
Custom

For teams transcribing over 1,000 hours/month.

  • Volume pricing
  • SSO, SCIM, audit log
  • Dedicated tenant, SLA
  • Custom data residency
FAQ

The honest answers.

Can’t find it? Email us — a real person replies within a day.

What does “94% accurate” actually mean?
Word error rate of ~6% measured against our internal benchmark of 1,200 hours of mixed audio: podcasts, meetings, interviews, lectures, accented English, and crosstalk. That's roughly one mistake every 17 words. For clean studio audio, we typically see <3% WER.
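As a worked illustration of the metric, word error rate is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words; a 6% WER gives 1 / 0.06 ≈ 16.7, or roughly one error every 17 words. The sketch below uses whitespace tokenization, which is an assumption; the benchmark's actual text normalization is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.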
How is diarization different from speech-to-text?
Speech-to-text answers what was said. Diarization answers who said it — independently. We run both pipelines and stitch the output together, which is why our speaker labels are stable across long files where most tools start drifting after 20 minutes.
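One common way to stitch the two outputs together is to give each word the speaker whose segment overlaps it most in time. The sketch below assumes hypothetical shapes for both inputs (words with `start`/`end`, segments with `speaker`/`start`/`end`); it illustrates the idea and is not a description of the production pipeline.

```python
def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it most.

    Assumed shapes:
      words:    [{"word": str, "start": float, "end": float}, ...]
      segments: [{"speaker": str, "start": float, "end": float}, ...]
    Words that overlap no segment get speaker None.
    """
    labeled = []
    for w in words:
        best, best_overlap = None, 0.0
        for seg in segments:
            overlap = min(w["end"], seg["end"]) - max(w["start"], seg["start"])
            if overlap > best_overlap:
                best, best_overlap = seg["speaker"], overlap
        labeled.append({**w, "speaker": best})
    return labeled
```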
Do you train models on my data?
No. Your audio is processed in your region, transcripts are encrypted at rest, and audio is deleted within 7 days unless you opt to keep it. Our models are trained on licensed and consented datasets only.
Can I edit a transcript after it's done?
Yes — inline. Click any word to fix it, drag to merge or split utterances, rename a speaker once and every reference updates everywhere. Exports always reflect the latest edit; there's no “regenerate” step.
Which file formats do you accept?
Audio: MP3, WAV, M4A, FLAC, OGG, AAC. Video: MP4, MOV, MKV, WebM (we strip the audio track). Up to 2 GB per file. Or paste a URL from Zoom, Google Drive, Dropbox, S3, or any direct link.
Is there an API?
Yes — REST and a JSON streaming endpoint. Same pricing as the dashboard, 100 req/min on Pro, custom limits on Enterprise. Word-level timestamps and confidence scores included by default.
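If you call the API in bulk, a client-side throttle helps you stay under the 100 req/min limit. The sketch below is a generic sliding-window limiter; the class name and the idea of enforcing the limit on the client are assumptions, since how the server counts requests is not specified here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of `period` seconds."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self.calls.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds if it is nonzero, then send the request and call `record()`.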

Ready when you are.

Drop a file at the top of this page. We'll handle the rest.

No credit card. First 30 minutes free.