Transcribr.net

30 min free · No card · Cancel anytime

Audio in.
Transcript out. That's it.

Drop a podcast, interview or meeting. We diarize the speakers, time-align every word, and hand it back as a clean transcript with chapters and AI show notes ready to ship, in minutes, not hours.

94% accurate with proper speaker labels, even in crosstalk
Faster than realtime: a 60-min file finishes in ~18 min
One-click show notes, exports in SRT · VTT · DOCX · JSON
Same per-minute pricing on the API. No seats, no lock-in.

See use cases · See pricing

Drop your files here

Audio or video, up to 2 GB each. Drop one or many — anywhere on the page.

Choose a file

0.3× realtime · 147+ languages · 2 GB max

Trusted by teams at

Linear

Substack

Vercel

Replit

Atavist

Loom

Pitch

What you get

Transcripts that actually read like transcripts.

Words, timing, speakers, and structure — accurate enough to publish, fast enough that the file finishes before you make coffee.

94% accurate, by default

Our default model handles accents, technical vocabulary, and crosstalk without tuning a thing. Switch to the Premium model from the menu when you need extra precision.

Speakers, not guesses

Real diarization: we cluster voices, label them, and let you rename a speaker once so every line updates everywhere instantly.

Faster than realtime

A 60-minute episode finishes in ~18 minutes. No queue dance. We allocate an inference slot the moment you drop the file.

Search across everything

Word-level search across every transcript you've ever uploaded. Jump to the moment a phrase was said in two clicks.

Exports that fit your workflow

SRT, VTT, DOCX, plain text, JSON with word-level timestamps. Pick a preset or roll your own template.
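As a rough illustration of how word-level timestamps map onto a subtitle format, the sketch below groups words into SRT cues. The input shape (a list of dicts with `word`, `start` and `end` in seconds) is an assumption made for this example, not the documented export schema, and `srt_timestamp` and `words_to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT expects."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word-level data, shaped the way this sketch assumes.
sample_words = [
    {"word": "Audio", "start": 0.0, "end": 0.4},
    {"word": "in.", "start": 0.4, "end": 0.7},
    {"word": "Transcript", "start": 1.0, "end": 1.6},
    {"word": "out.", "start": 1.6, "end": 2.0},
]
print(words_to_srt(sample_words, max_words=2))
```

A real subtitle preset would more likely break cues on pauses or punctuation than on a fixed word count; the fixed count just keeps the sketch short.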

Private by default

Files processed in your region. Audio deleted within 7 days. We never train on your data, and nothing in the ToS says otherwise.

How it works

Four steps. The first one is the only one you do.

Drop the file

Audio or video, any common format, up to 2 GB. Or paste a URL — Zoom, Google Drive, Dropbox.

We decode & segment

Voice-activity detection finds speech. Change-point detection slices it into single-speaker chunks.

Recognize & diarize

Our model transcribes; speaker embeddings cluster the segments back together so each line gets a label.

Edit, share, export

Rename speakers, fix the rare misheard word, and download in your format. Or share a private link.

Built for

Whoever has the recording has the work.

A few of the people we built this for. There are more.

For Podcasters

Show notes in 30 seconds, not 30 minutes

Generate episode summaries, timestamped chapters, social pull-quotes and YouTube chapter markers from any podcast episode in one click.

For Anyone with a library of recordings

Chat with every transcript you've ever recorded

Search across every interview, episode and meeting in plain English. "What did Sarah say about pricing in the March calls?" — answered in seconds with timestamps.

For Podcasters & content creators

Auto-detect the most clip-worthy moments

AI surfaces 30–60 second highlight moments from every episode, with burnt-in subtitles. Export 9:16 / 1:1 / 16:9 MP4s ready for social.

For Researchers, UX teams, academics

Code interviews in our tool, export to yours

Highlight passages, attach codes from a workspace-level codebook, then export to NVivo, Atlas.ti, Dovetail or MAXQDA in their native formats — speaker labels intact.

For Anyone with a phone

Forward a voice memo, get a transcript back

A unique private inbox per account. Forward audio attachments to it from anywhere and get the transcript link back by reply within minutes.

For Podcast networks & shows

Every new episode transcribed automatically

Paste a podcast feed URL once. Every new episode auto-transcribes — and if you want, generates show notes too. Set a monthly budget cap.

See all six use cases →

Vs the alternatives

Already using something? Here's how we compare.

Honest side-by-sides — including where the competitor still wins.

vs Otter.ai

An Otter.ai alternative without the per-seat tax

Teams burned by per-seat pricing

vs Rev.com

A Rev alternative that's 3× cheaper for AI transcription

Teams paying for transcription accuracy

vs Descript

A Descript alternative if you just want the transcript

Podcasters who don't want a full editor

vs Castmagic

A Castmagic alternative without the second subscription

Podcasters paying for show-notes generation

All comparisons →

It replaced two tools, a Google Doc, and the meeting where we argued about who said what. The diarization is uncanny.

Rachel Gomez

Senior Producer, Field Notes

Pricing

Pay for minutes you transcribe. Nothing else.

No seats, no per-export fees, no surprise overage. Stop paying when you stop uploading.

Hobby

Free

First 30 minutes — once, on us. No card needed.

30 min lifetime grant
Diarization, exports, sharing
1 user

Pro · most teams pick this

$0.08/min

Pay-as-you-go. No commitment, no minimum.

Everything in Hobby
Unlimited minutes & files
Workspace with up to 10 seats
API access · 100 req/min

Enterprise

Custom

For teams transcribing over 1,000 hours/month.

Volume pricing
SSO, SCIM, audit log
Dedicated tenant, SLA
Custom data residency
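As a rough check on what the Pro rate above works out to, here is a minimal cost sketch. It assumes billing in whole minutes, which the page does not specify, and it ignores the free 30-minute grant and any Enterprise volume pricing. The names `pro_cost_cents` and `format_usd` are made up for this example.

```python
PRO_RATE_CENTS_PER_MIN = 8  # $0.08/min, the Pro rate listed above


def pro_cost_cents(minutes: int) -> int:
    """Cost in cents at the Pro pay-as-you-go rate (whole minutes assumed)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * PRO_RATE_CENTS_PER_MIN


def format_usd(cents: int) -> str:
    """Render integer cents as a dollar amount."""
    return f"${cents // 100:,}.{cents % 100:02d}"


print(format_usd(pro_cost_cents(60)))          # one 60-minute episode: $4.80
print(format_usd(pro_cost_cents(1_000 * 60)))  # 1,000 hours at the Pro rate: $4,800.00
```

Working in integer cents avoids floating-point rounding on the per-minute rate.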

FAQ

The honest answers.

Can’t find it? Email us — a real person replies within a day.

What does “94% accurate” actually mean?

Word error rate of ~6% measured against our internal benchmark of 1,200 hours of mixed audio: podcasts, meetings, interviews, lectures, accented English, and crosstalk. That's roughly one mistake every 17 words. For clean studio audio, we typically see <3% WER.
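To make the arithmetic concrete: a 6% WER is 0.06 errors per word, or one error every 1 / 0.06 ≈ 17 words. The sketch below computes WER the usual way, as word-level edit distance divided by the number of reference words. It is an illustration only, not our benchmark code; the function name `word_error_rate` and the sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "we never train models on your audio or your transcripts"
hypothesis = "we never trained models on your audio or transcripts"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}")  # one substitution + one deletion over 10 words: 20%
```

Real benchmarks also normalize punctuation, numbers and casing before scoring, which this sketch only partly does.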

How is diarization different from speech-to-text?

Speech-to-text answers what was said. Diarization answers who said it — independently. We run both pipelines and stitch the output together, which is why our speaker labels are stable across long files where most tools start drifting after 20 minutes.
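One simple way to picture the stitching step is to give each recognized word the speaker whose turn overlaps it most in time. The sketch below does exactly that. It is a toy under stated assumptions, not our production pipeline: the `Word` and `Turn` types, the `stitch` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time overlap between two intervals, or 0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def stitch(words, turns):
    """Label each word with the most-overlapping speaker turn, then merge
    consecutive words from the same speaker into one line."""
    lines = []
    for word in words:
        # If a word overlaps no turn at all, max() falls back to the first turn.
        best = max(turns, key=lambda t: overlap(word.start, word.end, t.start, t.end))
        if lines and lines[-1][0] == best.speaker:
            lines[-1][1].append(word.text)
        else:
            lines.append((best.speaker, [word.text]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in lines]


words = [
    Word("How", 0.0, 0.3), Word("much", 0.3, 0.6), Word("is", 0.6, 0.8),
    Word("it?", 0.8, 1.0), Word("Eight", 1.1, 1.5), Word("cents.", 1.5, 1.9),
]
turns = [Turn("Speaker 1", 0.0, 1.05), Turn("Speaker 2", 1.05, 2.0)]

for speaker, text in stitch(words, turns):
    print(f"{speaker}: {text}")
```

Because the two outputs are produced independently, a misheard word does not move a speaker boundary, and a shifted boundary does not change the words.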

Do you train models on my data?

No. Your audio is processed in your region, transcripts are encrypted at rest, and audio is deleted within 7 days unless you opt to keep it. Our models are trained on licensed and consented datasets only.

Can I edit a transcript after it's done?

Yes — inline. Click any word to fix it, drag to merge or split utterances, rename a speaker once and every reference updates everywhere. Exports always reflect the latest edit; there's no “regenerate” step.

Which file formats do you accept?

Audio: MP3, WAV, M4A, FLAC, OGG, AAC. Video: MP4, MOV, MKV, WebM (we strip the audio track). Up to 2 GB per file. Or paste a URL from Zoom, Google Drive, Dropbox, S3, or any direct link.

Is there an API?

Yes — REST and a JSON streaming endpoint. Same pricing as the dashboard, 100 req/min on Pro, custom limits on Enterprise. Word-level timestamps and confidence scores included by default.

Ready when you are.

Drop a file at the top of this page. We'll handle the rest.

No credit card. First 30 minutes free.
