
Transcribe your footage

Turn it on, pick a model and language, and let the queue work in the background

Transcription is off by default because not every project needs it. When you turn it on, every eligible clip gets queued and the grid fills up with searchable text in the background.

Turn it on

Settings → Transcripts → "Enable transcription for Sources." There's a separate toggle if you also want FCP libraries transcribed.

It runs entirely on your Mac: no cloud, no per-minute fees, no uploads. Your footage doesn't leave the room, and that's the whole reason it's local in the first place.

A note we owe you: long footage takes real time. Apple silicon Macs (M1 and newer) handle the default model comfortably. Intel Macs are slower and tend to do better transcribing one clip at a time on the Fast model.

Pick a model

Settings → Transcripts → Model. Four options, in order of size and accuracy:

  • Tiny (75 MB). Fastest, lowest quality. English only. Useful for a rough pass on a giant pile of footage when you just need to know what's roughly in each clip.
  • Fast (150 MB). About 10× realtime. English only. May miss proper names. Good for first-pass searching.
  • Standard (480 MB, Recommended). About 5× realtime. Accurate. English only. The default, and the right pick for most projects.
  • High quality (3 GB). About 1.5× realtime. Highest accuracy. The only multilingual model.
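
The "× realtime" figures turn into wall-clock time with one division: footage length divided by the speed multiple. Here's a minimal sketch of that arithmetic in Python. The `Model` record, `MODELS` list and `transcription_minutes` helper are hypothetical names for illustration, not the app's code, and the numbers are just the approximate figures from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    download_mb: int
    speed_x_realtime: Optional[float]  # None where the guide gives no figure (Tiny)
    english_only: bool


# Approximate figures copied from the list above; real speed varies by Mac.
MODELS = [
    Model("Tiny", 75, None, True),
    Model("Fast", 150, 10.0, True),
    Model("Standard", 480, 5.0, True),
    Model("High quality", 3000, 1.5, False),  # roughly 3 GB
]


def transcription_minutes(footage_minutes: float, speed_x_realtime: float) -> float:
    """Wall-clock minutes to transcribe footage at a given 'x realtime' speed."""
    if speed_x_realtime <= 0:
        raise ValueError("speed multiple must be positive")
    return footage_minutes / speed_x_realtime


# Estimate one hour of footage for every model that has a published speed.
for m in MODELS:
    if m.speed_x_realtime is not None:
        minutes = transcription_minutes(60, m.speed_x_realtime)
        print(f"{m.name}: about {minutes:.0f} min per hour of footage")
```

That prints about 6 minutes for Fast, 12 for Standard and 40 for High quality, the same figures quoted for High quality further down.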

If your footage is in English, stay on Standard. It's the sweet spot of speed and accuracy. The first three models are English-only, so they ignore the Language setting entirely.

When to use High quality

Two reasons to go to High quality:

  1. Your footage is not in English. This is the only model that supports other languages.
  2. Accuracy matters more than time. It's the most accurate of the four, by enough to notice on tricky audio (heavy accents, cross-talk, technical jargon).

The cost: it's a 3 GB download on first use, and it's slow. About 1.5× realtime, which means an hour of footage takes around 40 minutes to transcribe (compared to about 12 minutes on Standard). Plan accordingly. If you're transcribing a multi-hour archive overnight, this is fine. If you need words in five minutes, it isn't.
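
The two reasons above boil down to a small decision rule. This is a sketch only, assuming a hypothetical `choose_model` helper that returns the model's name as it appears in Settings; it ignores Tiny and Fast, which are for rough first passes rather than everyday use.

```python
STANDARD = "Standard"
HIGH_QUALITY = "High quality"


def choose_model(footage_language: str, accuracy_over_time: bool = False) -> str:
    """Hypothetical helper encoding the two reasons to pick High quality."""
    if footage_language != "English":
        return HIGH_QUALITY  # reason 1: the only model that handles other languages
    if accuracy_over_time:
        return HIGH_QUALITY  # reason 2: most accurate, at about 1.5x realtime
    return STANDARD  # English footage: stay on the default


print(choose_model("English"))  # Standard
print(choose_model("Japanese"))  # High quality
print(choose_model("English", accuracy_over_time=True))  # High quality
```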

Pick a language

Settings → Transcripts → Language. Auto-detect (default), or one of: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Mandarin Chinese.

The language picker only matters when you're on the High quality model. The other three are English-only and ignore this setting.

If you're on High quality and your footage is in one specific language, picking it explicitly is more reliable than asking the model to guess on a quiet opening clip. Auto-detect works most of the time but sometimes flips to the wrong language on the first sentence and never recovers.

Screenshot: Settings → Transcripts, with the High quality model selected and Spanish picked as the language.
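
Put another way, the language a transcript actually uses depends on both settings. A minimal sketch, assuming a hypothetical `effective_language` helper; the option names mirror the Language list above, and how the app passes them to the model isn't covered in this guide.

```python
ENGLISH_ONLY_MODELS = {"Tiny", "Fast", "Standard"}
LANGUAGE_OPTIONS = {
    "Auto-detect", "English", "Spanish", "French", "German", "Italian",
    "Portuguese", "Japanese", "Korean", "Mandarin Chinese",
}


def effective_language(model_name: str, language_setting: str) -> str:
    """Return the language a transcript will use (hypothetical helper)."""
    if language_setting not in LANGUAGE_OPTIONS:
        raise ValueError(f"not a Language option: {language_setting!r}")
    if model_name in ENGLISH_ONLY_MODELS:
        return "English"  # English-only models ignore the Language setting
    return language_setting  # High quality uses the explicit pick, or auto-detects


print(effective_language("Standard", "Spanish"))  # English
print(effective_language("High quality", "Spanish"))  # Spanish
```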

The AC-only gate

"Only transcribe while plugged in" is on by default. Transcription is a hot job, and running it off a laptop's battery burns through the charge for nothing. The queue pauses when you unplug and resumes when you plug back in.

If you want it to run on battery anyway, untick the toggle. We don't stop you; we just don't make it the default.
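
The gate is a single condition: run on AC power, or on battery only if you've turned the toggle off. A sketch under that assumption, with a hypothetical `queue_should_run` function; how the app actually detects the power source isn't described here.

```python
def queue_should_run(on_ac_power: bool, only_while_plugged_in: bool = True) -> bool:
    """Hypothetical AC-only gate: run on AC, or on battery if the toggle is off."""
    return on_ac_power or not only_while_plugged_in


# Simulate plugging in, unplugging, and plugging back in with the default toggle.
for on_ac in (True, False, True):
    print("running" if queue_should_run(on_ac) else "paused on battery")
```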

Three ways to start the queue

  1. Boot walk. When the app launches with transcription enabled, it queues every eligible audio and video asset that doesn't already have a transcript.
  2. Per source. Click "Transcribe" on a source row in the sidebar. Queues just that source.
  3. Per asset. Click "Transcribe" in the inspector for a single clip.
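
The boot walk is essentially a filter over your library. Here's a rough sketch, assuming a hypothetical `Asset` record and treating "eligible" as "audio or video without a transcript" (the app may apply other checks too).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    asset_id: str
    kind: str  # e.g. "audio", "video", "image"
    has_transcript: bool


def boot_walk(assets: List[Asset], transcription_enabled: bool) -> List[str]:
    """Return the ids to queue at launch (hypothetical sketch)."""
    if not transcription_enabled:
        return []  # transcription is off by default, so nothing gets queued
    return [
        a.asset_id
        for a in assets
        if a.kind in ("audio", "video") and not a.has_transcript
    ]


library = [
    Asset("interview.mov", "video", has_transcript=False),
    Asset("broll.mov", "video", has_transcript=True),
    Asset("still.jpg", "image", has_transcript=False),
    Asset("vo.wav", "audio", has_transcript=False),
]
print(boot_walk(library, transcription_enabled=True))  # ['interview.mov', 'vo.wav']
```

Per-source and per-asset Transcribe buttons would run the same filter over a smaller list, or skip it entirely for a clip you picked by hand.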

The status bar at the bottom shows progress as done / total, with Pause, Resume, and Cancel. Cancelling persists across launches, so you don't get a midnight surprise. Settings → Transcripts shows how long the queue has been paused (for example, "paused 23 minutes ago"), so you remember why nothing's happening.
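
To make the queue controls concrete, here's a toy model of them. Everything in it is hypothetical: `TranscriptionQueue` is an illustrative class, a plain dict stands in for settings that survive a relaunch, and the pause time is kept in memory only because this guide doesn't say whether pausing persists.

```python
from typing import Dict, Optional


class TranscriptionQueue:
    """Hypothetical model of the status-bar queue; not the app's real code."""

    def __init__(self, total: int, store: Dict[str, bool]) -> None:
        self.total = total
        self.done = 0
        self.store = store  # stand-in for settings that survive a relaunch
        self.paused_at: Optional[float] = None  # seconds; in memory only here

    @property
    def cancelled(self) -> bool:
        return self.store.get("cancelled", False)

    def pause(self, now: float) -> None:
        self.paused_at = now

    def resume(self) -> None:
        self.paused_at = None

    def cancel(self) -> None:
        self.store["cancelled"] = True  # written to the store, so still set after relaunch

    def status(self) -> str:
        return f"{self.done} / {self.total}"

    def paused_label(self, now: float) -> Optional[str]:
        if self.paused_at is None:
            return None
        minutes = int((now - self.paused_at) // 60)
        unit = "minute" if minutes == 1 else "minutes"
        return f"paused {minutes} {unit} ago"


store: Dict[str, bool] = {}
q = TranscriptionQueue(total=40, store=store)
q.done = 12
print(q.status())  # 12 / 40
q.pause(now=0.0)
print(q.paused_label(now=23 * 60))  # paused 23 minutes ago
q.cancel()
print(TranscriptionQueue(total=40, store=store).cancelled)  # True
```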

If a clip can't be transcribed, the inspector shows a warning with a reason and a Regenerate button. The usual suspects: no audio, audio that's too short, an unreadable audio format, or a model that hadn't finished downloading on first launch.
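
Those reasons map naturally onto a small set of checks. A sketch, assuming a hypothetical `check_clip` function and an arbitrary one-second cutoff for "too short" (the guide doesn't give the real threshold); the order of the checks is also an illustrative choice.

```python
from enum import Enum
from typing import Optional


class SkipReason(Enum):
    NO_AUDIO = "no audio"
    TOO_SHORT = "audio too short"
    UNREADABLE = "audio format unreadable"
    MODEL_NOT_READY = "model hadn't finished downloading"


MIN_SECONDS = 1.0  # hypothetical cutoff; the guide doesn't say how short is too short


def check_clip(has_audio: bool, duration_s: float, decodable: bool,
               model_ready: bool) -> Optional[SkipReason]:
    """Return why a clip can't be transcribed, or None if it can (sketch)."""
    if not model_ready:
        return SkipReason.MODEL_NOT_READY
    if not has_audio:
        return SkipReason.NO_AUDIO
    if not decodable:
        return SkipReason.UNREADABLE
    if duration_s < MIN_SECONDS:
        return SkipReason.TOO_SHORT
    return None


reason = check_clip(has_audio=True, duration_s=0.4, decodable=True, model_ready=True)
print(reason.value if reason else "ok")  # audio too short
```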

Set it once, leave it running, come back to a project where every word is searchable. That's the deal.