How to Transcribe Zoom Meetings on Mac Without a Bot Joining the Call

You are in three Zoom calls today, two tomorrow, and at some point this week one of them is the one you actually wanted notes for: the pricing review, the customer interview, the board prep. You opened Otter once. Fireflies sent you a calendar invite from a bot named “Fred.” Neither of those is wrong. They are just very visible.

There is a reason a lot of people would rather not have an attendee in their meeting whose only job is to listen, transcribe, and ship the audio off to a third-party server. Some of those reasons are about privacy. Some are about the awkwardness of explaining to a customer why their words are being recorded by a Zoom participant named “Otter.ai Notetaker.” Some are about a compliance review.

This post walks through the alternative: how to transcribe a Zoom meeting on your Mac without any bot in the call, with the audio never leaving your laptop, using Apple Intelligence and Dictanta. If you have used Otter or Granola before and want to know what changes, the short answer is: nothing visible to the other side of the call.

What “without a bot” actually means

When you connect Otter, Fireflies, Fellow, Read.ai, or any of the other cloud-bot transcription services to your Zoom calendar, what they do is roughly this:

Watch your calendar for a meeting URL.
Spin up a participant — a virtual attendee — and have it join the Zoom call as a user.
That participant streams the meeting audio to the vendor’s servers in real time.
A cloud ASR system (Whisper, AssemblyAI, Deepgram, or proprietary) transcribes it.
A cloud LLM (GPT-4o, Claude, Gemini) summarizes the transcript.
The transcript and summary land in your dashboard.

That works. It is also a six-step pipeline in which your meeting audio, and then its transcript, are handled by software other than Zoom itself. And every other participant in the meeting sees the bot show up in the participant list. Some hosts disable that for executive calls. Some legal teams flat-out forbid it. Some customers ask, on the call, “who’s Fred?”

The alternative is to capture the audio your Mac is already hearing (the audio that comes out of Zoom and into your speakers) and transcribe it locally. No bot. No second participant. No audio leaves the machine.

On macOS 26 (Tahoe), two Apple APIs together finally make this practical:

ScreenCaptureKit with audio capture, available since macOS 13. Lets a sandboxed app subscribe to the system audio output of a specific running app, such as Zoom, Teams, or Google Meet in Safari.
SpeechAnalyzer, the new on-device automatic speech recognition framework introduced at WWDC 2025. In early independent testing reported by MacStories, it transcribed a recording about 55% faster than MacWhisper running the Whisper Large V3 Turbo model on the same Mac. Critically, it runs entirely on the device. No network.

Combined, they let an app like Dictanta do the bot’s job — without being a bot.

The end-to-end flow

Here is what actually happens when you record a Zoom meeting with Dictanta on a Mac running macOS 26.

1. Start the meeting normally

Open Zoom. Join the call. You don’t need Zoom’s built-in cloud recording (you can enable it if you want; the two are independent). Don’t invite any third-party participant.

2. Press the Dictanta hotkey

Dictanta lives in your Mac menu bar. The default global hotkey is ⇧⌘R. Pressing it begins recording. The menu bar icon turns coral and pulses; a small live caption strip appears below the menu bar showing what is being said in real time.

The first time you record system audio, macOS prompts you for screen recording permission (audio capture is gated by the same TCC permission as screen recording — that’s the API contract, even though no video is captured). You grant it once.

3. Dictanta auto-detects the meeting app

Because the recording is system-audio scoped, Dictanta sees which app is producing the audio. If Zoom is running, the recording is labeled “Zoom meeting.” Same for Teams, Google Meet, Webex. If you’re just on a phone call routed through your Mac, it labels accordingly.
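
As a rough illustration of that labeling step, here is a minimal Python sketch. The lookup table, the bundle identifiers in it, and the `label_recording` helper are assumptions made for this example; they are not Dictanta's actual detection logic.

```python
from typing import Iterable

# Hypothetical lookup table: the bundle identifiers and labels are assumptions
# for illustration, not Dictanta's actual detection list.
MEETING_APP_LABELS = {
    "us.zoom.xos": "Zoom meeting",
    "com.microsoft.teams2": "Teams meeting",
}


def label_recording(audio_source_ids: Iterable[str]) -> str:
    """Return a label for the first known meeting app that is producing audio."""
    for bundle_id in audio_source_ids:
        if bundle_id in MEETING_APP_LABELS:
            return MEETING_APP_LABELS[bundle_id]
    # Nothing recognizable is playing: fall back to a generic label.
    return "System audio"
```

A real implementation would get the list of audio-producing apps from the capture API rather than from a hand-built list.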

You can also pick a source manually: mic only, system audio only, or a mix of both. The mix is useful when you are presenting and want both your spoken commentary and the audio playing on screen.
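
To make the "mix of both" option concrete, here is a small Python sketch of how two audio streams are typically combined: add the samples and clamp the result so it stays in range. The function name, the gain parameters, and the assumption that both streams are mono floats at the same sample rate are all illustrative, not a description of Dictanta's internals.

```python
from typing import List


def mix_sources(mic: List[float], system: List[float],
                mic_gain: float = 1.0, system_gain: float = 1.0) -> List[float]:
    """Sum two mono sample streams and clamp each result to [-1.0, 1.0].

    Assumes both streams share one sample rate; the shorter stream is
    treated as silence once it runs out.
    """
    length = max(len(mic), len(system))
    mixed = []
    for i in range(length):
        m = mic[i] if i < len(mic) else 0.0
        s = system[i] if i < len(system) else 0.0
        sample = m * mic_gain + s * system_gain
        # Clamp to avoid values outside the valid range when both are loud.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Lowering `system_gain` is one way to keep your own commentary audible over loud playback.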

4. Live partial captions stream during the call

Two things are useful in-meeting, not just after:

A captions strip below your menu bar (you can hide it with ⇧⌘K).
A live transcript window you can open with ⌥⌘T.

These are not transcription previews to be edited later. They are SpeechAnalyzer’s real-time partial hypotheses, the same data path that powers iOS 26’s Live Captions. Latency is ~300ms on M-series Macs.
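
If "partial hypotheses" is unfamiliar: a streaming recognizer keeps revising its guess for the current phrase, then commits it once it is confident. The Python sketch below models that general pattern only. The `CaptionBuffer` class and its methods are hypothetical and do not mirror the SpeechAnalyzer API.

```python
from typing import List


class CaptionBuffer:
    """Holds committed text plus the latest, still-changing partial guess."""

    def __init__(self) -> None:
        self.finalized: List[str] = []
        self.volatile: str = ""

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            # The recognizer committed this phrase; it will not change again.
            self.finalized.append(text)
            self.volatile = ""
        else:
            # A newer partial replaces the previous one rather than appending.
            self.volatile = text

    def display(self) -> str:
        parts = self.finalized + ([self.volatile] if self.volatile else [])
        return " ".join(parts)
```

The key design point is that partials overwrite each other, which is why live captions appear to "correct themselves" as a sentence completes.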

If the meeting is in a language other than English, Dictanta uses the appropriate locale’s SpeechAnalyzer model. Currently supported in v1.0: English, Spanish, French, German, Japanese, Mandarin. More follow as Apple ships them.

5. Stop recording, get a transcript and summary

When you end the meeting (or click “stop” in Dictanta), three things happen, all on-device:

The full transcript is finalized with timestamps. Each segment is click-to-seek.
Apple’s Foundation Models LLM generates a summary: TL;DR, decisions, action items (with owners and rough due-date guesses), and open questions.
Every summary bullet is anchored to the audio span it came from. Hovering a bullet highlights the corresponding waveform segment; clicking scrubs the audio to that moment and highlights the transcript line.

The last point is the one that matters most. If you don’t trust an AI summary, you can verify any bullet in one click. The audio is the source of truth — the LLM’s job is just to organize it.
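
One way to picture that anchoring is as a pair of timestamps stored with each bullet, which can then be matched against the timestamped transcript. The Python sketch below is an assumed data model for illustration; the `Segment` and `SummaryBullet` types and the overlap rule are not taken from Dictanta.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


@dataclass
class SummaryBullet:
    text: str
    start: float  # audio span the bullet was derived from
    end: float


def supporting_segments(bullet: SummaryBullet,
                        transcript: List[Segment]) -> List[Segment]:
    """Return the transcript segments that overlap the bullet's audio span."""
    return [s for s in transcript
            if s.start < bullet.end and s.end > bullet.start]
```

Clicking a bullet then amounts to seeking the player to `bullet.start` and highlighting whatever `supporting_segments` returns.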

6. Export, or just keep it on the Mac

By default the transcript and summary sync to your other Apple devices via CloudKit (text only; audio never syncs unless you explicitly enable iCloud Drive backup).

For export, the v1.0 options are:

Markdown — clean structure, drops straight into Notion, Obsidian, Apple Notes, Bear, or any markdown editor.
JSON — for tooling integrations.
Plain text — for the simplest case.

DOCX, PDF, and SRT export ship in v1.1.
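
For a sense of what the Markdown and JSON exports might contain, here is a minimal Python sketch. The layout, the field names, and the `to_markdown` and `to_json` helpers are assumptions for illustration; Dictanta's real export schema may differ.

```python
import json
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def to_markdown(title: str, segments: List[Tuple[float, str]],
                action_items: List[str]) -> str:
    """Build a Markdown note with a checklist and a timestamped transcript."""
    lines = [f"# {title}", "", "## Action items", ""]
    lines += [f"- [ ] {item}" for item in action_items]
    lines += ["", "## Transcript", ""]
    lines += [f"- **{format_timestamp(start)}** {text}" for start, text in segments]
    return "\n".join(lines) + "\n"


def to_json(title: str, segments: List[Tuple[float, str]],
            action_items: List[str]) -> str:
    """Build the same content as JSON for tooling."""
    payload = {
        "title": title,
        "action_items": action_items,
        "segments": [{"start": start, "text": text} for start, text in segments],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

The Markdown form is for people and note apps; the JSON form keeps the raw timestamps so scripts do not have to parse text.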

The audio file itself auto-deletes after seven days by default (you can change this in Settings → Privacy). The transcript and summary stay until you delete them.
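
The retention rule is simple date arithmetic. As a sketch, assuming recordings are tracked by name and creation time (the `expired_recordings` helper and its inputs are hypothetical), the seven-day default could be expressed in Python like this:

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired_recordings(recordings: Dict[str, datetime], now: datetime,
                       retention_days: int = 7) -> List[str]:
    """Return the names of recordings created before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    # Strictly older than the cutoff counts as expired.
    return sorted(name for name, created in recordings.items() if created < cutoff)
```

Only audio files would go through a check like this; transcripts and summaries are kept until you delete them.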

What this lets you skip

If you have been using a cloud-bot transcription service, here is the short list of things that go away:

The visible bot in the participant list. Customers don’t ask who Fred is anymore.
The “we’re going to record this call for note-taking” disclosure that a third-party vendor’s ToS imposes. No third party is involved, so no vendor terms apply. You should still say it wherever recording-consent law requires it, and for ethical reasons everywhere else.
The vendor’s data-retention policy. Your call audio is on your Mac, scoped to your user account, encrypted by FileVault (if you have it on, which you should). When you delete it, it’s deleted.
The pricing meter. Otter’s free plan caps at 300 minutes per month. Granola Free is 25 meetings/month then $18/mo. Fireflies starts at $18/mo with limits. Dictanta is free for your first three meetings, then $9.99/mo, $79.99/yr, or $149.99 lifetime — no per-minute meter at any tier.
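
Which Dictanta plan is cheapest depends on how long you expect to use it. The Python sketch below works through the arithmetic using the prices quoted above. It assumes the yearly plan is billed upfront in whole years with no refunds and ignores taxes and the three free meetings; those assumptions are mine, not stated pricing terms.

```python
import math

MONTHLY = 9.99
YEARLY = 79.99
LIFETIME = 149.99
PLANS = ("monthly", "yearly", "lifetime")


def cost(plan: str, months: int) -> float:
    """Total spend for using a plan over the given number of months."""
    if plan == "monthly":
        return round(MONTHLY * months, 2)
    if plan == "yearly":
        # Assumption: billed per whole year, so 13 months costs two years.
        return round(YEARLY * math.ceil(months / 12), 2)
    if plan == "lifetime":
        return LIFETIME
    raise ValueError(f"unknown plan: {plan}")


def cheapest_plan(months: int) -> str:
    """Return the plan with the lowest total cost for the given duration."""
    return min(PLANS, key=lambda plan: cost(plan, months))
```

Under these assumptions, monthly wins for a six-month trial, yearly wins for exactly twelve months, and lifetime wins at two years.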

Where this approach has limits

To be straight: there are a few things the bot approach does that the no-bot approach does not.

Bots tag speakers automatically because they see who is speaking in Zoom’s participant list. Dictanta v1.0 does not ship speaker diarization (separating who said what) — Apple’s on-device speaker-ID for system audio is not yet production-grade on short captures. Diarization is coming in v1.1.
Bots can join meetings you are not in. If your assistant is in a meeting you skipped, Otter can still capture it. Dictanta requires you to be at the Mac running the recording.
Bots can integrate with Slack, HubSpot, Salesforce out of the box. Dictanta exports to Markdown and JSON; you wire your own webhook if you want a CRM push.
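
If you do want a CRM or Slack push, "wire your own webhook" can be as small as the Python sketch below, which uses only the standard library. The payload fields and the endpoint URL are placeholders I made up for illustration; adapt them to whatever your receiving system expects.

```python
import json
import urllib.request
from typing import Dict, List


def build_payload(title: str, action_items: List[str]) -> Dict[str, object]:
    """Shape the exported meeting data for a webhook (hypothetical schema)."""
    return {"title": title, "action_items": action_items}


def build_request(url: str, payload: Dict[str, object]) -> urllib.request.Request:
    """Create a JSON POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is a separate, explicit step so nothing leaves the Mac by accident:
# urllib.request.urlopen(build_request("https://example.invalid/hook", payload))
```

Keeping the send step explicit matches the spirit of the no-bot setup: data goes out only when you decide it should.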

If those are important to you, the cloud-bot category is the right answer. If you have ever been the executive on a customer call wishing the third-party note-taker wasn’t there — or if your legal team has ever asked where the recordings live — the no-bot path is the right answer.

A note on macOS permissions

The first time Dictanta records, you grant it screen recording permission. The system shows a sheet that says it lets the app capture your screen and system audio. Dictanta does not capture your screen. It asks for the permission because Apple’s ScreenCaptureKit API gates audio capture behind the same TCC permission as screen capture, even when no video frames are requested. (This is a known Apple API design choice, and it applies to every Mac transcription app that uses ScreenCaptureKit.)

You can confirm which permissions Dictanta holds in System Settings → Privacy & Security; it requests NSScreenRecordingUsageDescription for audio only, not for video frames. ScreenCaptureKit has no separate “audio only” TCC slot, which is an Apple limitation, not a Dictanta one.

If your IT department has a managed Mac and disables screen recording entitlements, Dictanta falls back to mic-only recording (so you can still record meetings, just only what your mic hears — useful for an in-person meeting, less useful for remote calls).
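
The fallback described above is a small decision rule. Here is a Python sketch of it, assuming the three source modes from earlier in the post. The `choose_source` function and its parameter names are hypothetical, and it only models the one fallback mentioned here (dropping to mic-only).

```python
from typing import Optional

VALID_SOURCES = ("mic", "system", "mix")


def choose_source(requested: str, mic_allowed: bool,
                  system_audio_allowed: bool) -> Optional[str]:
    """Pick the source to record from, falling back to mic-only if needed."""
    if requested not in VALID_SOURCES:
        raise ValueError(f"unknown source: {requested}")
    needs_mic = requested in ("mic", "mix")
    needs_system = requested in ("system", "mix")
    mic_ok = mic_allowed or not needs_mic
    system_ok = system_audio_allowed or not needs_system
    if mic_ok and system_ok:
        return requested
    # System audio is blocked (for example by device management): use the mic.
    if mic_allowed:
        return "mic"
    # Nothing can be recorded at all.
    return None
```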

Trying it in your next meeting

Practical workflow for someone trying this for the first time on a real call:

Install Dictanta from the Mac App Store on your work Mac. The download is small (~4 MB). The SpeechAnalyzer language model is not bundled with the app; macOS 26 supplies it as a system-managed asset and may need to download it the first time a language is used.
Open Dictanta once. Grant microphone and screen recording permissions when prompted. Configure the global hotkey (default ⇧⌘R) and pick which menu-bar caption mode you want.
In your next Zoom call, press the hotkey when the meeting starts. Forget about it.
After the call, open Dictanta. The recording will be on the meetings list. Click into it to see the transcript and summary.
If you want it in Notion or Obsidian, click Export → Markdown.

That’s the whole flow. No vendor onboarding, no calendar OAuth, no bot tested in a sample call. The free tier covers three meetings, which is enough to see whether the summary quality is good enough for what you need. If it is, upgrade; if not, you spent zero money.

Bottom line

For Mac users who do a lot of Zoom (or Teams, or Google Meet, or Webex) calls and don’t want to send their meeting audio through someone else’s pipeline, the right tool is the one that captures the audio your Mac is already hearing, transcribes it on-device with Apple’s SpeechAnalyzer, and summarizes it on-device with Apple’s Foundation Models. No bot in the call, no audio uploaded to anyone’s cloud, no per-minute meter.

That tool is Dictanta. It works on macOS 26, iOS 26, iPadOS 26, and visionOS 26. The Mac version is the most feature-complete because system-audio capture only exists on the Mac.

If you’re shopping for a transcription tool right now, the question isn’t whether the cloud options are good — they are. The question is whether you want the cloud at all for this particular workflow. If the answer is no, you finally have an option that respects that.