MetaWhisp converts your voice to text entirely on your Mac. No internet. No cloud servers. No data collection. Powered by WhisperKit and Apple's Neural Engine.
Most voice-to-text tools record your speech, send the audio to a remote server, process it in the cloud, and return the text. Your voice data travels across the internet, gets stored on third-party infrastructure, and is often used to train models or improve services.
On-device transcription eliminates every step of that. The speech recognition model lives on your Mac. Audio is captured, processed in memory, converted to text, and discarded — all within your machine. The audio never touches a network interface. There is no server to send it to.
This is not a "local cache" that syncs later. It is not a "privacy mode" that still phones home. The entire inference pipeline — from raw audio waveform to final text output — runs locally on Apple Silicon hardware. You can unplug your Ethernet, turn off Wi-Fi, and MetaWhisp works exactly the same.
Three components work together to turn your speech into text in under a second.
When you press the hotkey, MetaWhisp begins recording from your Mac's microphone using Apple's AVFoundation framework. The raw audio is captured at 16kHz mono — the format Whisper expects. A voice activity detector (VAD) identifies when you start and stop speaking, so silence is trimmed automatically. The audio buffer lives in RAM and is never written to disk.
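To make the silence-trimming step concrete, here is a minimal sketch of an energy-based voice activity detector. It is an illustration only: the text does not say which VAD MetaWhisp uses, and the frame length, threshold, and function names below are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 16_000   # Hz, the mono rate the text says Whisper expects
FRAME_MS = 20          # assumed analysis window, not a MetaWhisp setting
THRESHOLD = 0.01       # assumed RMS level separating speech from silence


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def trim_silence(samples, rate=SAMPLE_RATE, frame_ms=FRAME_MS, threshold=THRESHOLD):
    """Return samples with leading and trailing low-energy frames removed."""
    frame_len = rate * frame_ms // 1000
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [i for i, f in enumerate(frames) if frame_rms(f) >= threshold]
    if not voiced:
        return []  # nothing above the threshold: treat the whole buffer as silence
    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


# Synthetic in-memory buffer: 0.5 s silence, 1 s of a 440 Hz tone, 0.5 s silence.
silence = [0.0] * (SAMPLE_RATE // 2)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
buffer = silence + tone + silence

trimmed = trim_silence(buffer)
print(len(buffer) / SAMPLE_RATE, "s captured ->", len(trimmed) / SAMPLE_RATE, "s kept")
```

The buffer here is an ordinary in-memory list, mirroring the point that audio stays in RAM. A production VAD would likely be more robust to background noise than a fixed RMS threshold.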
WhisperKit is an open-source Swift framework from Argmax that optimizes OpenAI's Whisper model for Apple hardware. It converts the Whisper architecture into Core ML format, so inference can run on the Apple Neural Engine, the dedicated machine learning accelerator built into every Apple Silicon chip. Apple rates the Neural Engine at up to 18 trillion operations per second on M3 (15.8 trillion on M2), which is why transcription finishes in under a second even with the large-v3-turbo model.
Once inference completes, the text is placed on the system clipboard and automatically pasted into the active application using macOS accessibility APIs. The audio buffer is released from memory. The entire pipeline — capture, preprocess, infer, paste — typically completes in 500-900ms for a 10-second recording. You can use the text in any app: Slack, VS Code, Terminal, Notes, Safari, or any text field on your Mac.
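The figures above can be turned into two quick back-of-the-envelope numbers: how much RAM a recording occupies, and how much faster than real time the pipeline runs. The sketch below assumes 16-bit PCM samples, which the text does not state; the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000   # from the text: 16 kHz mono
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM; 32-bit float would double this


def buffer_bytes(seconds, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE):
    """Size of an uncompressed mono audio buffer held in memory."""
    return int(seconds * rate * width)


def real_time_factor(audio_seconds, latency_ms):
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / (latency_ms / 1000)


clip_seconds = 10
print(f"{clip_seconds} s buffer: {buffer_bytes(clip_seconds) / 1000:.0f} kB")
for latency_ms in (500, 900):  # the latency range quoted in the text
    rtf = real_time_factor(clip_seconds, latency_ms)
    print(f"{latency_ms} ms latency -> {rtf:.1f}x real time")
```

Under these assumptions a 10-second recording takes about 320 kB of RAM, and the quoted 500-900ms latency corresponds to roughly 11x to 20x real time.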
Running transcription locally is not just a privacy feature. It changes how the tool performs in practice.
A direct comparison of the two approaches across the dimensions that matter most.
| Dimension | On-Device (MetaWhisp) | Cloud-Based |
|---|---|---|
| Privacy | Audio never leaves your Mac | Audio sent to remote servers |
| Works offline | Yes, always | No, requires internet |
| Latency | <1 second consistent | 1-5 seconds, variable |
| Accuracy | 5.7% WER (Whisper large-v3-turbo) | 4-8% WER (varies by service) |
| Languages | 30+ with auto-detection | 50-100+ (varies) |
| Uptime | 100% (no server dependency) | 99.5-99.9% (service outages) |
| Cost | Free, unlimited | $0.006-0.024/min or subscription |
| Data retention | None — audio discarded instantly | Varies — often stored 30+ days |
| Hardware requirement | Apple Silicon Mac | Any device with a browser |
| Speaker diarization | Single speaker | Multi-speaker identification |
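
To put the Cost row in perspective, the per-minute range from the table can be projected onto a month of dictation. This is a rough sketch: the usage pattern (minutes per day, working days per month) is an assumption for illustration, and real cloud pricing depends on the provider and plan.

```python
CLOUD_RATE_LOW = 0.006    # USD per minute, lower bound from the table
CLOUD_RATE_HIGH = 0.024   # USD per minute, upper bound from the table


def monthly_cloud_cost(minutes_per_day, days=22):
    """Return (low, high) estimated monthly cost in USD for pay-per-minute pricing."""
    minutes = minutes_per_day * days
    return minutes * CLOUD_RATE_LOW, minutes * CLOUD_RATE_HIGH


# Assumed usage: 60 minutes of dictation on each of 22 working days.
low, high = monthly_cloud_cost(60)
print(f"Cloud: ${low:.2f} to ${high:.2f} per month; on-device: $0.00")
```
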
The specifics for those who want to understand exactly what is running on their hardware.
| Specification | Details |
|---|---|
| Model | OpenAI Whisper large-v3-turbo (pruned and fine-tuned from large-v3) |
| Framework | WhisperKit (Swift, Core ML) |
| Inference engine | Apple Neural Engine via Core ML |
| Model size | ~809 MB (downloaded once on first launch) |
| App binary size | 7.5 MB |
| Transcription latency | <1 second for 10-second recordings |
| Audio format | 16kHz mono PCM (converted from input) |
| Supported languages | 30+ including English, Spanish, Chinese, Japanese, German, French, Russian, Korean, Arabic, Hindi |
| Language detection | Automatic (built into model) |
| CPU usage at idle | ~2% |
| Minimum macOS | macOS 14 Sonoma |
| Required hardware | Apple Silicon (M1, M2, M3, M4 or later) |
| Word error rate | 5.7% (Whisper large-v3-turbo benchmark) |
MetaWhisp works fully offline. It runs entirely on your Mac using Apple's Neural Engine and the WhisperKit framework, so no internet connection is needed for transcription. The only feature that requires internet is the optional AI post-processing (Correct and Rewrite modes), which uses the OpenAI API. The core voice-to-text functionality is 100% offline.
MetaWhisp uses OpenAI's Whisper large-v3-turbo model, which achieves a 5.7% word error rate on standard benchmarks. For comparison, Google Cloud Speech-to-Text and Amazon Transcribe typically achieve 4-8% WER depending on the audio conditions. In practice, on-device accuracy is comparable to cloud for clear single-speaker dictation. Cloud services may have an edge in noisy multi-speaker environments where server-side noise reduction helps.
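For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below uses the standard definition. The text normalization (lowercasing and whitespace splitting) is a simplifying assumption, and published benchmarks usually apply more elaborate normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "send the report to the team before friday"
hypothesis = "send a report to the team friday"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this example one word is substituted and one is dropped out of eight reference words. By the same definition, a 5.7% WER corresponds to roughly 5 to 6 word errors per 100 words spoken.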
You need a Mac with Apple Silicon (M1, M2, M3, M4, or later) running macOS 14 Sonoma or later. The Neural Engine in Apple Silicon chips is what makes real-time local transcription possible: it provides dedicated hardware for ML inference that is both fast and power-efficient. Intel Macs are not supported because they have no Neural Engine and cannot run the Whisper model at real-time speeds.
MetaWhisp does not store or transmit your audio. Audio is captured into a RAM buffer, processed by the on-device model, and the buffer is released as soon as transcription completes. No audio is written to disk, sent over the network, or retained in any form. MetaWhisp has no servers, no user accounts, no analytics, and no telemetry. You can verify this by monitoring network activity with Little Snitch or by running the app with Wi-Fi turned off. Read the full privacy policy for details.
MetaWhisp works in any application. It runs system-wide via a global hotkey (Right Option key by default), and the transcribed text is automatically pasted into the active application, wherever your cursor is. This includes Slack, VS Code, Terminal, Notes, Safari, Chrome, Mail, Pages, and any other app that accepts text input. It also works in full-screen applications. See the complete dictation guide for setup instructions.
Download MetaWhisp and experience offline, private voice-to-text on your Mac. No account, no subscription, no data collection.
Download for macOS
macOS 14+ · Apple Silicon · Free