Press a hotkey anywhere on your system, speak naturally, and Telvr transcribes your voice in real time using Whisper. The finished text is automatically inserted at your cursor position, with no copy-paste and no app switching.

Which languages are supported?

Telvr supports 50+ languages via OpenAI's Whisper large-v3 model. Language detection is automatic — just speak in your preferred language and Telvr handles the rest.

Do I need a subscription?

No. Telvr uses a pay-as-you-go model: EUR 3 per month infrastructure fee plus EUR 0.03 per minute of usage. No lock-in, no auto-renewal. You top up your balance and use it at your own pace.

Does Telvr work offline?

Currently, Telvr requires an internet connection for cloud-based transcription via Groq. A Community Edition with local processing using your own API key is planned for the future.

Which apps are supported?

Telvr works system-wide — it inserts text at the cursor position in any application. Email clients, chat apps, code editors, browsers, word processors — if you can type in it, Telvr works there.

All data is transmitted via TLS encryption. Audio recordings are not permanently stored after transcription. Groq processes your audio under a Data Processing Agreement (DPA). We do not sell or share your data.

2026-02-25

Speech-to-Text for Mac: Every Option Compared (2026)

Voice Input on macOS in 2026

macOS has always had strong voice input foundations. Apple introduced server-side Dictation back with OS X Mountain Lion, and the Mac's tight hardware-software integration means even third-party tools can hook deeply into the system. In 2026, Mac users have more voice input options than ever — including tools that would have seemed like science fiction five years ago.

The challenge is knowing which option actually fits your workflow. This comparison covers every relevant option for Mac, with honest assessments of where each one wins and where it falls short.

Apple Dictation (Built-in)

Apple Dictation is the first option to evaluate because it costs nothing and requires no installation. Activate it in System Settings under Keyboard, assign a shortcut (default is pressing Fn twice or the Dictation key), and you are ready.

How it works: Short phrases process on-device using Apple's speech model. Longer dictation sessions can optionally use Apple's servers. The output appears in the active text field in real time.

Accuracy: Strong for common English. Handles conversational speech well. Struggles with technical terminology, proper nouns not in Apple's dictionary, and code-adjacent vocabulary.

Formatting: None beyond basic punctuation, and only when you speak the punctuation commands explicitly. No AI enrichment. If you say "um" or "like," those words appear in your text.

Privacy: On-device processing for short phrases is genuinely private. Server processing involves sending audio to Apple.

Best for: Casual dictation in everyday apps, users who do not want to install anything, quick voice input where formatting does not matter.

Telvr

Telvr is a dedicated push-to-talk dictation app for macOS. It installs as a menu bar app and provides system-wide voice input with AI enrichment.

How it works: You hold a configurable hotkey anywhere on your Mac — in any app, in any text field, even in the terminal. Speak your content, release the key, and within about two seconds the processed text appears exactly where your cursor is.

The processing pipeline uses Whisper large-v3 via Groq's inference API for transcription, followed by an AI enrichment step that transforms raw speech into formatted output.

Six enrichment modes:

Raw Transcription: exact speech output, minimally processed
Clean and Correct: removes fillers, fixes grammar, adds punctuation
Professional Email: formats speech as a complete email with subject and greeting
Meeting Notes: structures content into bullet points with decisions and action items
2-3 Sentence Summary: condenses longer speech into a tight summary
Dev Task: structures a development task with context and acceptance criteria
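
To make the two-stage flow concrete, here is a minimal sketch of a transcribe-then-enrich pipeline with the six modes as an enum. It is an illustration under stated assumptions, not Telvr's actual code: the `Pipeline` class, the `INSTRUCTIONS` texts (paraphrased from the mode descriptions above), and the choice to skip enrichment entirely in raw mode are all hypothetical. The transcription and enrichment backends are passed in as plain functions, so stubs stand in for the real cloud calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    RAW = "Raw Transcription"
    CLEAN = "Clean and Correct"
    EMAIL = "Professional Email"
    NOTES = "Meeting Notes"
    SUMMARY = "2-3 Sentence Summary"
    DEV_TASK = "Dev Task"


# Hypothetical instruction texts, paraphrased from the mode list above.
INSTRUCTIONS = {
    Mode.CLEAN: "Remove fillers, fix grammar, add punctuation.",
    Mode.EMAIL: "Format as a complete email with subject and greeting.",
    Mode.NOTES: "Structure into bullet points with decisions and action items.",
    Mode.SUMMARY: "Condense into a 2-3 sentence summary.",
    Mode.DEV_TASK: "Structure as a development task with context and acceptance criteria.",
}


@dataclass
class Pipeline:
    transcribe: Callable[[bytes], str]   # audio bytes -> raw transcript
    enrich: Callable[[str, str], str]    # (instruction, transcript) -> formatted text

    def run(self, audio: bytes, mode: Mode) -> str:
        raw = self.transcribe(audio)
        if mode is Mode.RAW:
            # Assumption: raw mode bypasses the enrichment step.
            return raw
        return self.enrich(INSTRUCTIONS[mode], raw)


# Stub backends so the sketch runs without any network access.
def fake_transcribe(audio: bytes) -> str:
    return "um so the build is broken again"


def fake_enrich(instruction: str, raw: str) -> str:
    return f"[{instruction}] {raw}"


pipeline = Pipeline(fake_transcribe, fake_enrich)
print(pipeline.run(b"", Mode.RAW))
print(pipeline.run(b"", Mode.DEV_TASK))
```

Keeping the two stages as injected functions means the same structure works whether the transcript comes from a cloud API or a local model.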

Accuracy: Whisper large-v3 is among the most accurate speech recognition models available. Combined with the enrichment layer, which corrects grammar and removes disfluencies, the output is consistently cleaner than that of raw transcription tools.

Latency: Under 2 seconds for typical passages. The cloud processing via Groq's optimized inference is fast enough that the delay feels like the tool is "thinking," not buffering.

Language support: 50+ languages with automatic detection. Telvr does not require you to set your language — it identifies it from your speech.

Pricing: EUR 3 per month infrastructure fee plus EUR 0.03 per minute of actual dictation. A 14-day free trial includes EUR 3 starter credit.

Best for: Professionals who want system-wide voice input that produces clean, formatted output without manual editing.

Wispr Flow

Wispr Flow is Telvr's closest competitor on macOS. It takes the same push-to-talk approach and adds AI processing to produce clean output.

Strengths: Polished interface, solid AI output quality, and a "flow mode" that handles longer dictation sessions with natural pauses more gracefully.

Pricing: $14 per month, flat rate. This is better for heavy users (30+ minutes per day) and worse for moderate users compared to Telvr's usage-based model.
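
The trade-off between the two pricing models comes down to simple arithmetic. The sketch below computes the break-even point from the prices quoted in this article. It treats USD and EUR as 1:1, which is a simplifying assumption, and the function names are made up for illustration.

```python
TELVR_BASE_EUR = 3.00      # monthly infrastructure fee
TELVR_PER_MIN_EUR = 0.03   # per minute of dictation
WISPR_FLAT = 14.00         # quoted in USD; treated as 1:1 with EUR (assumption)


def telvr_monthly_cost(minutes: float) -> float:
    """Telvr cost for a month with the given minutes of dictation."""
    return TELVR_BASE_EUR + TELVR_PER_MIN_EUR * minutes


def break_even_minutes(flat_fee: float = WISPR_FLAT) -> float:
    """Monthly minutes at which Telvr costs the same as the flat fee."""
    return (flat_fee - TELVR_BASE_EUR) / TELVR_PER_MIN_EUR


minutes = break_even_minutes()
print(f"Break-even: {minutes:.0f} min/month, about {minutes / 30:.0f} min/day")
print(f"At 30 min/day: EUR {telvr_monthly_cost(30 * 30):.2f} per month")
```

Under these assumptions the break-even sits at roughly 367 minutes per month, or about 12 minutes per day. At 30 minutes per day (900 minutes per month), Telvr's usage-based price comes to EUR 30, about twice the flat rate.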

Limitations: No custom prompt mode. Language support is narrower than Whisper-based tools.

Best for: Mac users who dictate heavily and want a predictable monthly cost.

Whisper (Self-Hosted)

OpenAI's Whisper model is available as an open-source project. With the right tools, you can run it locally on a Mac with Apple Silicon.

How it works: You record audio (using something like sox or a wrapper like whisper-mic), run it through the local Whisper model, and get a transcript. No cloud API required.
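
As a rough sketch of that record-then-transcribe loop, the snippet below shells out to `sox` for recording and to the `whisper` command-line tool from the open-source openai-whisper package. It assumes both are installed and on your PATH. The fixed recording length, the `/tmp` working directory, and the helper names are illustrative choices, and the output filename assumes a recent openai-whisper release that names the transcript after the audio file's stem.

```python
import subprocess
from pathlib import Path


def record_cmd(wav: Path, seconds: int) -> list[str]:
    # sox: -d reads from the default input device; write 16 kHz mono,
    # then stop after the given number of seconds.
    return ["sox", "-d", "-r", "16000", "-c", "1", str(wav),
            "trim", "0", str(seconds)]


def transcribe_cmd(wav: Path, out_dir: Path, model: str = "large-v3") -> list[str]:
    # openai-whisper CLI: writes a plain-text transcript into out_dir.
    return ["whisper", str(wav), "--model", model,
            "--output_format", "txt", "--output_dir", str(out_dir)]


def dictate(seconds: int = 10, workdir: Path = Path("/tmp")) -> str:
    """Record from the microphone, transcribe locally, return the text."""
    wav = workdir / "dictation.wav"
    subprocess.run(record_cmd(wav, seconds), check=True)
    subprocess.run(transcribe_cmd(wav, workdir), check=True)
    return (workdir / "dictation.txt").read_text().strip()
```

Calling `dictate(5)` would record five seconds and return the transcript. Swapping `model` for `medium` or `small` trades accuracy for speed.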

Accuracy: Identical to Telvr's transcription quality — same Whisper large-v3 model. The difference is entirely in the pipeline and enrichment layer.

Latency: On Apple Silicon (M2/M3/M4 chips), Whisper large-v3 runs in 3-8 seconds locally. Smaller models (medium, small) run in 1-3 seconds with some accuracy reduction.

Integration: None out of the box. You need to build a custom pipeline to get text into your active application. Several community projects exist (whispering, MacWhisper, etc.) but require setup.
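
One common way to bridge that gap on macOS is to put the transcript on the clipboard and simulate a paste keystroke. The sketch below uses the built-in `pbcopy` and `osascript` tools. It is one possible approach, not how any of the products in this comparison are known to work. It overwrites the current clipboard, and the calling process needs Accessibility permission to send keystrokes. The `run` parameter is there only so the function can be exercised without touching the system.

```python
import subprocess

# AppleScript that presses Cmd+V in the frontmost application.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def insert_at_cursor(text: str, run=subprocess.run) -> None:
    """Copy text to the clipboard, then paste it into the active app."""
    run(["pbcopy"], input=text.encode("utf-8"), check=True)
    run(["osascript", "-e", PASTE_SCRIPT], check=True)
```

A more careful version would save and restore the previous clipboard contents around the paste.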

Enrichment: Zero. You get raw transcription. Post-processing requires additional tooling.

Privacy: Fully local. No audio leaves your machine.

Best for: Developers who want full control, privacy-focused users, people building custom workflows.

Dragon for Mac (Discontinued)

Nuance discontinued Dragon for Mac (last sold as Dragon Professional Individual for Mac) in October 2018. No current version is available for macOS. If you are looking for Dragon-level accuracy and vocabulary management on Mac, the options are Telvr, Wispr Flow, or self-hosted Whisper.

This is mentioned because many search results still reference Dragon for Mac — it is no longer a viable option for macOS users.

Comparison Table

| Feature | Apple Dictation | Telvr | Wispr Flow | Whisper (local) |
|---|---|---|---|---|
| System-wide | Yes | Yes | Yes | With custom setup |
| AI Enrichment | No | Yes (6 modes) | Yes | No |
| Latency | 1-3s | Under 2s | Under 2s | 3-8s |
| Language support | ~60 | 50+ (auto-detect) | ~40 | 99 |
| Privacy | On-device option | Cloud | Cloud | Fully local |
| Price | Free | EUR 3/mo + usage | $14/mo | Free |
| Custom prompt | No | Yes | No | No |

Our Recommendation

For most Mac users who want to use voice input as a genuine productivity tool — not just occasional dictation — Telvr is the most complete solution. The combination of system-wide insertion, fast cloud processing, and AI enrichment modes addresses the two reasons voice input normally fails as a workflow tool: you have to switch apps to use it, and the output needs heavy editing.

Choose Apple Dictation if you only need occasional voice input in standard apps and do not want to install anything.

Choose Wispr Flow if you dictate heavily every day and prefer a flat monthly fee.

Choose local Whisper if privacy is non-negotiable and you are comfortable building a custom pipeline.

The key insight is that raw accuracy, while important, is not the differentiating factor in 2026. Whisper large-v3, available via multiple products, is extremely accurate. The differentiator is what happens to the text after transcription — whether you get raw speech output or formatted, usable text.