Two Approaches to Voice Input
Every voice input tool makes a fundamental design decision: when does the microphone listen?
The two dominant models are push-to-talk (microphone active only while a button is held) and always-on (microphone continuously listening, using voice activity detection or start/stop commands to decide what counts as input). Each approach has different implications for privacy, accuracy, workflow integration, and resource usage.
The choice is not just a UX preference — it reflects fundamentally different assumptions about how voice input fits into a working environment.
Push-to-Talk: Deliberate and Bounded
In push-to-talk dictation, you hold a hotkey to activate the microphone, speak your content, and release the key when done. The microphone is inactive at all other times.
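The hold-to-record behavior can be sketched as a small state machine. This is a minimal Python sketch, not Telvr's implementation: the class `PushToTalkRecorder`, its methods, and the representation of audio as lists of integer samples are hypothetical. A real tool would open and close the microphone stream itself on press and release rather than receiving frames and discarding them; discarding here just keeps the example self-contained.

```python
from typing import List, Optional

Frame = List[int]  # one block of PCM samples (hypothetical representation)


class PushToTalkRecorder:
    """Keeps audio only while the hotkey is held (hypothetical sketch)."""

    def __init__(self) -> None:
        self._held = False
        self._buffer: List[Frame] = []

    def key_down(self) -> None:
        # Hotkey pressed: start a fresh utterance.
        self._held = True
        self._buffer = []

    def feed(self, frame: Frame) -> None:
        # Frames that arrive while the key is up are discarded, never stored.
        if self._held:
            self._buffer.append(frame)

    def key_up(self) -> Optional[List[Frame]]:
        # Hotkey released: everything buffered since key_down is one utterance.
        if not self._held:
            return None
        self._held = False
        utterance, self._buffer = self._buffer, []
        return utterance


rec = PushToTalkRecorder()
rec.feed([1, 2])       # discarded: key not held
rec.key_down()
rec.feed([3, 4])
rec.feed([5, 6])
print(rec.key_up())    # [[3, 4], [5, 6]]
rec.feed([7, 8])       # discarded again
```

The utterance boundary is set entirely by `key_down` and `key_up`, so no speech detection is involved in deciding where input starts and ends.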
Privacy: This is the strongest privacy guarantee available in voice input. The application can only capture audio while the hotkey is physically held. There is no background listening, no accidental capture of private conversations, and no question of whether audio from an unintended moment was processed. For work environments where colleagues, clients, or sensitive information are often audible, this matters.
Accuracy: Push-to-talk generally produces better accuracy because each audio segment is bounded by the user's own action. The model receives exactly one utterance, from hotkey press to hotkey release, and never has to infer speech boundaries from ambient noise or guess whether background conversation was meant as input.
Workflow: The push-to-talk gesture is explicit and intentional. You prepare what you want to say, press the key, speak, and release. This matches the mental model of "I am now writing" and "I am now done writing." It fits naturally alongside keyboard and mouse use: your hands are already at the keyboard, so holding a key adds almost no effort.
Battery and resources: The microphone is idle when not actively dictating. CPU and network activity occur only during dictation sessions.
Limitations: Every dictation requires a deliberate action. Continuous, hands-free dictation (common in medical settings where a doctor's hands are occupied, for example) is not the natural mode for push-to-talk.
Always-On Dictation: Continuous and Hands-Free
Always-on (or continuous) dictation uses voice activity detection to automatically identify when you are speaking and process that audio. Apple Dictation when running continuously, Google Voice Typing on Android, and hands-free accessibility tools typically work this way.
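To make "voice activity detection" concrete, here is a deliberately naive energy-threshold detector in Python. It is a stand-in for illustration only, not what Apple, Google, or any accessibility tool uses; production systems typically rely on trained models rather than a fixed energy threshold. The function names, the `threshold` and `hangover` values, and the frame representation are all hypothetical. The sketch shows the core mechanism: speech starts when a frame is loud enough and ends after a run of quiet frames.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one block of audio samples (hypothetical representation)


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_by_energy(
    frames: List[Frame], threshold: float = 0.1, hangover: int = 3
) -> List[List[Frame]]:
    """Split a frame stream into utterances with a naive energy-based VAD.

    threshold: RMS level treated as speech (hypothetical value).
    hangover: consecutive quiet frames that close an utterance (hypothetical).
    """
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            # Any frame loud enough counts as speech, whoever produced it.
            current.append(frame)
            quiet = 0
        elif current:
            # Keep short pauses inside an utterance until the hangover expires.
            current.append(frame)
            quiet += 1
            if quiet >= hangover:
                segments.append(current[: len(current) - quiet])  # drop trailing silence
                current, quiet = [], 0
    if current:
        segments.append(current[: len(current) - quiet])
    return segments


# The final loud frame could just as well be a colleague talking; it still becomes a segment.
loud, silent = [0.5, -0.5], [0.0, 0.0]
stream = [silent, loud, loud, silent, loud, silent, silent, silent, loud]
print([len(s) for s in segment_by_energy(stream)])  # [4, 1]
```

Because the detector only measures whether sound is present, it has no notion of whose speech it is hearing, which is the root of the accuracy and privacy trade-offs discussed in this section.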
Privacy: Always-on listening requires ongoing microphone access. The tool must process audio continuously to detect when you start speaking. Even with good local processing, there is inherent exposure: any conversation near your microphone could be captured, even if not intended as input. For most enterprise environments and shared spaces, this is a real concern.
Accuracy: Variable. Voice activity detection can separate speech from silence, but it cannot reliably tell intended dictation apart from ambient speech such as a conversation with a colleague, a video playing in the background, or someone speaking nearby. False activations and missed start points add noise to the output.
Workflow: Better for hands-free scenarios. Medical professionals using dictation while examining patients, workers who need both hands occupied, and users with mobility impairments that make holding a key impractical all benefit from continuous dictation.
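Battery and resources: Continuous microphone access with ongoing voice activity detection consumes more battery and processing power than push-to-talk, because the audio pipeline never fully idles. How much more depends on the device and on whether detection runs on dedicated low-power hardware.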
Limitations: Not well-suited for shared or open-plan office environments. False activations create noise. The continuous "conversation" with the tool can feel unnatural in contexts where you are switching frequently between voice and typed input.
The Wake Word Model
A third approach uses a wake word ("Hey [product]") to start listening and a stop command or silence timeout to end a session. This is the model used by Siri, Alexa, and Google Assistant. For desktop dictation, it is rarely used because the wake word becomes friction in high-frequency use cases.
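The wake-word flow (open on a trigger phrase, close on a stop command or a silence timeout) can be sketched in Python as well. To keep it short, this hypothetical example assumes wake-word detection has already happened upstream and works on recognized words with timestamps; real assistants run a dedicated detector on the raw audio. The function name, the single-word `wake` and `stop` tokens, and the 2-second `timeout` are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, recognized word)


def wake_word_sessions(
    events: List[Event], wake: str = "hey", stop: str = "stop", timeout: float = 2.0
) -> List[List[str]]:
    """Group recognized words into dictation sessions (hypothetical sketch).

    A session opens after `wake` and closes on `stop`, or when more than
    `timeout` seconds pass between consecutive words.
    """
    sessions: List[List[str]] = []
    current: Optional[List[str]] = None  # None means idle: only the wake word matters
    last_t = 0.0
    for t, word in events:
        if current is not None and t - last_t > timeout:
            # Silence timeout: close the open session before handling this word.
            sessions.append(current)
            current = None
        if current is None:
            if word == wake:
                current = []
        elif word == stop:
            sessions.append(current)
            current = None
        else:
            current.append(word)
        last_t = t
    if current is not None:
        sessions.append(current)
    return sessions


events = [(0.0, "note"), (1.0, "hey"), (1.5, "buy"), (2.0, "milk"), (2.4, "stop"),
          (3.0, "chatting"), (4.0, "hey"), (4.5, "call"), (9.0, "later")]
print(wake_word_sessions(events))  # [['buy', 'milk'], ['call']]
```

Every session costs a spoken wake word. That is cheap for occasional assistant requests but adds up across the many short dictations of desk work.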
Impact on Output Quality
Beyond raw transcription accuracy, the activation model affects the quality of AI enrichment, the step in which an AI model cleans up and formats the raw transcript:
Push-to-talk advantage: The AI receives exactly one bounded utterance. The enrichment model processes a complete, intentional statement. There is far less noise from unintended speech, and the model does not need to handle boundary detection, because the user's hotkey release defines the segment.
Always-on challenge: Enrichment models receive audio segments that may include false starts, ambient speech, and unclear boundaries. This makes the AI's job harder and can result in artifacts in the formatted output.
Telvr's Design Choice
Telvr is built entirely around push-to-talk. This was a deliberate choice based on two convictions:
First, privacy matters in professional environments. A tool designed for desktop productivity — where sensitive conversations happen — should give users absolute control over when the microphone is active. Push-to-talk provides that control without configuration.
Second, the explicitness of push-to-talk produces better output. Users who press a hotkey to dictate tend to compose their thought before speaking, rather than thinking out loud and expecting the AI to extract meaning from a stream of consciousness. The resulting input is more coherent, and the AI enrichment output is correspondingly better.
Which Approach Is Right for You
Choose push-to-talk if:
- You work in a shared office or open-plan environment
- Privacy is a concern (calls, sensitive conversations, confidential information nearby)
- You switch frequently between typing and voice input
- You want explicit control over every dictation session
- You are using voice to replace typing in specific moments, not for continuous hands-free use
Choose always-on if:
- You need fully hands-free operation (medical procedures, physical work)
- You work in a private, quiet environment
- You are dictating long continuous passages without needing to interact with the computer
Choose wake word if:
- You are using a voice assistant rather than a dictation tool
- You need ambient activation without a physical button
For the majority of knowledge workers who want to use voice input as a keyboard supplement — writing emails, documentation, messages, and notes while at a desk — push-to-talk is the better fit. The explicit, bounded activation matches how desk work actually happens: intermittent bursts of text creation, not continuous monologue.