Private Dictation Is a Philosophy
When you speak, your words should stay on your machine. No cloud, no accounts, no telemetry. Steno was built so your thoughts never leave your Mac.
When you dictate into most speech-to-text products today, here's what happens: your voice is streamed to a server somewhere, processed by a model running on someone else's hardware, and returned as text. Along the way, it may be logged, analyzed, used to train future models, or stored indefinitely. You don't know. You can't know. The privacy policy probably says they can.
This is the default. It's so normal that most people don't think about it. But voice is different from text. Your speaking voice contains not just your words but your tone, your hesitation patterns, your accent, your emotional state. It's biometric data dressed as convenience.
Steno was built on a different premise: your voice should never leave your machine.
When you dictate with Steno, the entire pipeline runs locally on Apple Silicon. The Whisper model that transcribes your speech runs on your Mac's Neural Engine. The Qwen model that optionally refines your prose runs on the same chip. No network call. No server. No account. No analytics. No telemetry. Just your voice, your machine, and text that appears at your cursor.
This isn't just a privacy feature. It's a philosophy about what software should be.
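The shape of that local pipeline can be sketched in a few lines. This is a minimal illustration, not Steno's actual code: the class name, the stage signatures, and the two stand-in functions are all hypothetical, and the stand-ins only mimic what the Whisper and Qwen stages would do. The point it shows is structural: transcription always runs, refinement is optional, and both are plain in-process function calls with no network client anywhere in the path.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures (assumptions for illustration only).
Transcriber = Callable[[bytes], str]  # raw audio in, transcript out
Refiner = Callable[[str], str]        # transcript in, cleaned-up prose out


@dataclass
class LocalDictationPipeline:
    """Two in-process stages; nothing here opens a network connection."""
    transcribe: Transcriber
    refine: Optional[Refiner] = None  # refinement is optional

    def run(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        if self.refine is not None:
            text = self.refine(text)
        return text


# Stand-ins for the on-device models. A real app would call into
# locally loaded speech and language models here instead.
def fake_transcribe(audio: bytes) -> str:
    return "um so this is a test"


def fake_refine(text: str) -> str:
    fillers = {"um", "uh"}
    words = [w for w in text.split() if w not in fillers]
    return " ".join(words).capitalize() + "."


pipeline = LocalDictationPipeline(fake_transcribe, fake_refine)
print(pipeline.run(b""))
```

Dropping the second argument gives the transcription-only path, which returns the raw transcript unchanged.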
The argument for cloud processing is always the same: it lets vendors run bigger models, improve accuracy over time, and add features that require server-side compute. These are real benefits. But they come at a cost that isn't priced into the product. The cost is the loss of sovereignty over your own words.
When you send your voice to a cloud service, you're not just using a tool. You're entering a relationship. You're trusting a company not to misuse your data today, and you're trusting that their policies won't change tomorrow. You're trusting that they won't get acquired by someone with different values. You're trusting that their security is good enough.
These are reasonable bets for some kinds of software. They are not reasonable bets for software that listens to you think.
Dictation is intimate. It captures thoughts in their raw form, before you've edited them, before you've decided what to share and what to keep private. The hesitation. The false starts. The ideas you speak aloud and immediately reconsider. This isn't data that should live on someone else's server.
Building Steno entirely locally required making hard trade-offs. The models we bundle with the app are ~3.4 GB. The refinement step runs a 4-bit quantized model that, while impressive, is not as capable as a cloud-hosted frontier model. Every feature we add has to work within the thermal and memory constraints of a consumer Mac.
These are real constraints. But they're also a design discipline. They force us to ask: what does this feature actually need to do? Can we achieve it with less? Is the user better served by a simpler, local implementation than a more powerful, cloud-dependent one?
To the last two questions, the answer, much of the time, is yes.
Private dictation isn't just about keeping your words safe. It's about building software that respects a fundamental boundary: your thoughts are yours. When you speak them aloud, the tool that captures them should be an extension of your machine, not a portal to someone else's.
This is the philosophy behind Steno. Not privacy as a feature checkbox. Privacy as the starting point from which everything else follows.