LLM Dictation

Market Report and Entry Strategy. Two adjacent markets, one shared opening — a competitive and business-model analysis of text expansion and voice dictation, with build options and a defensible entry strategy for a new local-first product.

19 June 2026 · 18 min read · Local LLM synthesis (multi-model)

market-analysis
dictation
text-expansion
local-first
business-models

Two adjacent markets, one shared opening. The leaders are split between one-time desktop tools and venture-funded cloud subscriptions — and the same gap runs through both: a local-first, privacy-first product that charges once for the core and monetises the AI layer separately.

The strategic gap. The single most interesting unclaimed position: nobody owns both expansion and dictation in one local app.

Summary

There are two adjacent markets here, and they behave very differently.

Text expansion is a mature, slow-growing utility category. The leaders are either old-school one-time-purchase desktop tools (Typinator) or cloud subscription team tools (TextExpander), with a strong free open-source floor (Espanso) that caps how much anyone can charge solo users. Margins on the paid side are healthy because there is little or no server cost, but the ceiling on willingness-to-pay is low and the category is being quietly absorbed into launchers (Raycast, Alfred) and OS-level features.

Dictation / voice-to-text is the opposite: a fast-growing, heavily funded category. Wispr Flow has raised roughly USD 81M and is valued at between USD 700M and a reported ~USD 2B on the back of a cloud-only subscription model. But the premise of cloud dictation is under pressure, because local speech models (Whisper, NVIDIA Parakeet, Apple's own engine) now rival cloud accuracy on consumer hardware, at near-zero marginal cost and without audio ever leaving the device.

The strategic gap is the same in both markets: subscription fatigue plus the maturing of on-device models has opened a clear lane for a local-first, privacy-first product that charges a one-time or low-cost licence for the core and monetises the AI layer separately. That lane is already being colonised (VoiceInk, Spokenly, Superwhisper, Dictato), so speed and product quality matter — but it is structurally the safest place to compete against a VC-funded cloud incumbent you cannot out-spend.

Market sizing and direction

Market	2026 size (approx.)	Forecast	CAGR	Notes
AI speech-to-text tools	USD 3.87B	~USD 16.4B by 2035	~17.4%	The most relevant TAM for dictation products
Cloud dictation solutions	USD 8.86B	~USD 22.2B by 2034	~12.2%	Skews enterprise/healthcare/legal
Broader "voice AI"	~USD 22.5B	—	~34–35%	Cited by VC commentary around Wispr's raise
Medical transcription software	USD 3.31B	~USD 6.9B by 2031	~15.9%	Dominated by Nuance/Microsoft (Dragon)
Text expansion	No clean standalone figure	—	Low single digit	A feature/utility category, not a tracked market
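The CAGR column follows from the size and forecast columns: CAGR = (forecast / size)^(1 / years) - 1. A minimal Python sketch to sanity-check the rows, assuming 2026 is the base year for every forecast (the underlying reports may use a different base year):

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing `start` to `end` over `years`."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# (2026 size, forecast, forecast year) in USD billions, copied from the table above.
ROWS = {
    "AI speech-to-text tools": (3.87, 16.4, 2035),
    "Cloud dictation solutions": (8.86, 22.2, 2034),
    "Medical transcription software": (3.31, 6.9, 2031),
}

if __name__ == "__main__":
    for name, (size, forecast, year) in ROWS.items():
        print(f"{name}: {cagr(size, forecast, year - 2026):.1%}")
```

Run as a script, this prints 17.4%, 12.2% and 15.8%. The medical row lands a tenth of a point under the quoted ~15.9%, which a different base year or rounding in the source would explain.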

What the numbers tell us: dictation is where the growth and the capital are. Text expansion is a profitable niche but not a venture-scale market on its own. A sensible entry treats expansion as a feature that improves retention and dictation as the growth engine.

Three structural trends to build around

Local models have caught up. On-device Whisper and Parakeet now reach roughly 90–97% accuracy on professional English speech; more than 20% of enterprise speech-to-text vendors now offer fully on-device tiers (up from under 5% in 2022). The historic reason to go cloud — accuracy — is largely gone for English and major European languages.
Subscription fatigue is real and quantified. Around 41% of consumers report subscription fatigue, with the average person unknowingly spending ~USD 219/month on recurring services. Lifetime and one-time deals (Dictato, BetterDictation, VoiceInk, MacWhisper) are gaining traction precisely because of this.
The category is being commoditised from above and below. Above: ChatGPT and Claude voice modes, plus Apple's built-in SpeechAnalyzer (macOS 26+) and Windows' built-in voice typing, compete for the same "input moment" at no extra cost. Below: open-source engines mean anyone can build a basic dictation app in a weekend. Differentiation has to come from workflow, polish, and trust, not raw transcription.
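Accuracy figures such as "90–97%" are usually shorthand for a word error rate (WER) of roughly 3–10%. WER counts the word substitutions, deletions and insertions needed to turn the transcript into a reference, divided by the number of reference words. A minimal sketch, assuming plain whitespace tokenisation and lowercasing only (published benchmarks normally also normalise punctuation and numbers, so scores from different sources are not directly comparable):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, computed one row at a time.
    # prev[j]: edits to turn the first i-1 reference words into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution (or a match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("send the report by friday", "send a report by friday")` is 0.2: one substitution over five reference words. Scoring candidate engines on a fixed set of your own recordings measures them on your users' vocabulary rather than on vendor benchmarks.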

Segment A — Text expansion: Typinator and competitors

The leader: Typinator (Ergonis Software, Austria)

Model: Predominantly one-time purchase, local-first, privacy-first. All snippets stored locally by default; no cloud dependency. Roughly 100,000+ users over 20+ years.
Pricing (2026): Basic one-time licence in the ~USD 29.99–49.99 range (Mac only); Advanced ~USD 29.99/year (adds iOS); Business ~USD 59.99/year per user; Enterprise on request. Free updates within a major version; paid upgrades for new majors.
Recent moves: Version 10 (March 2026) added an iOS companion (custom keyboard, iCloud sync), a redesigned interface, link-based team snippet sharing, and Apple Intelligence integration for AI-assisted snippet creation. A Windows version is signposted as "coming soon".
Positioning: Speed and reliability (zero-lag local expansion, low CPU/RAM), regex and advanced formatting for power users, and data sovereignty for compliance-sensitive teams (Visa's compliance team is a cited customer).
Weakness: Historically Mac-only; pricing has become confusing as it straddles one-time-licence and subscription (iOS features require an active subscription).

Competitor business models

Product	Vendor	Model	Customer cost	Platforms	Differentiator
TextExpander	Smile	Subscription only (cloud + team)	Individual ~USD 3.33/mo (~USD 39.96/yr); Business ~USD 99.96/yr; Growth ~USD 129.96/yr; no free plan	Mac, Windows, Chrome, iOS, Android	The mature incumbent: cloud sync, team sharing, permissions, analytics, compliance
Espanso	Solo dev + small team	Free, open source (GPL-3), donation-funded	Free	macOS, Windows, Linux	Rust, YAML config, scriptable; 13,000+ GitHub stars. Sets the price floor at zero
Text Blaze	Text Blaze	Freemium (browser-first)	Generous free tier; Business ~USD 83.88/yr	Chrome-focused; Mac/Windows apps	Dynamic templates, conditional logic, API integrations; strong free tier
Typedesk	Typedesk	Team-first subscription	~USD 60/yr per user (Premium)	Windows, Mac, Chrome, Firefox, web	Team sharing built in (not an add-on); cross-platform from day one
aText	Tran Ky Nam Software	Cheap one-time	Very low one-time fee	macOS	Bring-your-own cloud sync (iCloud/Dropbox/Drive/OneDrive); budget pick
Raycast	Raycast	Freemium launcher (snippets as a feature)	Free; Pro ~USD 8/mo for AI	macOS (Windows in progress)	Snippets bundled into a launcher — the "absorb the category" threat

Read on the expansion market

Two viable paid models exist: (a) a one-time/upgrade licence for solo Mac power users (Typinator, aText), and (b) a per-seat cloud subscription for teams (TextExpander, Typedesk). Solo subscriptions struggle because Espanso is free and capable.
The money is in teams. Shared snippet libraries, permissions, analytics and "update once, push to everyone" are what justify recurring revenue. Solo users churn or defect to free.
The category is being eaten by launchers and OS features. A pure text expander is a hard standalone business in 2026. It is a much better attach feature to something stickier.

Segment B — Dictation: Wispr Flow and competitors

The leader: Wispr Flow (Wispr AI, San Francisco)

Model: Cloud-only, subscription. Founded ~2021–2022 by ex-Apple/Meta engineers. The product runs as a background app activated by a hotkey; speech is uploaded, transcribed, cleaned up by an LLM layer (filler-word removal, punctuation, backtracking correction, per-app tone) and pasted into any text field.
Pricing (2026): Free (capped at ~2,000 words/week); Pro ~USD 15/mo or ~USD 144/yr; Teams (per-seat); Enterprise custom. 14-day Pro trial, no card. No lifetime tier, no on-device mode at any price.
Traction & funding: ~USD 81M raised in total, including a USD 30M Series A (Menlo Ventures, mid-2025) and a USD 25M extension (Notable Capital, Nov 2025) at ~USD 700M post-money; a later source cites a ~USD 2B valuation. Reportedly used at 270 of the Fortune 500, signing ~125 enterprise customers per week, with ~100x year-on-year user growth and ~70% 12-month retention.
Tech & compliance: Runs on AWS (us-east-1) via Baseten, with OpenAI, Anthropic and Cerebras as subprocessors; a fine-tuned Llama model does the cleanup. Claims ~10% error rate vs Whisper's ~27% and Apple's ~47% (internal benchmarks). HIPAA BAA on all plans; SOC 2 Type II and ISO 27001 for enterprise. Opt-in zero-data-retention "Privacy Mode" on Pro.
Weaknesses (the attack surface): Cloud-only — useless offline, every word leaves the device. Screenshot-based context capture worries privacy-sensitive users (Trustpilot ~2.7/5). Windows app is a heavy Electron port (~800MB RAM, freezes target apps). No native iOS keyboard. Most expensive option in the category.

Competitor landscape

Product	Model	Customer cost	Processing	Platforms	Position
Superwhisper	Bootstrapped; freemium + lifetime	Free tier; Pro USD 8.49/mo, USD 84.99/yr, USD 249.99 lifetime	Local-first (Whisper + Parakeet) + cloud BYOK	macOS, Windows, iOS	Power-user favourite; deep custom modes; lifetime cuts the subscription clock
VoiceInk	Open source (GPL-3), one-time	USD 39.99 one-time	100% local (whisper.cpp + Parakeet via FluidAudio); optional cloud cleanup via user key	macOS 14+ (Apple Silicon)	Auditable, cheapest polished offline dictation; dev-friendly
MacWhisper	One-time / lifetime	~EUR 59 lifetime (Pro ~EUR 69); free tier	Local Whisper	macOS	File-transcription workhorse (podcasts, meetings), not live dictation
Spokenly	Free + thin Pro	Free (local + BYOK at zero markup); Pro USD 9.99/mo	Local (Parakeet + Whisper) + BYOK cloud	macOS, iOS, Windows	Free local stack + MCP server for AI coding agents; the "free Superwhisper"
Dictato	One-time lifetime	~EUR 19.99 lifetime	Local (Whisper, Parakeet, Apple Speech, Qwen3-ASR); ~80ms latency	macOS	Cheapest credible lifetime; speed-led
Weesper Neon Flow	Low subscription	EUR 5/mo, unlimited	100% offline, 50+ languages	macOS, Windows	Undercuts everyone on recurring price
Voibe	Subscription + lifetime	USD 59/yr or USD 149 lifetime	100% local; developer mode	macOS	Dev-focused, HIPAA-ready architecture
BetterDictation	One-time lifetime	USD 24 lifetime	Local	macOS	Budget lifetime
OpenWhispr	Free, open source	Free	Local (Whisper + Parakeet)	macOS, Windows, Linux	Community/cross-platform; the open floor
Otter.ai	Per-seat SaaS	Business ~USD 20/mo per user	Cloud	Cross-platform/web	Meeting transcription, not in-app dictation
Dragon (Nuance/Microsoft)	Enterprise one-time + ambient SaaS	~USD 700 one-time (Professional); Dragon Copilot enterprise	Local + cloud ambient	Windows only (Mac version discontinued)	Regulated-enterprise incumbent; ~600,000 clinicians; deep EHR lock-in

Read on the dictation market

The category has effectively split into four pricing archetypes:

Cloud premium subscription: Wispr Flow (USD 15/mo). Carries inference costs, needs VC funding, justified by AI cleanup + cross-platform + enterprise compliance.
Local-first subscription / lifetime: Superwhisper (USD 8.49/mo or USD 249.99 lifetime), Voibe. Mid-market; lifetime is the hook against subscription fatigue.
Local-first cheap one-time: VoiceInk (USD 39.99), Dictato (EUR 19.99), BetterDictation (USD 24), MacWhisper (EUR 59). Near-zero marginal cost; the fastest-growing lane.
Free open source / free tier: Espanso (expansion), OpenWhispr, Spokenly free. The floor; no direct revenue.

Key insight. The pricing gap between Wispr (USD 360 over two years on monthly billing, or USD 288 on annual billing) and a local lifetime tool (~EUR 20–120 in total) is "a product choice, not a technical one": the underlying speech engines are largely the same open models. That gap is the wedge.
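The two-year comparison is simple arithmetic, and the same arithmetic gives a break-even point: the month in which cumulative subscription fees reach a one-time price. A sketch using the list prices quoted above (Wispr Pro at USD 15/mo or USD 144/yr, VoiceInk at USD 39.99, Superwhisper lifetime at USD 249.99), ignoring tax, currency conversion and discounts:

```python
import math


def monthly_plan_cost(price_per_month: float, months: int) -> float:
    """Cumulative cost of a month-to-month plan."""
    return price_per_month * months


def annual_plan_cost(price_per_year: float, months: int) -> float:
    """Cumulative cost of an annual plan; a started year is paid in full."""
    return price_per_year * math.ceil(months / 12)


def break_even_month(one_time_price: float, price_per_month: float) -> int:
    """First month in which cumulative monthly fees reach the one-time price."""
    return math.ceil(one_time_price / price_per_month)


if __name__ == "__main__":
    print("Wispr Pro, 24 months, monthly:", monthly_plan_cost(15, 24))
    print("Wispr Pro, 24 months, annual:", annual_plan_cost(144, 24))
    print("Months to exceed VoiceInk:", break_even_month(39.99, 15))
    print("Months to exceed Superwhisper lifetime:", break_even_month(249.99, 15))
```

On monthly billing, Wispr Pro costs more than VoiceInk's one-time price by month 3 (USD 45) and more than Superwhisper's lifetime licence by month 17 (USD 255).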

How a new entrant could build it

The good news for a builder: the hard part — accurate speech recognition — is now a solved, free, open-source commodity. The work is in latency, integration, polish and trust.

The engine choices (all proven, all local-capable)

Engine	Licence	Languages	Speed on Apple Silicon	Best for
NVIDIA Parakeet (TDT v2/v3)	CC-BY-4.0 (commercial OK)	v2 English-only; v3 ~25 European	~80ms via FluidAudio/CoreML on the Neural Engine; best English WER	Real-time push-to-talk dictation, English + Europe
OpenAI Whisper (large-v3-turbo)	MIT	99	Mature ecosystem (whisper.cpp, WhisperKit/CoreML, faster-whisper, MLX); slower sequential decoder	Multilingual, file transcription, non-European languages
Apple SpeechAnalyzer	Built into macOS 26+	~20	Native Neural Engine, lowest resource use	Zero-setup, low-disk, battery-friendly default
Moonshine	Open weights	Limited	Small, efficient, Apple-Silicon optimised	Lightweight real-time

Practical engine note: Parakeet is fastest only through a CoreML/Neural-Engine path (the open-source Swift SDK FluidAudio, which already powers 20+ shipping apps including VoiceInk and Spokenly). Run naively through MLX it can actually be slower than Whisper-on-CoreML and memory-hungry. So the realistic build is: FluidAudio + Parakeet for English live dictation, WhisperKit + Whisper large-v3-turbo as the multilingual fallback, Apple SpeechAnalyzer as the zero-footprint option. Ship all three and let the app pick.
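"Let the app pick" can be made concrete as a small routing policy. The sketch below encodes only the rules stated above: Parakeet for live dictation in languages it covers, Whisper for file transcription and everything else, and Apple SpeechAnalyzer when the user wants the smallest footprint and is on macOS 26+. The `Request` type, the `pick_engine` name and the language subset are hypothetical; check each model card for the authoritative language list, and a real app would also fall back when a model has not been downloaded yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class Engine(Enum):
    PARAKEET = "FluidAudio + Parakeet (CoreML)"
    WHISPER = "WhisperKit + Whisper large-v3-turbo"
    APPLE = "Apple SpeechAnalyzer"


# Illustrative subset of Parakeet v3's ~25 European languages (ISO 639-1 codes).
PARAKEET_LANGS = frozenset({"en", "de", "fr", "es", "it", "nl", "pl", "pt"})


@dataclass
class Request:
    language: str                # ISO 639-1 code, e.g. "en"
    live: bool                   # True: push-to-talk dictation; False: file transcription
    macos_major: int             # e.g. 26
    low_footprint: bool = False  # user prefers minimal disk and battery use


def pick_engine(req: Request, apple_langs: AbstractSet[str]) -> Engine:
    """Choose an ASR engine for one request (hypothetical policy)."""
    # Zero-download option: only if the OS ships SpeechAnalyzer and supports the language.
    if req.low_footprint and req.macos_major >= 26 and req.language in apple_langs:
        return Engine.APPLE
    # Fastest path for live dictation in languages Parakeet covers.
    if req.live and req.language in PARAKEET_LANGS:
        return Engine.PARAKEET
    # Multilingual fallback, and the default for file transcription.
    return Engine.WHISPER
```

Passing the Apple language set in as a parameter keeps the policy testable and avoids hard-coding a list that changes with each OS release.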

Build option 1 — A local text expander to take on Typinator

Core engineering: a background keystroke listener that watches for trigger strings and performs in-place replacement system-wide. The hard parts are reliable cross-app injection (accessibility APIs on macOS, SendInput/hooks on Windows), rich-text/templating, regex triggers, and date/clipboard/calculated variables.
Sync & teams: local-first storage by default (the Typinator/Espanso trust position), with optional end-to-end-encrypted sync and a team library layer (the part people actually pay for).
Stack reality: native is strongly preferred for a keyboard-level utility — Swift on macOS, and either native or a Rust core (à la Espanso) for cross-platform. A web stack is the wrong tool for the keystroke layer. TypeScript/Electron is viable only for the settings UI and the team/web dashboard, not the hot path.
Effort: a credible v1 is a small-team, few-months effort. The moat is polish and team features, not novelty.
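The matching logic at the heart of the keystroke layer is small, even if the OS plumbing around it is not. A minimal, platform-neutral sketch, assuming the OS hook (accessibility APIs on macOS, hooks on Windows) delivers one typed character at a time and can send backspaces plus text: keep a short rolling buffer, and when it ends with a trigger, report how many characters to delete and what to type instead. The class name and the `{date}` / `{clipboard}` placeholder syntax are hypothetical, not Typinator's or Espanso's formats.

```python
import datetime
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Expansion:
    backspaces: int  # characters to delete (the trigger just typed)
    text: str        # replacement to inject in their place


class TriggerMatcher:
    def __init__(
        self,
        snippets: Dict[str, str],
        clipboard: Callable[[], str] = lambda: "",
        today: Callable[[], datetime.date] = datetime.date.today,
        max_buffer: int = 64,
    ) -> None:
        self.snippets = snippets
        self.clipboard = clipboard
        self.today = today
        self.max_buffer = max_buffer
        self.buffer = ""
        # Longest triggers first, so ";addr" wins over "addr" when both end here.
        self._ordered = sorted(snippets, key=len, reverse=True)

    def reset(self) -> None:
        """Call on focus change, mouse click or arrow keys: the typed context is gone."""
        self.buffer = ""

    def feed(self, ch: str) -> Optional[Expansion]:
        """Feed one typed character; return an Expansion when a trigger completes."""
        self.buffer = (self.buffer + ch)[-self.max_buffer:]
        for trigger in self._ordered:
            if self.buffer.endswith(trigger):
                self.buffer = ""
                return Expansion(len(trigger), self.render(self.snippets[trigger]))
        return None

    def render(self, template: str) -> str:
        """Fill the hypothetical {date} and {clipboard} variables."""
        return template.replace("{date}", self.today().isoformat()).replace(
            "{clipboard}", self.clipboard()
        )
```

In practice the backspace count must match what the target app treats as one keystroke, which is not always one Python character (emoji and combining marks, for instance).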

Build option 2 — A local "super Wispr Flow"

This is the more attractive build because the growth is here and the cloud incumbent has an exploitable weakness (privacy + offline + price).

Core loop: global hotkey → capture mic → local ASR (Parakeet/Whisper via FluidAudio/WhisperKit) → optional local or BYOK-cloud LLM cleanup → paste into the focused field. This is exactly Wispr's loop, minus the forced cloud round-trip.
Where you beat Wispr: offline + private (works on planes, in cafés, in regulated environments); latency (local Parakeet ~80ms feels instant vs Wispr's ~1–2s); price (one-time/lifetime or a fraction of USD 15/mo); resource footprint (a native app trivially beats an 800MB Electron port).
Where Wispr still wins (so plan for it): cross-platform breadth (esp. Android), polished onboarding, team features, and enterprise compliance paperwork. Match these selectively rather than all at once.
Monetise cloud optionally: transcription stays local; "make this an email", "summarise", "translate", "rewrite formally" can run locally on a small model, via the user's own API key (zero inference cost to you), or via a thin managed subscription for non-technical users.
Stack reality: native Swift/Metal for the capture-and-inject hot path; TypeScript is appropriate for the account/dashboard/billing web app and team management. Use existing open SDKs (FluidAudio, WhisperKit) rather than writing inference from scratch.
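Structurally the loop is four swappable stages, which is what makes "local by default, cloud optional" cheap to offer. A minimal sketch, assuming each stage is injected as a plain function: the capture, transcription and paste callables stand in for the native Swift layer (FluidAudio/WhisperKit and accessibility-based paste), and `basic_cleanup` is a deliberately simple, hypothetical rule-based stand-in for the LLM cleanup pass, not how Wispr or any named product does it.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Standalone filler words ("um", "umm", "uh", "erm"), plus a trailing comma and spaces.
FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b,?\s*", re.IGNORECASE)


def basic_cleanup(text: str) -> str:
    """Rule-based stand-in for LLM cleanup: drop fillers, tidy spaces, capitalise, punctuate."""
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".?!" else text + "."


@dataclass
class DictationPipeline:
    capture: Callable[[], bytes]        # record while the hotkey is held (native layer)
    transcribe: Callable[[bytes], str]  # on-device ASR, e.g. Parakeet or Whisper
    cleanup: Callable[[str], str]       # local model, BYOK call, or basic_cleanup
    paste: Callable[[str], None]        # insert into the focused text field

    def run_once(self) -> str:
        audio = self.capture()
        text = self.cleanup(self.transcribe(audio))
        if text:  # never paste an empty result
            self.paste(text)
        return text
```

Swapping `cleanup` for a local-model or BYOK call changes where text goes without touching the capture, transcription or paste stages.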

Build option 3 — The unclaimed wedge: build both in one app

The wedge. No one currently sells a single local app that does both system-wide text expansion and voice dictation, with voice able to trigger snippets and templates ("insert my standard reply", dictate a variable into a template). Expansion drives daily retention and habit; dictation drives the growth narrative and the higher price point. Bundling them is a genuine differentiator that neither Typinator nor Wispr can quickly copy without entering the other's category.
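Voice-triggered snippets can start as a thin command layer between the transcript and the paste step: if the transcript looks like "insert <snippet name> with <variable> is <value>", expand the snippet; otherwise treat it as ordinary dictation. A minimal sketch, assuming this particular English command grammar and `{name}`-style placeholders, both hypothetical; a product could equally hand the intent parsing to the LLM cleanup pass instead of a regex.

```python
import re
from typing import Dict, Optional

# Hypothetical grammar: "insert [my] <snippet name> [with <var> is <value>, <var> is <value>]"
COMMAND = re.compile(r"^insert (?:my )?(?P<name>.+?)(?: with (?P<vars>.+))?$", re.IGNORECASE)
VARIABLE = re.compile(r"(\w+) (?:is|as) ([^,]+)")  # comma-separated "<var> is <value>"
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def voice_to_snippet(transcript: str, snippets: Dict[str, str]) -> Optional[str]:
    """Return expanded snippet text, or None if the transcript is ordinary dictation."""
    match = COMMAND.match(transcript.strip().rstrip(".!?"))
    if not match:
        return None
    template = snippets.get(match.group("name").strip().lower())
    if template is None:
        return None  # unknown snippet: fall back to pasting the words as spoken
    values = {k.lower(): v.strip() for k, v in VARIABLE.findall(match.group("vars") or "")}
    # Fill known placeholders; leave unknown ones visible so the user notices.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1).lower(), m.group(0)), template)
```

Returning `None` for anything that is not a recognised command keeps ordinary dictation on the default path, so a misheard "insert" never swallows the user's words.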

Viable business models — options that won't fail

Ranked by defensibility for a new entrant who cannot out-raise Wispr. Each is paired with why it's hard to kill.

Option 1 — Local-first core + monetised AI layer (recommended primary)

Shape: Charge a one-time or low-cost lifetime licence for the local dictation + expansion core (think USD 39–99 one-time, or ~USD 5–8/mo if subscription). Keep transcription 100% on-device. Monetise the AI cleanup/rewrite layer via BYOK (free to you) plus a thin managed cloud tier for non-technical users.
Why it won't fail: near-zero marginal cost (no inference bill), so you survive on modest volume; directly exploits subscription fatigue and Wispr's privacy/offline gap; proven by VoiceInk, Spokenly, Superwhisper-lifetime, Dictato.
Risk to manage: the lane is crowding fast — you win on polish, latency and the expansion+dictation bundle, not on being first.
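Separating a one-time core from a monetised AI layer comes down to a routing decision per AI request. A minimal sketch of one possible policy, assuming a hypothetical `Licence` record with three entitlements; the point it illustrates is that only the managed route carries an inference bill, while transcription never enters this path at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    LOCAL_MODEL = "small on-device model"
    BYOK = "user's own API key"
    MANAGED = "managed cloud tier"
    UNAVAILABLE = "not included in this licence"


@dataclass
class Licence:
    core: bool                      # one-time local core (dictation + expansion)
    managed_ai: bool = False        # thin managed cloud subscription
    byok_key: Optional[str] = None  # provider key supplied by the user, if any


def route_ai_task(lic: Licence, local_model_ok: bool, prefer_local: bool = True) -> Route:
    """Decide where an AI rewrite ("make this an email", "summarise") runs.

    Transcription never comes through here; it always stays on-device.
    """
    if not lic.core:
        return Route.UNAVAILABLE
    if prefer_local and local_model_ok:
        return Route.LOCAL_MODEL  # no inference bill, text stays on the device
    if lic.byok_key:
        return Route.BYOK         # user pays the provider directly
    if lic.managed_ai:
        return Route.MANAGED      # the only route where you carry inference cost
    # No cloud route is set up: use the local model if one is available.
    return Route.LOCAL_MODEL if local_model_ok else Route.UNAVAILABLE
```

Defaulting to the local route keeps the marginal cost near zero for most requests; the managed tier only carries inference cost for users who explicitly pay for it.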

Option 2 — Open-core / developer-first, monetise teams

Shape: Open-source the engine and solo app (Espanso/VoiceInk credibility play), build distribution through the developer community, and charge for team sync, shared libraries, admin, MCP/agent integrations and SSO.
Why it won't fail: open source is the cheapest, most trusted distribution channel in this category; teams are where recurring revenue lives; developers dictating into Cursor/VS Code/Claude Code are a hot, underserved segment.
Risk: monetisation is slower; you must be disciplined about what's paid vs free.

Option 3 — Vertical / compliance SaaS (highest ACV)

Shape: Pick a regulated, document-heavy vertical where on-device processing = automatic compliance (legal, professional services, finance, allied health — but avoid head-on clinical, where Nuance/Microsoft Dragon owns the EHR-integrated market). Sell per-seat subscription justified by data sovereignty, BAA/SOC 2, and custom vocabulary.
Why it won't fail: willingness-to-pay is far higher than consumer; on-device is a genuine compliance differentiator vs Wispr's cloud; switching costs and stickiness are high.
Risk: longer sales cycles; you need real compliance investment, not just marketing claims.

Option 4 — The bundle as a productivity suite (the platform play)

Shape: Expansion + dictation + clipboard/snippets in one local app, freemium consumer tier, paid Pro, paid Teams — the position Raycast is circling from the launcher side.
Why it won't fail: combines a sticky daily-use feature (expansion) with a growth feature (dictation); higher retention and ARPU than either alone; hard for single-category incumbents to match.
Risk: scope; resist building everything — ship the two core loops beautifully first.

Models to avoid

Pure cloud subscription, head-to-head with Wispr. You'll carry inference costs and lose the capital war against a USD 81M+ war chest. Only sane if you have a true cost or accuracy edge.
Free with no monetisation path (the pure-Espanso position). Admirable, not a business.
Lifetime-only with no recurring revenue and no cloud margin. Funds initial growth but eventually forces a pivot or closure — several lifetime-only tools are structurally fragile for this reason.

Recommendation

The lowest-risk, highest-leverage entry is a native, local-first macOS app (Apple Silicon) that does both dictation and text expansion, sold as a one-time/low lifetime licence for the local core with an optional thin cloud/AI subscription and a paid Teams tier. Concretely:

Ship the dictation loop first using FluidAudio + Parakeet (English) and WhisperKit + Whisper (multilingual), targeting sub-100ms perceived latency and a footprint that embarrasses Wispr's Electron Windows app.
Add expansion as the retention layer — system-wide snippets/templates, voice-triggerable, local by default.
Monetise the AI cleanup/rewrite layer via BYOK (zero cost to you) plus a managed tier for non-technical users, and charge teams for shared libraries + sync + admin.
Position explicitly against the cloud: "Wispr-grade dictation, fully offline, your audio never leaves your Mac, pay once." That sentence is the entire go-to-market.
Expand to Windows second (where Wispr is weakest) and treat a vertical/compliance SaaS tier as the upsell once the consumer base proves the engine.

This avoids the capital war, exploits the two biggest cracks in the incumbent (privacy/offline and price), rides the two confirmed macro-trends (local models maturing + subscription fatigue), and claims the one position no current competitor holds — expansion + dictation in one local app.

Sources and caveats

Pricing, funding and product facts are drawn from publicly available 2026 reporting, vendor sites and reviews, including: TextExpander blog and pricing; Typedesk and TypeFire comparisons; Timing, MakerStack and ChatGate Typinator coverage; Capterra; Espanso (espanso.org, GitHub) and XDA; Wispr Flow site, pricing and reviews; TechCrunch (Wispr funding); Weesper Neon Flow pricing analyses; Superwhisper, VoiceInk, MacWhisper, Spokenly, Dictato, OpenWhispr and Voibe comparison pages; engine benchmarks from Spokenly, Dictato, Whisper Notes, MacParakeet, Echo, ModelsLab and Arun Baby; and market-sizing reports from Precedence Research, Fortune Business Insights, 360 Market Updates, Mordor Intelligence and Expert Market Research.

Figures are approximate and current as of early-to-mid 2026. A couple of headline numbers vary between sources (Wispr's reported ~USD 2B valuation; exact Typinator tiers) — verify the latest pricing and valuation directly with each vendor before anything goes into a deck or board paper.