2026-05-31 13:25 UTC · Updated: 2026-06-30 13:03 UTC

Vox Dictum, on-device transcription with speaker diarisation and AI summaries

Vox Dictum is a privacy-first transcription app for macOS that runs entirely on-device. It offers high-accuracy speech recognition in 60+ languages, automatic speaker diarisation, AI-powered summaries, audio enhancement, and flexible export options. All data stays on your Mac with zero data collection.

Source: Hacker News AI · Author: mozairr

Vox Dictum — Private AI Transcription for macOS | Cobalt InFX

Cobalt InFX

Innovative Software Solutions

We build intelligent, privacy-first software for professionals who demand precision and control. Every product runs on-device — your data stays yours.

Vox Dictum

Private, on-device transcription for macOS. Import any recording, label speakers, and generate intelligent summaries — all processed locally on your Mac.

🔒

100% on-device

⚡

Apple Silicon

🌍

Transcription in 60+ languages

Download on the Mac App Store

macOS 14.6+ · Apple Silicon (M1 or later)

🎙️

AI Transcription

Import audio or video. AI-powered speech recognition transcribes with high accuracy across 60+ languages. AI summaries are available in English and other major languages.

👥

Speaker Recognition

Automatically detect and label speakers. Rename a speaker once — all phrases in that file update. With Pro+, the same voice is recognised across multiple recordings.

📝

AI Powered Summary

Generate structured summaries tailored to meetings, interviews, podcasts, and more — with key decisions, action items, and speaker contributions. On-device AI — no cloud, no data sharing.

🔇

Audio Enhancement

Built-in speech enhancement, background noise removal, and silence removal. Better input, better output.

📤

Flexible Export

Export as TXT, Markdown, HTML, or SRT subtitles — all with speaker names. Ready for your workflow.
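As a rough illustration of what SRT export with speaker names involves, the sketch below formats diarised segments as SubRip cues. The `Segment` type, the `to_srt` helper, and the `Name: text` prefix convention are assumptions made for this example; the page does not document Vox Dictum's actual export layout.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarised phrase (hypothetical structure, not the app's data model)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str   # speaker label, e.g. "Alice"
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each line with the speaker."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues, as the SRT format expects.
    return "\n".join(cues)
```

Because each cue reads the speaker label from the segment at render time, renaming a speaker before export would show up in every cue for that speaker.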

🛡️

Complete Privacy

Zero data collection. Zero analytics. Zero cloud processing. Your recordings and transcripts never leave your Mac.

Feature               | Vox Dictum     | MacWhisper    | Otter.ai   | Dragon
On-device processing  | ✓              | ✓             | ✗          | ✓
Zero data collection  | ✓              | ✓             | ✗          | ✗
AI Summary            | ✓ On-device    | ✗             | Cloud      | ✗
Speaker recognition   | ✓ On-device    | ✗             | Cloud      | ✓
Overlap detection     | ✓              | ✗             | ✗          | ✗
Audio enhancement     | ✓              | ✗             | ✗          | ✗
Free tier             | Unlimited      | ✗             | 300 min/mo | ✗
Price                 | From £7.99/mo  | €59 one-time  | $16.99/mo  | $699

Pricing

Start free. Upgrade when you need more.

Free

£0

No credit card required

Unlimited transcriptions

Small and Base transcription models

Speaker diarisation and renaming

Transcription in 60+ languages

Audio enhancement suite

Export as TXT, MD, HTML, SRT

Audio Advisor

URL and podcast import

Popular

Pro

£7.99 /month

Everything in Free, plus:

Advanced transcription models

AI Powered Summary

Overlap resolution

Custom vocabulary corrections

Speaker reallocation

Transcript phrase splitting

Bulk transcript export

Pro+

£12.99 /month

Everything in Pro, plus:

Speaker recognition (voice matching)

Consistent labels across batch jobs

Auto-label propagation

Project-level speaker re-run

Annual: £99.99/year — save 36%
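The 36% figure is consistent with comparing the annual price against twelve months of Pro+ at £12.99. That pairing is an inference from the listed prices, not something the page states explicitly. A quick check of the arithmetic:

```python
from decimal import Decimal

monthly_pro_plus = Decimal("12.99")   # Pro+ monthly price from the tier list
annual_price = Decimal("99.99")       # annual price quoted above

twelve_months = monthly_pro_plus * 12                      # 155.88
saving = (twelve_months - annual_price) / twelve_months    # fraction saved
saving_percent = round(saving * 100)                       # 36
```

Twelve months of Pro at £7.99 would come to £95.88, which is less than £99.99, so the annual price would not represent a saving on that tier.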

Your data stays on your device

🔒 Cobalt InFX products collect no data. Recordings, transcripts, and summaries are processed entirely on your device and are never transmitted to any server.

Vox Dictum — What we process

Vox Dictum processes audio and video recordings to produce text transcripts, speaker labels, and AI-generated summaries. All processing is performed locally on your Mac using on-device machine learning models. No audio, text, or metadata is sent to Cobalt InFX, Apple, or any third party during processing.

What we collect

Nothing. Vox Dictum does not collect personal data, usage analytics, crash reports, telemetry, or any other information from your device. The app contains no analytics frameworks, no tracking pixels, and no third-party SDKs that collect data.

Network activity

The only network activity in Vox Dictum is: (1) downloading AI models on first use (~2 GB, one-time), (2) downloading additional transcription models when you select a new model size, and (3) Apple verifying your subscription status. No recording data, transcript content, or user-generated content is transmitted over the network at any time.

Speaker recognition

Speaker voice matching (Pro+ tier) uses on-device neural network inference to compare voice characteristics within a single processing session. Voice embeddings are computed transiently in memory and discarded when the processing job completes. No biometric data is stored persistently. No voice profiles are created or retained between sessions.
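To make the idea of transient voice matching concrete, here is a minimal sketch of comparing two voice embeddings held only in memory. Cosine similarity, the 0.75 threshold, and the function names are assumptions for illustration; the policy does not say which metric or model Vox Dictum uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def same_speaker(a: list[float], b: list[float], threshold: float = 0.75) -> bool:
    """Guess whether two embeddings belong to one voice (hypothetical threshold)."""
    return cosine_similarity(a, b) >= threshold
```

In a design like this, the embeddings exist only as local values during a job and nothing is written to disk, which is the kind of behaviour the policy describes.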

Subscriptions

Subscriptions are managed entirely by Apple through the App Store. Cobalt InFX does not process payments, store credit card information, or have access to your Apple ID credentials.

Data storage

Your transcripts, speaker names, vocabulary corrections, and summaries are stored locally in your Mac's Application Support directory within the app's sandbox. This data is included in Time Machine backups and is deleted when you uninstall the app. Cobalt InFX has no access to this data.

Children's privacy

Cobalt InFX products are not directed at children under 13. We do not knowingly collect any information from children.

Changes to this policy

If we update this policy, we will post the revised version on this page with an updated effective date.

Effective date: April 2026

Contact: [email protected]

Support

Need help?

Before contacting us

Check the built-in Help section in Vox Dictum: Settings → Help. Most common questions are answered there.

Common topics

Model downloads are large (~2–7 GB depending on features used) and require a stable internet connection. Transcription with advanced models may take longer than the recording duration; this is normal for on-device processing.

Vox Dictum system requirements

macOS: 14.6 (Sonoma) or later

Chip: Apple Silicon (M1 or later)

RAM: 8 GB minimum, 16 GB recommended

Storage: ~4–7 GB for AI models (downloaded on first use)
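For anyone who wants to check a machine against these minimums from a script, the sketch below encodes the macOS, chip, and RAM thresholds as a pure function. The helper name `meets_requirements` and its parameters are hypothetical; this is not a tool shipped with Vox Dictum.

```python
def meets_requirements(macos_version: str, machine: str, ram_gb: float) -> bool:
    """Check against the listed minimums: macOS 14.6, Apple Silicon, 8 GB RAM."""
    if not macos_version:
        return False  # empty string means the version could not be determined
    # Compare only major.minor, padding a bare major version such as "15" to (15, 0).
    parts = [int(p) for p in macos_version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts) >= (14, 6) and machine == "arm64" and ram_gb >= 8
```

On a Mac, the first two arguments could come from the standard library's `platform.mac_ver()[0]` and `platform.machine()`. Note that a Python interpreter running under Rosetta reports `x86_64` even on Apple Silicon, so a negative result from such a script is not conclusive.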

Send us a message

Click the button below to open your email client. Describe your issue and attach screenshots if relevant.

Please include:

› What you did

› What you expected

› What happened

› Screenshots (attach as .jpg or .png)

Email Support

[email protected]