Speech Swift — on-device speech processing for Apple Silicon (ASR, TTS, diarization, speech-to-speech)

ivan-digital · March 6, 2026, 7:48am

Hi all,

Sharing a Swift package I've been working on — a modular speech processing toolkit that runs entirely on-device using
MLX Swift and CoreML.

The package provides 8 protocols and 11 model implementations covering the full speech pipeline:

SpeechRecognitionModel — Qwen3-ASR, Parakeet TDT (CoreML)
SpeechGenerationModel — Qwen3-TTS, CosyVoice TTS (streaming)
SpeechToSpeechModel — PersonaPlex 7B (full-duplex)
VoiceActivityDetectionModel — Silero (streaming), Pyannote (overlap)
SpeakerEmbeddingModel — WeSpeaker ResNet34
SpeakerDiarizationModel — Pyannote + WeSpeaker + spectral clustering
SpeechEnhancementModel — DeepFilterNet3 (CoreML Neural Engine)
ForcedAlignmentModel — Qwen3-ForcedAligner (word-level timestamps)

Each model target is independent: import Qwen3ASR doesn't pull in TTS or anything else. Models are downloaded from
HuggingFace on first use and cached locally.
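
For anyone who wants to try a single target, a consumer manifest might look roughly like the sketch below. Only the module name Qwen3ASR and the repo path come from this post; the library product name, the "main" branch, and the TranscribeDemo target are assumptions for illustration, so check the repo's Package.swift for the real names.

```swift
// swift-tools-version:5.9
// Hypothetical consumer manifest: product name and branch are assumptions.
import PackageDescription

let package = Package(
    name: "TranscribeDemo", // hypothetical app name
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Repo path taken from the link in this post; "main" is an assumed branch name.
        .package(url: "https://github.com/ivan-digital/qwen3-asr-swift", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "TranscribeDemo",
            dependencies: [
                // Only the ASR module is listed here; TTS and the other targets are not.
                // Assumes the library product shares the module name Qwen3ASR.
                .product(name: "Qwen3ASR", package: "qwen3-asr-swift")
            ]
        )
    ]
)
```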

A few design decisions I'd appreciate feedback on:

MLX vs CoreML split: Large models (ASR, TTS, PersonaPlex) run on MLX/GPU. Small models (Silero VAD, DeepFilterNet3)
run on CoreML/Neural Engine. This avoids ANE contention when several models run at once. Does this pattern resonate with
others doing on-device ML?
Protocol design: All protocols use an AnyObject constraint (reference semantics for large weight buffers) and an optional
language: String?. There is no ModelLoadable protocol, since each model has different loading parameters. See Protocols.swift.
Composed pipelines: Currently StreamingASR (VAD → ASR) and DiarizationPipeline exist as Layer 2 classes. I'm working on
MeetingTranscriber (diarize → per-segment ASR) next. What other compositions would be useful?
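
To make the protocol and composition questions concrete, here is a minimal, self-contained sketch of the idea. The protocol names are the ones listed above, but the method signatures, the toy EnergyVAD and EchoASR classes, and SimpleStreamingASR are all made up for illustration; the real definitions are in Protocols.swift and may look different.

```swift
// Hypothetical shapes for illustration only; not the package's actual API.

/// AnyObject restricts conformers to classes, so a loaded model is shared by
/// reference instead of having its weight buffers copied on assignment.
protocol VoiceActivityDetectionModel: AnyObject {
    /// Returns the sample ranges judged to contain speech (assumed signature).
    func detectSpeech(in samples: [Float]) -> [Range<Int>]
}

protocol SpeechRecognitionModel: AnyObject {
    /// `language` is optional; nil means "let the model decide" (assumed semantics).
    func transcribe(_ samples: [Float], language: String?) -> String
}

// Toy stand-ins so the sketch runs without downloading any weights.
final class EnergyVAD: VoiceActivityDetectionModel {
    let threshold: Float
    init(threshold: Float = 0.1) { self.threshold = threshold }

    func detectSpeech(in samples: [Float]) -> [Range<Int>] {
        var ranges: [Range<Int>] = []
        var start: Int? = nil
        for (i, sample) in samples.enumerated() {
            let active = abs(sample) >= threshold
            if active, start == nil { start = i }
            if !active, let begin = start {
                ranges.append(begin..<i)   // close the current speech segment
                start = nil
            }
        }
        if let begin = start { ranges.append(begin..<samples.count) }
        return ranges
    }
}

final class EchoASR: SpeechRecognitionModel {
    func transcribe(_ samples: [Float], language: String?) -> String {
        "\(samples.count) samples [\(language ?? "auto")]"
    }
}

/// Layer 2 style composition: run VAD first, then ASR only on speech segments.
final class SimpleStreamingASR {
    private let vad: any VoiceActivityDetectionModel
    private let asr: any SpeechRecognitionModel

    init(vad: any VoiceActivityDetectionModel, asr: any SpeechRecognitionModel) {
        self.vad = vad
        self.asr = asr
    }

    func transcribe(_ samples: [Float], language: String? = nil) -> [String] {
        vad.detectSpeech(in: samples).map { range in
            asr.transcribe(Array(samples[range]), language: language)
        }
    }
}

let pipeline = SimpleStreamingASR(vad: EnergyVAD(), asr: EchoASR())
let audio: [Float] = [0, 0.5, 0.6, 0, 0, 0.7, 0.8, 0.9]
let segments = pipeline.transcribe(audio, language: "en")
print(segments)
```

Because the pipeline depends only on the two protocols, any VAD or ASR implementation could be swapped in without touching the composition class, which is the property I'd want to preserve in MeetingTranscriber as well.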

Roadmap: v0.1 → v0.3 (ivan-digital/qwen3-asr-swift, Discussion #81 on GitHub)
Repo: ivan-digital/qwen3-asr-swift on GitHub (AI speech toolkit for Apple Silicon: ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML)

Requirements: Swift 5.9+, macOS 14+ / iOS 17+, Apple Silicon.