Speech Swift — on-device speech processing for Apple Silicon (ASR, TTS, diarization, speech-to-speech)

Hi all,

Sharing a Swift package I've been working on — a modular speech processing toolkit that runs entirely on-device using
MLX Swift and CoreML.

The package provides 8 protocols and 11 model implementations covering the full speech pipeline:

  • SpeechRecognitionModel — Qwen3-ASR, Parakeet TDT (CoreML)
  • SpeechGenerationModel — Qwen3-TTS, CosyVoice TTS (streaming)
  • SpeechToSpeechModel — PersonaPlex 7B (full-duplex)
  • VoiceActivityDetectionModel — Silero (streaming), Pyannote (overlap)
  • SpeakerEmbeddingModel — WeSpeaker ResNet34
  • SpeakerDiarizationModel — Pyannote + WeSpeaker + spectral clustering
  • SpeechEnhancementModel — DeepFilterNet3 (CoreML Neural Engine)
  • ForcedAlignmentModel — Qwen3-ForcedAligner (word-level timestamps)

Each model target is independent: importing Qwen3ASR doesn't pull in TTS or anything else. Models are downloaded from
HuggingFace on first use and cached locally.

A few design decisions I'd appreciate feedback on:

  1. MLX vs CoreML split: Large models (ASR, TTS, PersonaPlex) run on MLX on the GPU. Small models (Silero VAD,
    DeepFilterNet3) run on CoreML on the Neural Engine. Keeping the large models off the Neural Engine avoids ANE
    contention when several models run at once. Does this pattern resonate with others doing on-device ML?
  2. Protocol design: All protocols use an AnyObject constraint (reference semantics for large weight buffers) and an
    optional language: String?. There is no ModelLoadable protocol, since each model has different loading parameters.
    See Protocols.swift.
  3. Composed pipelines: StreamingASR (VAD → ASR) and DiarizationPipeline currently exist as Layer 2 classes. I'm working
    on MeetingTranscriber (diarize → per-segment ASR) next. What other compositions would be useful?
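
To make points 2 and 3 concrete, here is a minimal sketch of the shape I mean. It is not the package's actual code: the protocol names match the list above, but the method signatures (detectSpeech(in:), transcribe(_:language:)), the SpeechSegment type, and the EnergyVAD, EchoASR and VADThenASR classes are hypothetical stand-ins invented for this example. The real declarations are in Protocols.swift.

```swift
// Hypothetical sketch only; signatures are assumptions, not the package API.

/// A half-open range of sample indices that contains speech.
struct SpeechSegment: Equatable {
    let start: Int
    let end: Int
}

// AnyObject restricts conformers to classes, so a model holding large
// weight buffers is shared by reference rather than copied.
protocol VoiceActivityDetectionModel: AnyObject {
    func detectSpeech(in samples: [Float]) -> [SpeechSegment]
}

protocol SpeechRecognitionModel: AnyObject {
    // `language` is optional; nil means "let the model decide".
    func transcribe(_ samples: [Float], language: String?) -> String
}

// Toy stand-ins so the sketch runs without downloading any weights.
final class EnergyVAD: VoiceActivityDetectionModel {
    let threshold: Float
    init(threshold: Float) { self.threshold = threshold }

    func detectSpeech(in samples: [Float]) -> [SpeechSegment] {
        var segments: [SpeechSegment] = []
        var segmentStart: Int? = nil
        for (index, sample) in samples.enumerated() {
            let active = abs(sample) >= threshold
            if active, segmentStart == nil {
                segmentStart = index
            }
            if !active, let begin = segmentStart {
                segments.append(SpeechSegment(start: begin, end: index))
                segmentStart = nil
            }
        }
        // Close a segment that is still open at the end of the buffer.
        if let begin = segmentStart {
            segments.append(SpeechSegment(start: begin, end: samples.count))
        }
        return segments
    }
}

final class EchoASR: SpeechRecognitionModel {
    func transcribe(_ samples: [Float], language: String?) -> String {
        let tag = language ?? "auto"
        return "\(samples.count) samples [\(tag)]"
    }
}

// A "Layer 2" style composition: it depends only on the protocols,
// so any conforming VAD or ASR model can be swapped in.
final class VADThenASR {
    private let vad: any VoiceActivityDetectionModel
    private let asr: any SpeechRecognitionModel

    init(vad: any VoiceActivityDetectionModel, asr: any SpeechRecognitionModel) {
        self.vad = vad
        self.asr = asr
    }

    func transcribe(_ samples: [Float], language: String? = nil) -> [String] {
        vad.detectSpeech(in: samples).map { segment in
            asr.transcribe(Array(samples[segment.start..<segment.end]), language: language)
        }
    }
}

let pipeline = VADThenASR(vad: EnergyVAD(threshold: 0.1), asr: EchoASR())
let audio: [Float] = [0, 0.5, 0.6, 0, 0, 0.3]
let transcripts = pipeline.transcribe(audio, language: "en")
print(transcripts)  // ["2 samples [en]", "1 samples [en]"]
```

The point of the sketch is that the composed class never names a concrete model or a loading API, which is why a shared ModelLoadable protocol isn't needed for composition: each model is constructed however it likes and then handed in.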

Roadmap: v0.1 → v0.3 (ivan-digital/qwen3-asr-swift, Discussion #81 on GitHub)
Repo: ivan-digital/qwen3-asr-swift on GitHub

Requirements: Swift 5.9+, macOS 14+ / iOS 17+, Apple Silicon.
