mutterkey
KDE-first push-to-talk transcription tool for KDE Plasma
Loading...
Searching...
No Matches
Mutterkey Documentation

mutterkey_logo

Mutterkey is a native C++ + Qt 6 push-to-talk transcription tool for KDE Plasma.

This documentation is generated from the repo-owned C++ headers under src/. It focuses on the application's ownership boundaries, runtime contracts, and daemon-oriented workflow rather than end-user setup.

Current behavior:

  • registers a global shortcut through KGlobalAccel
  • records microphone audio while the shortcut is held
  • transcribes locally through the configured runtime path
  • copies the resulting text to the clipboard
  • expects you to paste the text yourself with Ctrl+V

Current runtime shape:

  • TranscriptionEngine is the immutable runtime/provider boundary
  • TranscriptionSession is the mutable per-session decode boundary
  • native Mutterkey model packages are now the canonical model artifact
  • internal audio flow is streaming-first through normalized chunks and transcript events
  • BackendCapabilities reports static backend support used for orchestration
  • RuntimeDiagnostics reports runtime/device/model inspection data separately from static capabilities, including runtime-selection reasoning
  • RuntimeError and RuntimeErrorCode provide typed runtime failures
  • ModelCatalog, ModelPackage, and ModelValidator own model inspection, compatibility checks, and integrity validation before backend load
  • raw Whisper .bin files are handled only through an explicit compatibility path and import flow
  • RuntimeSelector owns runtime-selection policy instead of burying that logic in the generic factory
  • CpuReferenceModelHandle and related native model helpers own the current product-owned CPU reference model loading boundary
  • TranscriptionWorker hosts transcription on a dedicated QThread and creates live sessions lazily on that worker thread
  • the shipped daemon and once flows still use a compatibility wrapper that assembles a final transcript from the streaming runtime path
  • config parsing under src/config.* stays product-shaped and permissive, while backend-specific support checks live in the runtime layer

Core API surface covered here:

  • HotkeyManager registers the global push-to-talk shortcut through KDE.
  • AudioRecorder captures microphone audio while the shortcut is held.
  • RecordingNormalizer converts captured audio to runtime-ready mono float32 samples at 16 kHz.
  • AudioChunker splits normalized audio into deterministic stream chunks.
  • TranscriptAssembler builds final transcript text from streaming events.
  • ModelCatalog resolves package directories, model.json, and legacy raw artifacts into validated product-owned model metadata.
  • RawWhisperImporter converts raw whisper.cpp-compatible ggml .bin files into native Mutterkey packages.
  • RuntimeSelector decides which runtime implementation should handle a given configured model path and records the reason in diagnostics.
  • TranscriptionEngine and TranscriptionSession define the app-owned runtime seam.
  • CpuReferenceTranscriber provides the current product-owned native CPU reference runtime scaffold.
  • WhisperCppTranscriber performs in-process transcription through vendored whisper.cpp.
  • ClipboardWriter copies the resulting text to the clipboard.
  • MutterkeyService coordinates those pieces on the main thread plus a dedicated transcription worker thread.

Current product direction:

  • KDE-first
  • local-only transcription
  • CLI/service-first operation
  • tray-shell work exists, but the daemon remains the product core
  • a product-owned native CPU reference runtime now exists for ownership, packaging, and conformance work
  • whisper.cpp is still the only real end-user speech decoder today
  • the vendored runtime is now optional at build time through MUTTERKEY_ENABLE_LEGACY_WHISPER=OFF

For build, runtime, release, and service setup use the repository README.md and RELEASE_CHECKLIST.md.