mutterkey
KDE-first push-to-talk transcription tool for KDE Plasma
Mutterkey is a native C++ + Qt 6 push-to-talk transcription tool for KDE Plasma.
This documentation is generated from the repo-owned C++ headers under src/. It focuses on the application's ownership boundaries, runtime contracts, and daemon-oriented workflow rather than end-user setup.
Current behavior:
- KGlobalAccel
- Ctrl+V
Current runtime shape:
- TranscriptionEngine is the immutable runtime/provider boundary
- TranscriptionSession is the mutable per-session decode boundary
- BackendCapabilities reports static backend support used for orchestration
- RuntimeDiagnostics reports runtime/device/model inspection data separately from static capabilities, including runtime-selection reasoning
- RuntimeError and RuntimeErrorCode provide typed runtime failures
- ModelCatalog, ModelPackage, and ModelValidator own model inspection, compatibility checks, and integrity validation before backend load
- .bin files are handled only through an explicit compatibility path and import flow
- RuntimeSelector owns runtime-selection policy instead of burying that logic in the generic factory
- CpuReferenceModelHandle and related native model helpers own the current product-owned CPU reference model loading boundary
- TranscriptionWorker hosts transcription on a dedicated QThread and creates live sessions lazily on that worker thread
- once flows still use a compatibility wrapper that assembles a final transcript from the streaming runtime path
- src/config.* stays product-shaped and permissive, while backend-specific support checks live in the runtime layer
Core API surface covered here:
- HotkeyManager registers the global push-to-talk shortcut through KDE.
- AudioRecorder captures microphone audio while the shortcut is held.
- RecordingNormalizer converts captured audio to runtime-ready mono float32 samples at 16 kHz.
- AudioChunker splits normalized audio into deterministic stream chunks.
- TranscriptAssembler builds final transcript text from streaming events.
- ModelCatalog resolves package directories, model.json, and legacy raw artifacts into validated product-owned model metadata.
- RawWhisperImporter converts raw whisper.cpp-compatible ggml .bin files into native Mutterkey packages.
- RuntimeSelector decides which runtime implementation should handle a given configured model path and records the reason in diagnostics.
- TranscriptionEngine and TranscriptionSession define the app-owned runtime seam.
- CpuReferenceTranscriber provides the current product-owned native CPU reference runtime scaffold.
- WhisperCppTranscriber performs in-process transcription through vendored whisper.cpp.
- ClipboardWriter copies the resulting text to the clipboard.
- MutterkeyService coordinates those pieces on the main thread plus a dedicated transcription worker thread.
Current product direction:
- whisper.cpp is still the only real end-user speech decoder today
- MUTTERKEY_ENABLE_LEGACY_WHISPER=OFF
For build, runtime, release, and service setup, use the repository README.md and RELEASE_CHECKLIST.md.
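As a rough illustration of the audio path in the core API list (RecordingNormalizer producing mono float32 samples, AudioChunker producing deterministic chunks), the following is a minimal sketch in plain standard C++. The names toMonoFloat32 and splitIntoChunks are hypothetical and are not taken from the Mutterkey headers. The sketch assumes interleaved signed 16-bit PCM input and leaves out resampling to 16 kHz, so it should not be read as the project's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Mutterkey's API): downmix interleaved signed
// 16-bit PCM to mono float32 in [-1.0, 1.0). Resampling is not handled here.
std::vector<float> toMonoFloat32(const std::vector<std::int16_t>& interleaved,
                                 std::size_t channels)
{
    std::vector<float> mono;
    if (channels == 0) {
        return mono;
    }
    // Any trailing partial frame is ignored.
    const std::size_t frames = interleaved.size() / channels;
    mono.reserve(frames);
    for (std::size_t frame = 0; frame < frames; ++frame) {
        float sum = 0.0f;
        for (std::size_t ch = 0; ch < channels; ++ch) {
            sum += static_cast<float>(interleaved[frame * channels + ch]) / 32768.0f;
        }
        // Average the channels so the mono signal stays in range.
        mono.push_back(sum / static_cast<float>(channels));
    }
    return mono;
}

// Hypothetical helper (not Mutterkey's API): split samples into fixed-size
// chunks in input order. The same input and chunk size always yield the same
// chunks; the last chunk may be shorter than chunkSize.
std::vector<std::vector<float>> splitIntoChunks(const std::vector<float>& samples,
                                                std::size_t chunkSize)
{
    std::vector<std::vector<float>> chunks;
    if (chunkSize == 0) {
        return chunks;
    }
    for (std::size_t offset = 0; offset < samples.size(); offset += chunkSize) {
        const std::size_t end = std::min(offset + chunkSize, samples.size());
        chunks.emplace_back(samples.begin() + static_cast<std::ptrdiff_t>(offset),
                            samples.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return chunks;
}
```

Keeping both helpers as pure functions over std::vector means the chunk boundaries depend only on the input length and the chunk size, which is one simple way to get the deterministic behavior the list describes.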