mutterkey
KDE-first push-to-talk transcription tool for KDE Plasma
transcriptiontypes.h
#pragma once

#include <QMetaType>
#include <QString>
#include <QStringList>

#include <cstdint>
#include <vector>

// Stable categories for runtime-layer failures.
enum class RuntimeErrorCode : std::uint8_t {
    None,
    Cancelled,
    InvalidConfig,
    ModelNotFound,
    InvalidModelPackage,
    UnsupportedModelPackageVersion,
    ModelIntegrityFailed,
    IncompatibleModelPackage,
    ModelTooLarge,
    ModelLoadFailed,
    AudioNormalizationFailed,
    UnsupportedLanguage,
    DecodeFailed,
    InternalRuntimeError,
};

// Structured runtime-layer failure with user-facing and diagnostic text.
struct RuntimeError {
    RuntimeErrorCode code = RuntimeErrorCode::None;
    QString message;
    QString detail;

    [[nodiscard]] bool isOk() const { return code == RuntimeErrorCode::None; }
};

// Product-owned backend/runtime metadata surfaced to app code.
struct BackendCapabilities {
    QString backendName;
    QStringList supportedLanguages;
    bool supportsTranslation = false;
    bool supportsWarmup = false;
};

// Normalized runtime audio payload.
    std::vector<float> samples;
    int sampleRate = 16000;
    int channels = 1;

    [[nodiscard]] bool isValid() const { return !samples.empty(); }
};

// One normalized streaming audio unit passed into a transcription session.
    std::vector<float> samples;
    int sampleRate = 16000;
    int channels = 1;
    std::int64_t streamOffsetFrames = 0;

    [[nodiscard]] bool isValid() const { return !samples.empty(); }
};

// Stable transcript event categories emitted by streaming sessions.
enum class TranscriptEventKind : std::uint8_t {
    Partial,
    Final,
};

// One transcript event produced by a backend session.
struct TranscriptEvent {
    TranscriptEventKind kind = TranscriptEventKind::Partial;
    QString text;
    std::int64_t startMs = -1;
    std::int64_t endMs = -1;
};

// Result of one streaming session operation.
    std::vector<TranscriptEvent> events;
    RuntimeError error;

    [[nodiscard]] bool isOk() const { return error.isOk(); }
};

// Result of a single transcription attempt.
    bool success = false;
    QString text;
    RuntimeError error;
};

Q_DECLARE_METATYPE(RuntimeErrorCode)
Q_DECLARE_METATYPE(RuntimeError)
Q_DECLARE_METATYPE(BackendCapabilities)
Q_DECLARE_METATYPE(ModelMetadata)
One normalized streaming audio unit passed into a transcription session.
  int sampleRate
    Sample rate of the chunk payload.
  std::vector<float> samples
    Mono float32 samples for this chunk.
  int channels
    Channel count of the chunk payload.
  std::int64_t streamOffsetFrames
    Start frame offset of this chunk within the utterance stream.
  bool isValid() const
    Reports whether the chunk contains usable audio samples.
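As a rough illustration of how `streamOffsetFrames` might be filled in when an utterance is fed to a session piece by piece, the sketch below slices a mono buffer into fixed-size chunks. `StreamChunk` and `splitIntoChunks` are hypothetical names that do not appear in this header; the struct only mirrors the documented fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in that mirrors the documented chunk fields.
struct StreamChunk {
    std::vector<float> samples;
    int sampleRate = 16000;
    int channels = 1;
    std::int64_t streamOffsetFrames = 0;

    [[nodiscard]] bool isValid() const { return !samples.empty(); }
};

// Splits a mono buffer into chunks of at most framesPerChunk frames.
// With one channel, a frame is a single sample, so the offset of each
// chunk equals the index of its first sample in the utterance.
std::vector<StreamChunk> splitIntoChunks(const std::vector<float> &mono,
                                         std::size_t framesPerChunk)
{
    std::vector<StreamChunk> chunks;
    if (framesPerChunk == 0) {
        return chunks;
    }
    for (std::size_t start = 0; start < mono.size(); start += framesPerChunk) {
        const std::size_t end = std::min(start + framesPerChunk, mono.size());
        StreamChunk chunk;
        chunk.samples.assign(mono.begin() + static_cast<std::ptrdiff_t>(start),
                             mono.begin() + static_cast<std::ptrdiff_t>(end));
        chunk.streamOffsetFrames = static_cast<std::int64_t>(start);
        chunks.push_back(std::move(chunk));
    }
    return chunks;
}
```

The last chunk may be shorter than the others; an empty input produces no chunks at all, so every chunk returned here passes `isValid()`.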
BackendCapabilities: Product-owned backend/runtime metadata surfaced to app code.
  bool supportsAutoLanguage
    true when the backend can auto-detect the spoken language.
  QStringList supportedLanguages
    Supported language codes accepted by this backend.
  bool supportsTranslation
    true when the backend supports translation mode.
  QString backendName
    Stable backend identifier used in diagnostics.
  bool supportsWarmup
    true when warmup is a supported preflight operation.
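One way app code might consult these flags before starting a session is sketched below. It is not taken from Mutterkey: `Capabilities` and `acceptsLanguage` are hypothetical, `std::string` stands in for `QString` so the snippet builds without Qt, and treating an empty request as "auto-detect" is an assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the documented capability fields.
struct Capabilities {
    std::string backendName;
    std::vector<std::string> supportedLanguages;
    bool supportsAutoLanguage = false;
    bool supportsTranslation = false;
    bool supportsWarmup = false;
};

// Assumption: an empty request means "let the backend detect the language".
// Any other value must appear in supportedLanguages verbatim.
bool acceptsLanguage(const Capabilities &caps, const std::string &requested)
{
    if (requested.empty()) {
        return caps.supportsAutoLanguage;
    }
    return std::find(caps.supportedLanguages.begin(),
                     caps.supportedLanguages.end(),
                     requested) != caps.supportedLanguages.end();
}
```

A caller that gets `false` here could report an `UnsupportedLanguage` error before any audio is processed.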
ModelMetadata: Product-owned immutable metadata about a validated model artifact.
  bool legacyCompatibility
    Raw-path compatibility marker for migration diagnostics.
  QString quantization
    Quantization metadata when known.
  QString sourceFormat
    Source format imported or packaged by Mutterkey.
  int textContext
    Text context size when known.
  QString packageId
    Stable product-owned package identifier.
  QString displayName
    Human-readable package/model name.
  int formatType
    Backend-specific format type value when known.
  int textLayerCount
    Text layer count when known.
  QString languageProfile
    Language profile such as en or multilingual.
  int melCount
    Mel filter count when known.
  int textState
    Text state size when known.
  QString packageVersion
    Optional package version string.
  QString tokenizer
    Tokenizer metadata when known.
  int audioHeadCount
    Audio attention head count when known.
  int textHeadCount
    Text attention head count when known.
  int audioLayerCount
    Audio layer count when known.
  int audioState
    Audio state size when known.
  QString runtimeFamily
    Runtime family this artifact belongs to.
  int vocabularySize
    Vocabulary size when known.
  int audioContext
    Audio context size when known.
  QString modelFormat
    Backend-facing model format marker such as ggml.
  QString architecture
    Model family or architecture string when known.
Normalized runtime audio payload.
  std::vector<float> samples
    Mono float32 samples ready for runtime ingestion.
  bool isValid() const
    Reports whether the normalized payload contains any samples.
  int sampleRate
    Sample rate of the normalized audio. Kept at 16 kHz.
  int channels
    Channel count of the normalized audio. Kept at one channel.
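Because the payload is documented as 16 kHz mono, a caller could sanity-check a buffer and derive its length in seconds as sketched below. `NormalizedPayload`, `matchesRuntimeFormat`, and `durationSeconds` are hypothetical names used only for illustration.

```cpp
#include <vector>

// Hypothetical stand-in that mirrors the documented payload fields.
struct NormalizedPayload {
    std::vector<float> samples;
    int sampleRate = 16000;
    int channels = 1;

    [[nodiscard]] bool isValid() const { return !samples.empty(); }
};

// True when the payload is non-empty and uses the documented 16 kHz mono format.
bool matchesRuntimeFormat(const NormalizedPayload &audio)
{
    return audio.isValid() && audio.sampleRate == 16000 && audio.channels == 1;
}

// Duration in seconds; returns 0.0 for an empty payload or a
// non-positive sample rate or channel count.
double durationSeconds(const NormalizedPayload &audio)
{
    if (!audio.isValid() || audio.sampleRate <= 0 || audio.channels <= 0) {
        return 0.0;
    }
    const double frames = static_cast<double>(audio.samples.size()) / audio.channels;
    return frames / audio.sampleRate;
}
```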
Runtime inspection data kept separate from static backend capabilities.
  QString runtimeDescription
    Human-readable runtime and device summary.
  QString selectionReason
    Human-readable explanation for why this runtime was selected.
  QString loadedModelDescription
    Loaded-model description when a model is available.
  QString backendName
    Stable backend identifier used in diagnostics.
RuntimeError: Structured runtime-layer failure with user-facing and diagnostic text.
  bool isOk() const
    Reports whether this value represents success.
  QString detail
    Optional extra context for diagnostics.
  QString message
    Human-readable summary safe to surface in logs or UI.
  RuntimeErrorCode code
    Stable error category for programmatic handling and tests.
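The split between a stable `code`, a user-facing `message`, and an optional `detail` can be exercised with a small sketch. The enum and struct below copy the listing but use `std::string` in place of `QString` so the snippet builds without Qt. `isModelProblem` and `formatForLog` are hypothetical helpers, and the grouping of "model" codes is an assumption rather than anything the header defines.

```cpp
#include <cstdint>
#include <string>

enum class RuntimeErrorCode : std::uint8_t {
    None, Cancelled, InvalidConfig, ModelNotFound, InvalidModelPackage,
    UnsupportedModelPackageVersion, ModelIntegrityFailed, IncompatibleModelPackage,
    ModelTooLarge, ModelLoadFailed, AudioNormalizationFailed, UnsupportedLanguage,
    DecodeFailed, InternalRuntimeError,
};

struct RuntimeError {
    RuntimeErrorCode code = RuntimeErrorCode::None;
    std::string message; // QString in the real header
    std::string detail;  // QString in the real header

    [[nodiscard]] bool isOk() const { return code == RuntimeErrorCode::None; }
};

// Hypothetical grouping: codes that point at the model artifact itself.
bool isModelProblem(RuntimeErrorCode code)
{
    switch (code) {
    case RuntimeErrorCode::ModelNotFound:
    case RuntimeErrorCode::InvalidModelPackage:
    case RuntimeErrorCode::UnsupportedModelPackageVersion:
    case RuntimeErrorCode::ModelIntegrityFailed:
    case RuntimeErrorCode::IncompatibleModelPackage:
    case RuntimeErrorCode::ModelTooLarge:
    case RuntimeErrorCode::ModelLoadFailed:
        return true;
    default:
        return false;
    }
}

// Builds one log line: the message, plus the detail in parentheses when present.
std::string formatForLog(const RuntimeError &error)
{
    if (error.isOk()) {
        return "ok";
    }
    std::string line = error.message;
    if (!error.detail.empty()) {
        line += " (";
        line += error.detail;
        line += ")";
    }
    return line;
}
```

Branching on `code` rather than on `message` text keeps handling stable even if the wording shown to users changes.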
TranscriptEvent: One transcript event produced by a backend session.
  std::int64_t startMs
    Optional inclusive event start timestamp in milliseconds.
  TranscriptEventKind kind
    Whether this event is partial or final.
  std::int64_t endMs
    Optional exclusive event end timestamp in milliseconds.
  QString text
    Transcript text payload for this event.
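With an inclusive start and an exclusive end, the length of an event is simply `endMs - startMs`. The sketch below assumes that a negative timestamp means "not provided", which matches the `-1` defaults in the listing but is not stated outright in the documentation. `durationMs` is a hypothetical helper, and `std::string` replaces `QString` so the snippet builds without Qt.

```cpp
#include <cstdint>
#include <optional>
#include <string>

enum class TranscriptEventKind : std::uint8_t {
    Partial,
    Final,
};

struct TranscriptEvent {
    TranscriptEventKind kind = TranscriptEventKind::Partial;
    std::string text; // QString in the real header
    std::int64_t startMs = -1;
    std::int64_t endMs = -1;
};

// Returns the event length in milliseconds, or nothing when either
// timestamp is missing (assumed: negative) or the range is inverted.
std::optional<std::int64_t> durationMs(const TranscriptEvent &event)
{
    if (event.startMs < 0 || event.endMs < event.startMs) {
        return std::nullopt;
    }
    return event.endMs - event.startMs;
}
```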
Result of one streaming session operation.
  bool isOk() const
    Reports whether this update completed without a runtime error.
  RuntimeError error
    Structured runtime failure when the operation did not succeed.
  std::vector<TranscriptEvent> events
    Zero or more transcript events emitted by the operation.
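A consumer of these updates has to decide what to do with partial versus final events. The sketch below shows one possible policy and is an assumption, not Mutterkey's behaviour: final text is appended to a committed transcript, the latest partial text is kept as a preview, and updates that report an error are ignored. `SessionUpdate`, `Transcript`, and `applyUpdate` are hypothetical names, and a plain `failed` flag stands in for the `RuntimeError error` member.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class TranscriptEventKind : std::uint8_t {
    Partial,
    Final,
};

// Reduced stand-in: timestamps omitted, std::string instead of QString.
struct TranscriptEvent {
    TranscriptEventKind kind = TranscriptEventKind::Partial;
    std::string text;
};

// Hypothetical stand-in for the documented update type.
struct SessionUpdate {
    std::vector<TranscriptEvent> events;
    bool failed = false; // stands in for RuntimeError error

    [[nodiscard]] bool isOk() const { return !failed; }
};

struct Transcript {
    std::string committed; // text from Final events, space-separated
    std::string preview;   // text of the most recent Partial event
};

// Assumed policy: Final events extend the committed text and clear the
// preview; Partial events replace the preview; failed updates change nothing.
void applyUpdate(Transcript &transcript, const SessionUpdate &update)
{
    if (!update.isOk()) {
        return;
    }
    for (const TranscriptEvent &event : update.events) {
        if (event.kind == TranscriptEventKind::Final) {
            if (!transcript.committed.empty() && !event.text.empty()) {
                transcript.committed += ' ';
            }
            transcript.committed += event.text;
            transcript.preview.clear();
        } else {
            transcript.preview = event.text;
        }
    }
}
```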
Result of a single transcription attempt.
  RuntimeError error
    Structured runtime failure when success is false.
  bool success
    true when transcription completed successfully.
  QString text
    Final recognized text when success is true.
TranscriptEventKind
  Stable transcript event categories emitted by streaming sessions.
RuntimeErrorCode
  Stable categories for runtime-layer failures.