Documentation
Kalam is open-source voice typing for your computer. This page covers setup, API keys, and building from source, followed by a full user manual.
Setup & reference
Download
Get the latest release for Windows, macOS, or Linux from GitHub. Download the installer or archive for your platform and run it. No account required.
Quick Start
- Install and open Kalam.
- For cloud speech-to-text: add your API key in Settings → STT Provider (see API Keys). For fully offline use, choose a local model—no key needed.
- Press Ctrl+Win (Windows) or Ctrl+Cmd (macOS) from any app to start dictating.
API Keys (BYOK)
Kalam is open source and free. It uses a Bring Your Own Key (BYOK) model for cloud transcription:
- Sign up at Groq Console and create a free API key.
- In Kalam, go to Settings → STT Provider and paste your key.
- Cloud audio is sent over TLS and not retained by Groq.
Tip
Local mode runs entirely on your machine (SenseVoice / Whisper.cpp)—no API key and no data leaves your device.
Building from Source
Prerequisites:
git clone https://github.com/afaraha8403/kalam.git
cd kalam
# Install dependencies (PowerShell on Windows)
./tasks.ps1 deps
# Run in development mode
./tasks.ps1 dev
# Build for production
./tasks.ps1 build
See ./tasks.ps1 help for all commands, including release and signing.
Privacy
- Audio is not stored to disk (in-memory only).
- Cloud mode: audio sent to Groq over TLS 1.3; zero retention.
- Local mode: everything stays on your device.
- History is stored locally in SQLite with AES-256 encryption.
- Telemetry is opt-in only (off by default).
Contributing
Contributions are welcome.
- Open an issue or pull request on GitHub.
User manual
Overview
This section explains every feature: what it does, how it works, and important nuances. Jump to a topic below or use the sidebar.
- Getting started
- Dictation
- STT & languages
- Formatting
- Snippets
- History
- Dictionary
- Command mode
- Main window
- Hotkeys
- Overlay
- Privacy & data
- Notifications
- Troubleshooting
Getting started
First launch
When you open Kalam for the first time, an onboarding flow guides you through:
- Welcome — intro
- Account — email (optional)
- Permissions — microphone and accessibility
- Mode — cloud vs local
- Controls — hotkeys and languages
- Ready — you can complete each step or skip with defaults
Skip onboarding
If you skip, Kalam uses the default audio device and marks onboarding as complete. You can configure everything later in Settings.
System permissions
- Microphone — required for dictation.
- Accessibility — required so Kalam can inject text into other apps. Without it, transcription works but text will not be typed into the focused app.
Kalam can open the relevant system permission pages (Settings → Permissions or during onboarding). On macOS the system may prompt for accessibility on first use.
- Windows: ms-settings:easeofaccess-keyboard
- macOS: System Preferences → Privacy → Accessibility
Opening the app
Kalam does not show a main window on startup; it runs in the system tray.
- Left-click the tray icon → open main window
- Right-click (tray menu) → Settings, History, Snippets, Check for Updates, Quit
Dictation: how it works
Hold-to-talk (default)
Hold the dictation hotkey (Ctrl+Win on Windows, Ctrl+Super on Linux, Ctrl+Cmd on macOS), speak, then release to stop. The app records while the key is held, transcribes, and injects text into the app that had focus when you pressed the key.
Short press
If you release the hotkey before Minimum hold time (Settings → General; default 300 ms), recording does not start. The overlay may show "Hold to talk." This avoids accidental dictation.
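The short-press rule can be sketched as a single comparison. This is a minimal illustration, not Kalam's actual code: the function name and the treatment of a hold of exactly 300 ms are assumptions; only the 300 ms default comes from the text above.

```python
DEFAULT_MIN_HOLD_MS = 300  # default Minimum hold time stated in the manual


def should_start_recording(held_ms: int, min_hold_ms: int = DEFAULT_MIN_HOLD_MS) -> bool:
    """Return True when the hotkey was held long enough to count as dictation.

    Hypothetical helper: a release before the minimum hold time is treated
    as an accidental short press and ignored. Treating a hold of exactly
    min_hold_ms as valid is an assumption.
    """
    return held_ms >= min_hold_ms
```

For example, `should_start_recording(120)` is false (the overlay would show "Hold to talk"), while `should_start_recording(450)` is true.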
Toggle mode
With a Toggle dictation hotkey (Settings → Audio & Dictation), press once to start and again to stop—no need to hold the key.
Focus
The target window is the one that had focus when you started recording (captured at key-down). Do not switch windows mid-dictation if you want text in a specific field.
Note
Overlay — A small on-screen pill shows state: Hidden, Collapsed, Listening, Recording (with level; command mode uses a different color), Processing, Error, or Status. Position and expand direction are in Settings.
Speech-to-text and languages
Modes
- Cloud — remote API (Groq or OpenAI)
- Local — entirely on your machine (SenseVoice or Whisper)
- Hybrid / Auto — app chooses cloud vs local (e.g. by network or sensitivity)
Set the mode in Settings → Audio & Dictation.
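How Hybrid / Auto decides is only hinted at above ("by network or sensitivity"). The sketch below shows one plausible reading, under the assumption that a sensitive app or a missing network connection pushes the choice to local. The function name, the parameters, and the mode strings are all hypothetical.

```python
def resolve_stt_mode(mode: str, network_available: bool, sensitive_app: bool) -> str:
    """Pick the engine to use for one dictation.

    Assumed rule, not confirmed by the manual: explicit "cloud" or "local"
    is respected; "hybrid" falls back to local when offline or when the
    focused app is considered sensitive.
    """
    if mode in ("cloud", "local"):
        return mode
    if sensitive_app or not network_available:
        return "local"
    return "cloud"
```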
Provider (cloud)
Groq is the default; OpenAI is also supported. Enter your API key in Settings → Audio & Dictation (STT Provider). BYOK: your key is stored only on your machine and is sent only to the provider you selected, for transcription.
Local models
SenseVoice and Whisper (base) can be downloaded, started, stopped, restarted, or deleted from Settings. Each model has hardware and disk requirements.
- Settings shows status: NotInstalled, Stopped, Starting, Running, Error (and any error message)
- Large models require significant disk space
Languages
- The recognition language list is ordered; the first is the default.
- Add a second (or more) and set a Language toggle hotkey to swap (e.g. English ↔ Spanish). A notification confirms the switch.
- Provider support: SenseVoice — English, Chinese, Japanese, Korean, Cantonese. Groq/Whisper — many more (e.g. English, Spanish, French, German, Chinese, Japanese, and others in the app).
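Because the language list is ordered and the first entry is the default, the language toggle can be pictured as swapping the first two entries. A minimal sketch, assuming the list is a plain sequence of language codes (the function name and the codes are illustrative):

```python
def toggle_language(languages: list[str]) -> list[str]:
    """Swap the first two recognition languages; the new first entry is active.

    Hypothetical helper. With fewer than two languages there is nothing
    to swap, so the list is returned unchanged.
    """
    if len(languages) < 2:
        return list(languages)
    return [languages[1], languages[0], *languages[2:]]
```

Calling it twice returns the original order, which matches the English ↔ Spanish example.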
VAD preset
Voice Activity Detection: Fast (quick, may cut off), Balanced, or Accurate (more conservative). Set in Settings → Audio & Dictation.
Text injection and formatting
Injection method
In Settings → Formatting (or Advanced):
- Auto (default) — keystrokes for short text, clipboard paste for text longer than Clipboard threshold (default 50 characters)
- Keystrokes — character-by-character (slower for long text)
- Clipboard — always paste, then restore your previous clipboard
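The three injection methods reduce to a small decision. The sketch below illustrates the Auto rule as described (keystrokes for short text, clipboard paste above the threshold); the enum and function names are assumptions, and only the 50-character default comes from the list above.

```python
from enum import Enum


class InjectionMethod(Enum):
    AUTO = "auto"
    KEYSTROKES = "keystrokes"
    CLIPBOARD = "clipboard"


def choose_injection(text: str,
                     method: InjectionMethod = InjectionMethod.AUTO,
                     clipboard_threshold: int = 50) -> InjectionMethod:
    """Resolve the method actually used for this piece of text.

    Hypothetical helper: an explicit Keystrokes or Clipboard setting is
    returned as-is; Auto switches to clipboard paste only when the text is
    longer than the threshold.
    """
    if method is not InjectionMethod.AUTO:
        return method
    if len(text) > clipboard_threshold:
        return InjectionMethod.CLIPBOARD
    return InjectionMethod.KEYSTROKES
```

Lowering the threshold, as suggested under Troubleshooting, makes Auto choose the clipboard for shorter text.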
Retries
If injection fails, Kalam can retry. Configure retry attempts and delay in Settings → Advanced.
Voice commands
When "Enable voice commands" is on (Settings → Formatting), the following spoken phrases are replaced with text or trigger an action:
- Punctuation: "period" / "full stop" → . ; "comma" → , ; "question mark" → ? ; "exclamation mark" → ! ; "colon" → : ; "semicolon" → ; ; "dash" / "hyphen" → -
- Line breaks: "new line" → newline; "new paragraph" → double newline; "tab" → tab
- Actions: "undo" → Ctrl+Z (Cmd+Z on macOS); "Delete that" → backspace last chunk; "Scratch that" → backspace to last sentence/chunk. "Delete that" and "scratch that" only work after a previous injection in the same session.
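The punctuation and line-break commands amount to phrase substitution on the transcript. The following is a rough sketch of that idea using regular expressions; it is not Kalam's implementation. The function name is hypothetical, "dash" and "hyphen" are omitted, and the action commands (undo, delete that, scratch that) are left out because they send keystrokes rather than text.

```python
import re

# Spoken phrase -> symbol, taken from the list above.
PUNCTUATION = {
    "full stop": ".",
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
    "semicolon": ";",
    "colon": ":",
}
# Spoken phrase -> whitespace.
BREAKS = {"new paragraph": "\n\n", "new line": "\n", "tab": "\t"}


def apply_voice_commands(text: str) -> str:
    """Replace spoken punctuation and line-break phrases (illustrative only)."""
    # Punctuation attaches to the preceding word: "hello comma" -> "hello,"
    for phrase, symbol in PUNCTUATION.items():
        pattern = rf"\s*\b{re.escape(phrase)}\b"
        text = re.sub(pattern, lambda _m, s=symbol: s, text, flags=re.IGNORECASE)
    # Breaks swallow the spaces on both sides of the phrase.
    for phrase, whitespace in BREAKS.items():
        pattern = rf"\s*\b{re.escape(phrase)}\b\s*"
        text = re.sub(pattern, lambda _m, s=whitespace: s, text, flags=re.IGNORECASE)
    return text
```

A plain substitution like this cannot tell the command "period" from the word "period" used in a sentence, which is one reason the feature is a setting that can be turned off.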
Other formatting options
- Filler word removal — removes "um", "uh", "like", "you know", "I mean", "basically", "actually" (Settings → Formatting)
- Auto punctuation — adds a period if the text looks like a sentence (capital, no ending punctuation)
- Custom rules — regex pattern + replacement (Settings → Formatting or Advanced); applied after other formatting
- Snippets: trigger phrases (e.g. @@email) expand to longer text; longer triggers are applied first. See Snippets.
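The options above can be read as a small pipeline: filler removal, then auto punctuation, then custom regex rules last. The sketch below follows that order under stated assumptions. All function names are hypothetical, the filler matching is deliberately naive, and the exact order of the first two steps is a guess; only "custom rules are applied after other formatting" comes from the list.

```python
import re

# Filler words listed in the manual; longer phrases first so they match whole.
FILLERS = ["you know", "I mean", "basically", "actually", "like", "um", "uh"]
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    # Naive: also removes legitimate uses of "like" or "actually".
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def auto_punctuate(text: str) -> str:
    # "Looks like a sentence": starts with a capital, no ending punctuation.
    if text and text[0].isupper() and text[-1] not in ".?!,;:":
        return text + "."
    return text


def apply_custom_rules(text: str, rules) -> str:
    # rules: iterable of (regex pattern, replacement) pairs.
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def format_text(text: str, rules=()) -> str:
    text = remove_fillers(text)
    text = auto_punctuate(text)
    return apply_custom_rules(text, rules)
```

For instance, `format_text("Um I think we should uh ship it", [(r"\bship\b", "release")])` yields `"I think we should release it."`.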
Snippets
Snippets are short triggers that expand to longer text when they appear in the transcribed output.
- Add, edit, remove: Main window → Snippets or tray → Snippets
- Each snippet has a trigger and an expansion
- Triggers are matched before injection — use distinct triggers to avoid accidental expansion in normal speech
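Applying longer triggers first matters when one trigger is a prefix of another. A minimal sketch of that rule, assuming snippets are held as a trigger-to-expansion mapping (the function name and the sample snippets are illustrative):

```python
import re


def expand_snippets(text: str, snippets: dict[str, str]) -> str:
    """Expand triggers in one pass, preferring the longest trigger.

    Hypothetical helper: sorting by length puts "@@emailsig" ahead of
    "@@email" in the alternation, so the longer trigger wins when both
    could match at the same position.
    """
    if not snippets:
        return text
    triggers = sorted(snippets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in triggers))
    return pattern.sub(lambda m: snippets[m.group(0)], text)
```

Doing the replacement in a single pass also means an expansion that happens to contain another trigger is not expanded again; whether Kalam behaves that way is not stated in this manual.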
History
Kalam stores past transcriptions locally: text, timestamp, mode, language, duration. Data is in SQLite; see Privacy and data for encryption and retention.
- Open: Main window sidebar or tray menu
- Search and browse; retention is configurable in Settings → Privacy (history retention days)
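Retention in days maps naturally onto a periodic delete against the local SQLite store. The sketch below uses Python's standard `sqlite3` module with a hypothetical `history` table and `created_at` column holding ISO 8601 UTC timestamps; Kalam's real schema is not documented here, and the encryption mentioned under Privacy is not modeled.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_history(conn: sqlite3.Connection, retention_days: int, now=None) -> int:
    """Delete rows older than the retention window; return how many were removed.

    Assumes a hypothetical table history(created_at TEXT) whose timestamps
    are ISO 8601 strings in UTC, so string comparison matches time order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cursor = conn.execute("DELETE FROM history WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```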
Dictionary
The dictionary holds custom terms (e.g. names, jargon) to improve recognition accuracy.
- Entries are stored locally
- Add or delete in Settings → Dictionary
- Future versions may pass the dictionary to the cloud provider or local model
Command mode (notes, tasks, reminders)
Command mode uses a separate hotkey from dictation. When you press it and speak, Kalam does not inject text into another app; it creates a note, task, or reminder.
- Say: "new note …", "new task …", or "new reminder …" with your content
- With LLM parsing (Settings → Command Mode), Kalam can use a provider (Groq, OpenRouter, Gemini, OpenAI, Anthropic) to classify the phrase and extract fields (title, content, due date, etc.)
- Set the command hotkey and, for LLM, provider and API key (and optionally model)
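Without LLM parsing, recognizing "new note …", "new task …", or "new reminder …" can be approximated with a prefix match. This is a sketch of that idea only: the function name and return shape are assumptions, and it does not extract fields such as a due date, which the manual attributes to the optional LLM step.

```python
import re

_COMMAND_RE = re.compile(
    r"^\s*new\s+(note|task|reminder)\b[\s,:]*(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_command(transcript: str):
    """Return (kind, content) for a recognized command phrase, else None.

    Hypothetical helper: kind is "note", "task", or "reminder"; content is
    whatever was spoken after the command words.
    """
    match = _COMMAND_RE.match(transcript)
    if not match:
        return None
    return match.group(1).lower(), match.group(2).strip()
```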
Note
The same hold vs. short-press rule applies: releasing before the minimum hold time cancels the command. The command hotkey cannot be the same as the dictation or language-toggle hotkey. Created items appear in the main window under Notes, Tasks, or Reminders.
Main window and sidebar
The main window has a sidebar with:
- Home — aggregate stats (streak, total words, time saved, last latency), recent history, tasks/reminders due today
- Snippets, History, Notes, Tasks, Reminders
- Settings (and About inside Settings)
Dictation On/Off in the sidebar toggles dictation and hotkeys globally. Turn it off to pause without closing the app.
Settings tabs
- General — hotkeys, min hold, auto-start, overlay position
- Audio & Dictation — device, STT mode, provider, API key, languages, VAD
- Dictionary, Command Mode, Privacy
- Advanced — injection, logging, app data
- About — version, Check for Updates (installing requires restart)
Hotkeys reference
- Dictation (hold): default Ctrl+Win (Windows), Ctrl+Super (Linux), Ctrl+Cmd (macOS). Configurable in Settings.
- Toggle dictation: optional; press once to start, again to stop. Set in Settings → Audio & Dictation.
- Language toggle: optional — swaps the first two languages. Cannot be the same as dictation or command hotkey.
- Command mode: optional — when Command mode is enabled. Must differ from dictation and toggle dictation hotkeys.
Tip
Kalam validates hotkeys on save; conflicts show an error. While you capture a new hotkey in Settings, global hotkeys are paused so that the keys you press are recorded rather than triggering an action.
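Validation on save can be pictured as checking that no two assigned hotkeys resolve to the same key combination. The sketch below is an assumption-laden illustration: the action names, the "Ctrl+Win" string format, and the choice to flag every duplicate pair (the manual only names specific pairs that must differ) are all hypothetical.

```python
def find_hotkey_conflicts(hotkeys: dict) -> list:
    """Return (first_action, second_action) pairs that share a key combo.

    hotkeys maps an action name to a combo string such as "Ctrl+Win",
    or to None when the optional hotkey is unset. Key order and letter
    case are ignored, so "Ctrl+Win" and "win+ctrl" count as the same.
    """
    def normalize(combo: str) -> frozenset:
        return frozenset(part.strip().lower() for part in combo.split("+"))

    seen = {}
    conflicts = []
    for action, combo in hotkeys.items():
        if not combo:
            continue  # unset optional hotkeys cannot conflict
        key = normalize(combo)
        if key in seen:
            conflicts.append((seen[key], action))
        else:
            seen[key] = action
    return conflicts
```

An empty result means the configuration could be saved; any pair returned corresponds to an error such as "Language toggle hotkey cannot be the same as…".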
Overlay and appearance
States: Hidden, Collapsed, Listening, ShortPress, Recording (with level; command mode uses a different color), Processing, Error, Status.
- Position: Bottom Center (default), corners, sides, or center
- Fine-tune: offset X/Y and expand direction (Up, Down, Center) in Settings
- Waveform style: Line, Symmetric, Heartbeat, Snake, DoubleHelix, Liquid, Waves, Glitch, Bars, CenterSplit — visual only
Privacy and data
- Audio — not stored to disk; in-memory only. In cloud mode, sent to provider over TLS; we do not retain it.
- History — stored locally in SQLite; retention configurable; may be encrypted (e.g. AES-256).
- Sensitive app detection (Settings → Privacy) — define patterns (process name, window title, or bundle ID); when the focused app matches, Kalam can force local mode, block dictation, or require confirmation.
- Telemetry — opt-in, off by default.
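Sensitive app detection pairs a pattern with an action. A minimal sketch of how such rules might be evaluated, assuming glob-style patterns and first-match-wins ordering; the rule structure, field names, action strings, and pattern syntax are all hypothetical, since the manual does not specify them.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class SensitiveRule:
    field: str    # "process_name", "window_title", or "bundle_id"
    pattern: str  # glob pattern (assumed syntax), matched case-insensitively
    action: str   # "force_local", "block", or "confirm"


def action_for_app(app: dict, rules: list):
    """Return the action of the first rule matching the focused app, else None."""
    for rule in rules:
        value = app.get(rule.field, "")
        if fnmatchcase(value.lower(), rule.pattern.lower()):
            return rule.action
    return None
```

With a rule such as `SensitiveRule("window_title", "*bank*", "force_local")`, dictating into a window titled "My Bank - Login" would be routed to local mode under these assumptions.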
Notifications and logging
- Notifications (Settings) — completion, errors, updates; optional sound
- Logging (Settings → Advanced) — in-app log with level and max records; export logs or open app data folder for support
Troubleshooting
- Text goes to the wrong app — Focus is captured when you press the hotkey. Do not switch windows after pressing it.
- Nothing happens when I press the key — Check: Dictation is On (sidebar); you held the key long enough (above Minimum hold time); you are not capturing a hotkey in Settings (hotkeys paused); microphone and accessibility permissions are granted.
- "Language toggle hotkey cannot be the same as…" — Choose a different key for the language toggle in Settings.
- Command hotkey same as dictation — Change one so they differ.
- Injection fails — Try Injection method "Clipboard" or lower the clipboard threshold; ensure accessibility permission; increase retry attempts/delay in Advanced.
- Local model won't start — Check hardware and disk space; in Settings see model status (NotInstalled, Stopped, Starting, Running, Error) and error message.
- Microphone not detected — Grant microphone permission; Settings → Audio & Dictation → select device and use "Test microphone".
Quick checks
Most "nothing happens" issues have one of three causes: Dictation is Off, the press was too short (below the minimum hold time), or microphone/accessibility permission is missing. Verify these first.