Documentation

Kalam is open-source voice typing for your computer. This page covers setup, API keys, and building from source, followed by a full user manual that explains every feature.

Setup & reference

Download

Get the latest release for Windows, macOS, or Linux from GitHub:

github.com/afaraha8403/kalam/releases/latest

Download the installer or archive for your platform and run it. No account required.

Quick Start

Install and open Kalam.
For cloud speech-to-text: add your API key in Settings → STT Provider (see API Keys). For fully offline use, choose a local model—no key needed.
Hold Ctrl+Win (Windows), Ctrl+Super (Linux), or Ctrl+Cmd (macOS) from any app to start dictating.

API Keys (BYOK)

Kalam is open source and free. It uses a Bring Your Own Key (BYOK) model for cloud transcription:

Sign up at Groq Console and create a free API key.
In Kalam, go to Settings → STT Provider and paste your key.
Cloud audio is sent over TLS and not retained by Groq.

Tip

Local mode runs entirely on your machine (SenseVoice / Whisper.cpp)—no API key and no data leaves your device.

Building from Source

Prerequisites:

Node.js 20+
Rust 1.75+

git clone https://github.com/afaraha8403/kalam.git
cd kalam

# Install dependencies (PowerShell on Windows)
./tasks.ps1 deps

# Run in development mode
./tasks.ps1 dev

# Build for production
./tasks.ps1 build

See ./tasks.ps1 help for all commands, including release and signing.

Privacy

Audio is not stored to disk (in-memory only).
Cloud mode: audio sent to Groq over TLS 1.3; zero retention.
Local mode: everything stays on your device.
History is stored locally in SQLite with AES-256 encryption.
Telemetry is opt-in only (off by default).

Contributing

Contributions are welcome.

Open an issue or pull request on GitHub.

User manual

Overview

This section explains every feature: what it does, how it works, and important nuances. Jump to a topic below or use the sidebar.

Getting started
Dictation
STT & languages
Formatting
Snippets
History
Dictionary
Command mode
Main window
Hotkeys
Overlay
Privacy & data
Notifications
Troubleshooting

Getting started

First launch

When you open Kalam for the first time, an onboarding flow guides you through:

Welcome — intro
Account — email (optional)
Permissions — microphone and accessibility
Mode — cloud vs local
Controls — hotkeys and languages
Ready — you can complete each step or skip with defaults

Skip onboarding

If you skip, Kalam uses the default audio device and marks onboarding as complete. You can configure everything later in Settings.

System permissions

Microphone — required for dictation.
Accessibility — required so Kalam can inject text into other apps. Without it, transcription works but text will not be typed into the focused app.

Kalam can open the relevant system permission pages (Settings → Permissions or during onboarding). On macOS the system may prompt for accessibility on first use.

Windows: ms-settings:easeofaccess-keyboard
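macOS: System Settings → Privacy & Security → Accessibility (on macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Accessibility)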

Opening the app

Kalam does not show a main window on startup; it runs in the system tray.

Left-click the tray icon → open main window
Right-click (tray menu) → Settings, History, Snippets, Check for Updates, Quit

Dictation: how it works

Hold-to-talk (default)

Hold the dictation hotkey (Ctrl+Win on Windows, Ctrl+Super on Linux, Ctrl+Cmd on macOS), speak, then release to stop. The app records while the key is held, transcribes, and injects text into the app that had focus when you pressed the key.

Short press

If you release the hotkey before Minimum hold time (Settings → General; default 300 ms), recording does not start. The overlay may show "Hold to talk." This avoids accidental dictation.
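
The short-press rule amounts to a duration check on the hotkey. The following is a minimal sketch of that idea, not Kalam's actual code: the function name, parameters, and millisecond timestamps are hypothetical, and only the 300 ms default comes from the text above.

```python
from typing import Optional

# Default Minimum hold time (Settings → General), per the text above.
DEFAULT_MIN_HOLD_MS = 300


def should_start_recording(key_down_ms: int, key_up_ms: Optional[int], now_ms: int,
                           min_hold_ms: int = DEFAULT_MIN_HOLD_MS) -> bool:
    """Return True once the hotkey has been held for at least min_hold_ms.

    key_up_ms is None while the key is still held down. All names here are
    hypothetical and only illustrate the rule.
    """
    held_until = now_ms if key_up_ms is None else key_up_ms
    return held_until - key_down_ms >= min_hold_ms
```

With the default setting, a key released after 200 ms never starts a recording, while a key still held at 300 ms does.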

Toggle mode

With a Toggle dictation hotkey (Settings → Audio & Dictation), press once to start and again to stop—no need to hold the key.

Focus

The target window is the one that had focus when you started recording (captured at key-down). Do not switch windows mid-dictation if you want text in a specific field.

Note

Overlay — A small on-screen pill shows state: Hidden, Collapsed, Listening, Recording (with level; command mode uses a different color), Processing, Error, or Status. Position and expand direction are in Settings.

Speech-to-text and languages

Modes

Cloud — remote API (Groq or OpenAI)
Local — entirely on your machine (SenseVoice or Whisper)
Hybrid / Auto — app chooses cloud vs local (e.g. by network or sensitivity)

Set the mode in Settings → Audio & Dictation.

Provider (cloud)

Groq is the default; OpenAI is also supported. Enter your API key in Settings → Audio & Dictation (STT Provider). BYOK: your key is stored only on your machine and is sent only to the provider you selected, for transcription.

Local models

SenseVoice and Whisper (base) can be downloaded, started, stopped, restarted, or deleted from Settings. Each model has hardware and disk requirements.

Settings shows status: NotInstalled, Stopped, Starting, Running, Error (and any error message)
Large models require significant disk space

Languages

The recognition language list is ordered; the first is the default.
Add a second (or more) and set a Language toggle hotkey to swap (e.g. English ↔ Spanish). A notification confirms the switch.
Provider support: SenseVoice — English, Chinese, Japanese, Korean, Cantonese. Groq/Whisper — many more (e.g. English, Spanish, French, German, Chinese, Japanese, and others in the app).
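
Because the list is ordered and the first entry is the default, the language toggle can be pictured as swapping the first two entries. This is a sketch under that assumption; the function name and the use of short language codes are hypothetical.

```python
from typing import List


def toggle_language(languages: List[str]) -> List[str]:
    """Swap the first two languages; the first entry is the active default.

    Returns a new list and leaves any further languages in place.
    With fewer than two languages there is nothing to swap.
    """
    if len(languages) < 2:
        return list(languages)
    return [languages[1], languages[0]] + list(languages[2:])
```

Toggling twice returns the list to its original order, which matches the English ↔ Spanish example above.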

VAD preset

Voice Activity Detection: Fast (quick, may cut off), Balanced, or Accurate (more conservative). Set in Settings → Audio & Dictation.

Text injection and formatting

Injection method

In Settings → Formatting (or Advanced):

Auto (default) — keystrokes for short text, clipboard paste for text longer than Clipboard threshold (default 50 characters)
Keystrokes — character-by-character (slower for long text)
Clipboard — always paste, then restore your previous clipboard
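
The three settings above reduce to a small decision function. This is a sketch only: the function name and the lowercase method strings are hypothetical, while the 50-character default and the "longer than" comparison follow the description of Auto.

```python
# Default Clipboard threshold in characters, per the list above.
DEFAULT_CLIPBOARD_THRESHOLD = 50


def choose_injection(text: str, method: str = "auto",
                     clipboard_threshold: int = DEFAULT_CLIPBOARD_THRESHOLD) -> str:
    """Return "keystrokes" or "clipboard" for the given text and setting."""
    if method in ("keystrokes", "clipboard"):
        return method  # explicit settings always win
    if method != "auto":
        raise ValueError("unknown injection method: %r" % (method,))
    # Auto: paste only when the text is longer than the threshold.
    return "clipboard" if len(text) > clipboard_threshold else "keystrokes"
```

Under this reading, text of exactly 50 characters is still typed as keystrokes, and 51 characters or more is pasted.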

Retries

If injection fails, Kalam can retry. Configure retry attempts and delay in Settings → Advanced.
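
A retry loop with a fixed delay is one plausible shape for this behavior. The sketch below is not Kalam's implementation; the function name, the callback signature, and the default values (3 attempts, 0.1 s) are assumptions chosen for illustration.

```python
import time
from typing import Callable


def inject_with_retries(inject: Callable[[], bool], attempts: int = 3,
                        delay_s: float = 0.1,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call inject() until it reports success or the attempts run out.

    inject is a hypothetical callback that returns True on success.
    The sleep parameter exists so the delay can be replaced in tests.
    """
    for attempt in range(1, attempts + 1):
        if inject():
            return True
        if attempt < attempts:
            sleep(delay_s)  # wait before the next try, not after the last one
    return False
```

Raising the attempt count or the delay in this model trades a longer worst-case wait for a better chance that a slow target app accepts the text.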

Voice commands

When "Enable voice commands" is on (Settings → Formatting), the following spoken phrases are replaced with text or trigger an action:

Punctuation: "period" / "full stop" → . ; "comma" → , ; "question mark" → ? ; "exclamation mark" → ! ; "colon" → : ; "semicolon" → ; ; "dash" / "hyphen" → -
Line breaks: "new line" → newline; "new paragraph" → double newline; "tab" → tab
Actions: "undo" → Ctrl+Z (Cmd+Z on macOS); "Delete that" → backspace last chunk; "Scratch that" → backspace to last sentence/chunk. "Delete that" and "scratch that" only work after a previous injection in the same session.
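
The punctuation and line-break replacements can be sketched as a phrase table applied with a regular expression. This is an illustrative sketch, not Kalam's code: it covers only a subset of the phrases above, the spacing rules are an assumption, and it cannot tell a command from ordinary speech (for example "a period of time").

```python
import re

# Spoken phrase -> inserted text. A subset of the phrases listed above.
SPOKEN_TO_TEXT = {
    "period": ".",
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
    "colon": ":",
    "semicolon": ";",
    "new line": "\n",
    "new paragraph": "\n\n",
}

# Longest phrases first so multi-word phrases are tried before shorter ones.
_PHRASES = sorted(SPOKEN_TO_TEXT, key=len, reverse=True)
_PATTERN = re.compile(
    r"\s*\b(" + "|".join(re.escape(p) for p in _PHRASES) + r")\b(\s*)",
    re.IGNORECASE,
)


def apply_voice_commands(text: str) -> str:
    """Replace spoken punctuation and line-break phrases with their symbols."""

    def replace(match):
        symbol = SPOKEN_TO_TEXT[match.group(1).lower()]
        if symbol.isspace():
            return symbol  # line breaks swallow the spaces around them
        # Punctuation attaches to the previous word and keeps one space after.
        return symbol + (" " if match.group(2) else "")

    return _PATTERN.sub(replace, text)
```

For example, `apply_voice_commands("hello comma world period")` returns `"hello, world."`. Actions such as "undo" or "scratch that" need injection state and are outside this sketch.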

Other formatting options

Filler word removal — removes "um", "uh", "like", "you know", "I mean", "basically", "actually" (Settings → Formatting)
Auto punctuation — adds a period if the text looks like a sentence (capital, no ending punctuation)
Custom rules — regex pattern + replacement (Settings → Formatting or Advanced); applied after other formatting
Snippets — trigger phrases (e.g. @@email) expand to longer text; longer triggers applied first. See Snippets
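
The first three options can be pictured as a small text pipeline. The sketch below is an assumption-laden illustration, not Kalam's implementation: the function names are hypothetical, the order of filler removal and auto punctuation is a guess, and only "custom rules run last" is taken from the list above. A naive filler filter like this one also removes legitimate uses of words such as "like".

```python
import re
from typing import Iterable, Tuple

# Filler words from the list above, longest first for the alternation.
FILLERS = sorted(
    ["um", "uh", "like", "you know", "I mean", "basically", "actually"],
    key=len, reverse=True,
)
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    """Drop filler words and tidy up any doubled spaces left behind."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def auto_punctuate(text: str) -> str:
    """Add a period when the text starts with a capital and has no ending mark."""
    stripped = text.rstrip()
    if stripped and stripped[0].isupper() and stripped[-1] not in ".!?:;":
        return stripped + "."
    return text


def format_transcript(text: str, rules: Iterable[Tuple[str, str]] = ()) -> str:
    """Apply built-in formatting, then custom (pattern, replacement) rules."""
    text = auto_punctuate(remove_fillers(text))
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)  # custom rules run last
    return text
```

For example, `format_transcript("I think um we should ship it", [(r"\bship\b", "release")])` returns `"I think we should release it."`.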

Snippets

Snippets are short triggers that expand to longer text when they appear in the transcribed output.

Add, edit, remove: Main window → Snippets or tray → Snippets
Each snippet has a trigger and an expansion
Triggers are matched before injection — use distinct triggers to avoid accidental expansion in normal speech
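
Applying longer triggers first matters when one trigger is a prefix of another. The sketch below shows one way to get that behavior in a single pass; the function name and the `@@email-work` trigger are hypothetical, and only `@@email` appears as an example in this manual.

```python
import re
from typing import Dict


def expand_snippets(text: str, snippets: Dict[str, str]) -> str:
    """Replace each trigger in text with its expansion, longest trigger first."""
    if not snippets:
        return text
    # Sorting by length makes "@@email-work" win over its prefix "@@email".
    triggers = sorted(snippets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in triggers))
    # A single pass means expansions are never re-scanned for triggers.
    return pattern.sub(lambda m: snippets[m.group(0)], text)
```

With `{"@@email": "me@example.com", "@@email-work": "me@work.example.com"}`, the text `"Reach me at @@email-work or @@email"` expands to `"Reach me at me@work.example.com or me@example.com"`.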

History

Kalam stores past transcriptions locally: text, timestamp, mode, language, duration. Data is in SQLite; see Privacy and data for encryption and retention.

Open: Main window sidebar or tray menu
Search and browse; retention is configurable in Settings → Privacy (history retention days)
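
Retention by days can be modeled as deleting rows older than a cutoff. The schema below is hypothetical (the table and column names are assumptions based on the fields listed above), and encryption is left out entirely; it only illustrates the retention rule with Python's built-in `sqlite3` module.

```python
import sqlite3
import time
from typing import Optional


def create_history_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical history table with the fields named above."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS history (
               id INTEGER PRIMARY KEY,
               text TEXT NOT NULL,
               created_at REAL NOT NULL,  -- Unix timestamp in seconds
               mode TEXT,
               language TEXT,
               duration_ms INTEGER
           )"""
    )


def prune_history(conn: sqlite3.Connection, retention_days: int,
                  now: Optional[float] = None) -> int:
    """Delete entries older than retention_days and return how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    cursor = conn.execute("DELETE FROM history WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

Running `prune_history(conn, retention_days=7)` at startup or on a timer would keep only the last week of transcriptions in this model.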

Dictionary

The dictionary holds custom terms (e.g. names, jargon) to improve recognition accuracy.

Entries are stored locally
Add or delete in Settings → Dictionary
Future versions may pass the dictionary to the cloud provider or local model

Command mode (notes, tasks, reminders)

Command mode uses a separate hotkey from dictation. When you press it and speak, Kalam does not inject text into another app; it creates a note, task, or reminder.

Say: "new note …", "new task …", or "new reminder …" with your content
With LLM parsing (Settings → Command Mode), Kalam can use a provider (Groq, OpenRouter, Gemini, OpenAI, Anthropic) to classify the phrase and extract fields (title, content, due date, etc.)
Set the command hotkey and, for LLM, provider and API key (and optionally model)
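
Without LLM parsing, the spoken prefixes can be recognized with plain string matching. This is a sketch of that idea and an assumption about how such a fallback might work; the function name and return shape are hypothetical, and field extraction such as due dates is not attempted.

```python
from typing import Optional, Tuple

# Spoken prefix -> item kind, from the phrases listed above.
COMMAND_PREFIXES = {
    "new note": "note",
    "new task": "task",
    "new reminder": "reminder",
}


def parse_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (kind, content) if the transcript starts with a known prefix."""
    cleaned = transcript.strip()
    lowered = cleaned.lower()
    for prefix, kind in COMMAND_PREFIXES.items():
        if lowered.startswith(prefix):
            rest = cleaned[len(prefix):]
            # Require a word boundary so "new notebook" is not a note command.
            if rest == "" or not rest[0].isalnum():
                return kind, rest.lstrip(" ,.:;-").strip()
    return None
```

For example, `parse_command("New task, call Sam tomorrow")` returns `("task", "call Sam tomorrow")`, while ordinary speech returns `None`.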

Note

The same hold vs short-press rule applies: releasing before the minimum hold time cancels the command. The command hotkey cannot be the same as the dictation or language-toggle hotkey. Created items appear in the main window under Notes, Tasks, or Reminders.

Main window and sidebar

The main window has a sidebar with:

Home — aggregate stats (streak, total words, time saved, last latency), recent history, tasks/reminders due today
Snippets, History, Notes, Tasks, Reminders
Settings (and About inside Settings)

Dictation On/Off in the sidebar toggles dictation and hotkeys globally. Turn it off to pause without closing the app.

Settings tabs

General — hotkeys, min hold, auto-start, overlay position
Audio & Dictation — device, STT mode, provider, API key, languages, VAD
Dictionary, Command Mode, Privacy
Advanced — injection, logging, app data
About — version, Check for Updates (installing requires restart)

Hotkeys reference

Dictation (hold): default Ctrl+Win (Windows), Ctrl+Super (Linux), Ctrl+Cmd (macOS). Configurable in Settings.
Toggle dictation: optional — press once to start, again to stop. Settings → Audio & Dictation.
Language toggle: optional — swaps the first two languages. Cannot be the same as dictation or command hotkey.
Command mode: optional — when Command mode is enabled. Must differ from dictation and toggle dictation hotkeys.

Tip

Kalam validates hotkeys on save; conflicts show an error. While you capture a new hotkey in Settings, global hotkeys are paused, so the keys you press are recorded without triggering dictation or commands.
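
Conflict validation can be sketched as comparing normalized key combinations. The code below is illustrative only: the role names and the `+`-separated string format are hypothetical, and it simplifies the rules above by requiring every configured hotkey to be unique.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def normalize_hotkey(hotkey: str) -> FrozenSet[str]:
    """Treat "Ctrl+Win" and "win + ctrl" as the same combination."""
    return frozenset(part.strip().lower() for part in hotkey.split("+") if part.strip())


def find_hotkey_conflicts(hotkeys: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (first_role, second_role) pairs that share a key combination.

    hotkeys maps a role name to its combination, or None when unset.
    """
    seen = {}
    conflicts = []
    for role, combo in hotkeys.items():
        if not combo:
            continue  # optional hotkeys that are unset cannot conflict
        key = normalize_hotkey(combo)
        if key in seen:
            conflicts.append((seen[key], role))
        else:
            seen[key] = role
    return conflicts
```

An empty result means the settings can be saved; any returned pair would be shown to the user as an error.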

Overlay and appearance

States: Hidden, Collapsed, Listening, ShortPress, Recording (with level; command mode uses a different color), Processing, Error, Status.

Position: Bottom Center (default), corners, sides, or center
Fine-tune: offset X/Y and expand direction (Up, Down, Center) in Settings
Waveform style: Line, Symmetric, Heartbeat, Snake, DoubleHelix, Liquid, Waves, Glitch, Bars, CenterSplit — visual only

Privacy and data

Audio — not stored to disk; in-memory only. In cloud mode, audio is sent to the provider over TLS, and Kalam does not retain it.
History — stored locally in SQLite; retention configurable; may be encrypted (e.g. AES-256).
Sensitive app detection (Settings → Privacy) — define patterns (process name, window title, or bundle ID); when the focused app matches, Kalam can force local mode, block dictation, or require confirmation.
Telemetry — opt-in, off by default.
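
Sensitive app detection can be modeled as a list of rules checked against the focused app. This sketch assumes glob-style patterns and first-match-wins ordering, neither of which is confirmed by this manual; the class, field, and action names are hypothetical.

```python
import fnmatch
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SensitiveRule:
    target: str   # "process_name", "window_title", or "bundle_id"
    pattern: str  # glob pattern (an assumption; the real syntax may differ)
    action: str   # "force_local", "block", or "confirm"


def match_sensitive_app(app: Dict[str, str], rules: List[SensitiveRule]) -> Optional[str]:
    """Return the action of the first rule matching the focused app, if any."""
    for rule in rules:
        value = app.get(rule.target) or ""
        # Lowercase both sides so matching does not depend on the platform.
        if fnmatch.fnmatchcase(value.lower(), rule.pattern.lower()):
            return rule.action
    return None
```

A rule such as `SensitiveRule("window_title", "*bank*", "force_local")` would keep audio on the device whenever the focused window title contains "bank".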

Notifications and logging

Notifications (Settings) — completion, errors, updates; optional sound
Logging (Settings → Advanced) — in-app log with level and max records; export logs or open app data folder for support

Troubleshooting

Text goes to the wrong app — Focus is captured when you press the hotkey. Do not switch windows after pressing it.
Nothing happens when I press the key — Check: Dictation is On (sidebar); you held the key long enough (above Minimum hold time); you are not capturing a hotkey in Settings (hotkeys paused); microphone and accessibility permissions are granted.
"Language toggle hotkey cannot be the same as…" — Choose a different key for the language toggle in Settings.
Command hotkey same as dictation — Change one so they differ.
Injection fails — Try Injection method "Clipboard" or lower the clipboard threshold; ensure accessibility permission; increase retry attempts/delay in Advanced.
Local model won't start — Check hardware and disk space; in Settings see model status (NotInstalled, Stopped, Starting, Running, Error) and error message.
Microphone not detected — Grant microphone permission; Settings → Audio & Dictation → select device and use "Test microphone".

Quick checks

Most "nothing happens" issues are: Dictation Off, short-press (below min hold), or missing microphone/accessibility permission. Verify these first.