Local AI Distillation on WebGPU vs Cloud: When to Use Each
Distillation is the step that turns a raw conversation into useful memory. MindLock can run it two ways: locally on your device using WebLLM and your GPU, or in the cloud using Gemini on the Pro plan. Both produce the same output shape. The question is which one to use when.
This post is a practical guide — what each mode is, what it costs in time and privacy, and how to decide.
What Distillation Does
A conversation is a long sequence of messages. A memory document is a compact, structured summary of what is actually worth keeping. Distillation reads the conversation and writes:
- Profile memory — durable facts about you and your work.
- Topic memories — focused documents grouped by theme.
The output is markdown you can read, edit, search, and feed back into any AI as context. Full context on this: Memory Documents.
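Since both modes produce the same output shape, you can picture it as one small data type. The sketch below is our own illustration; the field names are assumptions for the example, not MindLock's actual schema:

```typescript
// Illustrative sketch of the distillation output shape.
// Field names here are assumptions, not MindLock's real schema.
interface MemoryDocument {
  kind: "profile" | "topic"; // profile: durable facts; topic: theme-grouped notes
  title: string;
  markdown: string;          // human-readable body you can edit, search, and reuse
}

// Local and cloud distillation would both emit documents like this:
const example: MemoryDocument = {
  kind: "topic",
  title: "Project Atlas",
  markdown: "- Ships Q3\n- Stack: TypeScript + WebGPU",
};
```

Because the body is plain markdown, the document stays portable: any editor, any search index, any AI context window can consume it.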
Local Distillation via WebLLM
Local mode runs an LLM inside your browser tab using WebGPU. The model weights are downloaded once, cached, and executed on your GPU.
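Local mode only works if the browser actually exposes WebGPU. A minimal capability check, assuming a browser-like environment (the `supportsWebGPU` helper name is ours, not MindLock's), might look like:

```typescript
// Minimal WebGPU capability check. `navigator.gpu` is the standard
// WebGPU entry point; it is absent in runtimes without WebGPU support.
function supportsWebGPU(): boolean {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}
```

With support confirmed, a library such as WebLLM fetches the model weights on first use and caches them, so later runs skip the multi-gigabyte download and go straight to inference on your GPU.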
Three model tiers are offered:
| Tier | Use when |
|---|---|
| Fast | Low-end GPU or you want quick turnaround and are okay with shorter summaries. |
| Balanced | Default choice for a modern laptop. Good quality, reasonable speed. |
| Quality | Desktop GPU with plenty of VRAM. Slowest, best summaries. |
What you get:
- Privacy: the conversation never leaves your device. Not even to MindLock's servers.
- Offline: works with no network after the first model download.
- Cost: free.
What you pay:
- Speed: a local model is slower than a hosted model, especially on laptops.
- First-run cost: the model download is multi-GB. Plan for it once.
- Hardware floor: WebGPU-capable browser and a GPU with enough memory for the tier you pick.
Model selection and loading lives in Settings.
Cloud Distillation via Gemini
Cloud mode (Pro) sends the conversation to Gemini 3.0 Flash for distillation. Pro includes 100 operations per month.
What you get:
- Speed: dramatically faster than local for long conversations.
- Quality ceiling: a frontier hosted model beats what you can run in-browser.
- No hardware floor: works on any device, including phones.
What you pay:
- $5/month for the Pro plan.
- Cloud transit: the conversation is sent to Gemini for processing. If the conversation itself is sensitive, this matters.
- Quota: 100 operations/month. Heavy users should keep an eye on their usage.
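The quota is easy to budget with back-of-envelope arithmetic. The helper below is our own illustration, not part of MindLock, and assumes a simple 100-op monthly quota:

```typescript
// Hypothetical helper: average operations per day you can still spend
// this month without exhausting the quota. Illustrative only.
function opsPerDayRemaining(
  used: number,
  dayOfMonth: number,
  daysInMonth = 30,
  quota = 100,
): number {
  const remainingOps = Math.max(quota - used, 0);
  const remainingDays = Math.max(daysInMonth - dayOfMonth, 1);
  return remainingOps / remainingDays;
}
```

For example, `opsPerDayRemaining(40, 15)` returns 4: with 60 operations and 15 days left, you can average four cloud distillations a day before falling back to local.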
How to Decide
A simple heuristic:
| Situation | Use |
|---|---|
| Sensitive conversation (client, legal, medical, strategy) | Local. The content never leaves your device. |
| Long conversation, tight deadline | Cloud. The speed difference is real. |
| Low-power laptop, casual work | Cloud if you're on Pro; otherwise Local Fast. |
| Offline or on a plane | Local. Nothing else works. |
| You want the best summary quality | Cloud for most cases; Local Quality if the content must stay local. |
| First time trying MindLock | Local Balanced. See what it does for free before paying. |
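The table above can be read as a small decision function. The encoding below is our own illustration of the heuristic, not MindLock's routing logic, and the field names are assumptions:

```typescript
type Mode = "local" | "cloud";

interface Situation {
  sensitive: boolean;     // client, legal, medical, or strategy content
  online: boolean;        // any network available?
  onPro: boolean;         // Pro plan with cloud quota remaining
  longAndUrgent: boolean; // long conversation on a tight deadline
}

// Illustrative encoding of the decision table: privacy and offline
// constraints win outright; otherwise prefer cloud when it is
// available and speed matters.
function chooseMode(s: Situation): Mode {
  if (s.sensitive || !s.online) return "local";
  if (s.onPro && s.longAndUrgent) return "cloud";
  return s.onPro ? "cloud" : "local";
}
```

Note the ordering: sensitivity and connectivity are hard constraints, so they are checked first; plan and urgency only matter once those are cleared.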
You are not locked in. Pro users can still run local distillation any time. Free users always run local. The two modes are a menu, not a commitment.
A Mixed Workflow That Works Well
Many users end up on a hybrid:
- Sensitive conversations → Local distillation.
- Bulk, long, non-sensitive conversations → Cloud distillation to save time.
- Everything ends up in the same memory store and the same semantic search.
You pay only for the speed you actually need, on the conversations where the tradeoff makes sense.
Embeddings and Search Are Always Local
Worth calling out: the semantic search index — what powers Ctrl+K across all your content — runs on-device regardless of which distillation mode you pick. Your search queries don't leave your machine.
Start
If you are new, open the Dashboard, load a local model in Settings, and run a distillation on a real conversation. You will know within one run whether local is fast enough for how you work. If it isn't, Free vs Pro lays out the cloud option honestly.
Related reading: Generating Context.