Images and vision models

Ekorbia supports image attachments through any vision-capable model. Drop a .png, .jpg, .jpeg, or .webp file into a chat, and a vision-capable model will see it.
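As a rough illustration of the file types listed above, the check below tests a filename against those four extensions. It is a minimal sketch, not Ekorbia's actual code: the names SUPPORTED_IMAGE_EXTENSIONS and is_supported_image are hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# The four extensions named in this guide. The constant name is hypothetical.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def is_supported_image(filename: str) -> bool:
    """Return True if the filename ends in one of the listed image extensions.

    Assumption: matching is case-insensitive, so "shot.PNG" counts as a .png.
    Only the final suffix is checked, so "photo.webp.zip" is not an image.
    """
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS
```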

What counts as a vision model

Ekorbia detects vision support per model; models that can see images get a small VISION badge in the model picker.

On the bundled engine, every model in the built-in catalog is vision-capable, so you can download any Gemma 4 model from Settings → Models. On the Ollama backend, pull a vision model such as gemma4:e4b or llava.

Attaching an image

Two ways:

Paperclip in the composer — choose any image file
Screenshot capture — press the hotkey, drag a region, and a new chat opens with the screenshot already attached

The image appears as a chip above the composer, just like other attachments. If the active model can see it, a VISION badge appears on the chip.

When the active model is NOT vision-capable

If the active model cannot see images, Ekorbia checks whether another vision model is available:

At least one available: Ekorbia automatically switches to it and shows a toast: “Switched to vision model: gemma3:4b”. The original model is restored if you remove the image.
None available: a toast warns you that the image will be ignored. You can still attach it (no error), but the model can’t see it. To fix this, download a vision model: any Gemma 4 in Settings → Models works.

This auto-switch only happens for image attachments — text attachments never trigger a model swap.
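The attachment behavior described above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not Ekorbia's implementation: Model, Decision, and resolve_model are hypothetical names, the warning toast wording is invented for the example, and choosing the first vision model in the list is an assumption (the guide does not say which one is picked when several are available).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    vision: bool  # True if the model carries the VISION badge


@dataclass(frozen=True)
class Decision:
    model: Model           # model that will handle the message
    toast: Optional[str]   # message shown to the user, if any
    image_visible: bool    # whether the chosen model can see the image


def resolve_model(active: Model, available: list[Model], has_image: bool) -> Decision:
    """Pick the model for a message, following the behavior this guide describes."""
    # Text-only messages never trigger a swap; a vision-capable model needs none.
    if not has_image or active.vision:
        return Decision(active, None, has_image and active.vision)

    # Assumption: take the first vision-capable model in the list.
    for candidate in available:
        if candidate.vision:
            toast = f"Switched to vision model: {candidate.name}"
            return Decision(candidate, toast, True)

    # No vision model installed: keep the active model and warn.
    # The wording of this warning is hypothetical.
    return Decision(active, "No vision model available; the image will be ignored", False)
```

To model the restore step, a caller would remember the original model and call resolve_model again with has_image=False once the image is removed, which returns that model unchanged and with no toast.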

What images do NOT do

They are not chunked or embedded. Vision works directly on pixels, not on a text representation.
They are not included in Citations and sources — citation markers refer to text chunks only. The image’s filename appears in the Sources footer as a non-citation chip so you can see what visual context was sent.
They are not stored permanently as part of the chat unless the image came from your filesystem to begin with. Screenshot captures live in your temp directory and may be cleaned up by the OS on reboot.

Attaching files
Screenshot capture
Choose a model — picking a vision-capable model

Ekorbia User Guide
