On-device · WebGPU · agentic kernel optimization

Gemma 4 in your browser.
Kernels written by Fable 5.

Gemma 4 E2B (QAT Mobile) — a powerful open-source model — runs fully on-device with WebGPU. Weights cache locally after the first load, and nothing you type ever leaves your machine.

2.3B Effective params

128K Context window

~250 tok/s · M4 Max

100% On-device

Model card

WebGPU kernels 100% written & optimized by Fable 5 → Tuned for Apple M4 Max · experimental

Chat below

Kernels

What are Kernels?

Kernels are the low-level GPU programs that do the model's actual math: the matrix multiplications, attention, and normalization behind every token. How well they are optimized has a large effect on inference speed.
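To make "the model's actual math" concrete, here is a minimal CPU sketch of two operations of the kind named above: a normalization and a matrix-vector multiply. It is not the source of any real kernel on this page. The function names, the generic RMS-norm formula, and the row-major layout are assumptions chosen for illustration, and a GPU kernel would compute the same results in parallel rather than in loops.

```typescript
// CPU reference for two operations of the kind a GPU kernel computes.
// Names, shapes, and the exact norm formula are illustrative assumptions.

/** RMS normalization: x[i] * weight[i] / sqrt(mean(x^2) + eps). */
function rmsNorm(x: number[], weight: number[], eps = 1e-6): number[] {
  let sumSq = 0;
  for (const v of x) sumSq += v * v;
  const scale = 1 / Math.sqrt(sumSq / x.length + eps);
  return x.map((v, i) => v * scale * weight[i]);
}

/** Matrix-vector product for a row-major matrix of shape rows x cols. */
function matVec(m: number[], rows: number, cols: number, v: number[]): number[] {
  const out = new Array<number>(rows).fill(0);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    // Each output element is one dot product; on a GPU these run in parallel.
    for (let c = 0; c < cols; c++) acc += m[r * cols + c] * v[c];
    out[r] = acc;
  }
  return out;
}
```

For example, `matVec([1, 2, 3, 4], 2, 2, [1, 1])` returns `[3, 7]`.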

WebGPU & WGSL. Each kernel is a WebGPU compute shader written in WGSL, the WebGPU Shading Language, and it runs entirely locally in your browser.
Agentic Kernel Optimization. Every kernel was generated by AI (in this case, Fable 5, before it was shut down), benchmarked on an Apple M4 Max, and refined through an evolutionary, genetic-style search toward the fastest version.
Blazingly Fast. As a result, Gemma 4 E2B runs at ~250 tokens/sec on an M4 Max, the chip the kernels were tuned for.
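The generate, benchmark, refine loop described above can be sketched roughly as follows. This is a generic evolutionary search under loud assumptions, not the pipeline actually used here: the page does not describe its details. The candidate is a hypothetical pair of tuning knobs rather than AI-written shader source, `fakeBenchmark` is a made-up cost function standing in for a timing run on real hardware, and the loop uses mutation and selection only, with no crossover.

```typescript
// Hypothetical candidate: two tuning knobs of a kernel (assumed for illustration).
interface Candidate { workgroupSize: number; tile: number; }

// Small deterministic PRNG (linear congruential) so runs are repeatable.
function makeRng(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296; // value in [0, 1)
  };
}

// Stand-in for a real GPU benchmark: a made-up cost, lowest at (128, 8).
function fakeBenchmark(c: Candidate): number {
  return Math.abs(c.workgroupSize - 128) / 32 + Math.abs(c.tile - 8) + 1;
}

// Mutation: halve or double one knob, clamped to an assumed valid range.
function mutate(c: Candidate, rng: () => number): Candidate {
  const step = (v: number, lo: number, hi: number): number =>
    Math.min(hi, Math.max(lo, rng() < 0.5 ? v / 2 : v * 2));
  return rng() < 0.5
    ? { ...c, workgroupSize: step(c.workgroupSize, 32, 256) }
    : { ...c, tile: step(c.tile, 1, 16) };
}

// Each generation: mutate random survivors, benchmark everyone, keep the fastest.
function evolve(
  seed: Candidate,
  generations: number,
  childrenPerGen: number,
  survivors: number,
  benchmark: (c: Candidate) => number,
  rng: () => number,
): { best: Candidate; history: number[] } {
  let pool: Candidate[] = [seed];
  const history: number[] = [benchmark(seed)];
  for (let g = 0; g < generations; g++) {
    const children: Candidate[] = [];
    for (let i = 0; i < childrenPerGen; i++) {
      const parent = pool[Math.floor(rng() * pool.length)];
      children.push(mutate(parent, rng));
    }
    // Parents stay in the pool, so the best cost can never get worse.
    pool = [...pool, ...children]
      .map((c) => ({ c, cost: benchmark(c) }))
      .sort((a, b) => a.cost - b.cost)
      .slice(0, survivors)
      .map((e) => e.c);
    history.push(benchmark(pool[0]));
  }
  return { best: pool[0], history };
}

const result = evolve({ workgroupSize: 32, tile: 1 }, 30, 8, 4, fakeBenchmark, makeRng(1));
console.log(result.best, result.history[result.history.length - 1]);
```

Because the survivors of each generation compete against their own children, the best recorded cost is non-increasing from one generation to the next. In a real setup the benchmark would be noisy, so each candidate would likely need repeated timing runs before comparison.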

Select a kernel to read its real source


What's on your mind today?
