The model runs entirely on your device.
Kernels are the low-level GPU programs that do the model's actual math: the matrix multiplications, attention, and normalization behind every token. Well-optimized kernels can dramatically speed up inference.
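To make the three operations concrete, here is a minimal sketch of the math in plain Python. It is an illustration of what a kernel computes, not a GPU kernel and not any particular model's implementation. The function names are hypothetical, the attention shown is the standard scaled dot-product form without masking, and RMS normalization is assumed as one common choice of normalization.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    cols_b = list(zip(*b))  # columns of b, so each output entry is a row-column dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def softmax(row):
    """Turn a list of scores into weights that sum to 1 (shifted by the max for stability)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, with no masking."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def rms_norm(x, eps=1e-6):
    """RMS normalization of one vector (assumed here; other normalizations exist)."""
    rms = math.sqrt(sum(val * val for val in x) / len(x) + eps)
    return [val / rms for val in x]
```

A GPU kernel performs the same arithmetic but spreads it across thousands of threads and manages memory movement carefully, which is where most of the optimization effort goes.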