Gemma 4 Locally: Every Capability Tested on M4 Pro MacBook

The Gemma 4 Family

Model	Type	Modalities	4-bit RAM	24GB Mac?
E2B	Dense (2.3B eff.)	Text, Image, Audio	4 GB	95 tok/s
E4B	Dense (4.5B eff.)	Text, Image, Audio	5.5 GB	57 tok/s
26B-A4B	MoE (4B active)	Text, Image	16-18 GB	~2 tok/s
31B	Dense (31B)	Text, Image	17-20 GB	Won't fit

Speed Benchmarks

Tested with identical coding/translation prompts, 512 max tokens, Gemma 4 default parameters (temp=1.0, top_p=0.95, top_k=64).

Audio ASR: 3 Languages

Tested via Ollama's OpenAI-compatible endpoint (/v1/chat/completions with input_audio). E2B and E4B only — 26B/31B don't support audio.

🇺🇸

English ASR

Ground truth: "Hello, I am an artificial intelligence model. Today we will test speech recognition in English. Technology is evolving rapidly and language models are becoming more capable every day."

E4B 1.0s

"Hello. I am an artificial intelligence model. Today we will test speech recognition in English. Technology is evolving rapidly and language models are becoming more capable every day."

Perfect transcription.

E2B 2.8s

"Hello today's speech recognition in English language models technology is evolving rapidly language models are becoming more capable every day"

Garbled — missing words, no punctuation.

🇫🇷

French ASR

Ground truth: "Tous les êtres humains naissent libres et égaux en dignité et en droits. Ils sont doués de raison et de conscience et doivent agir les uns envers les autres dans un esprit de fraternité."

E4B 1.6s

"Tous les êtres humains naissent libres et égaux en dignité et en droits. Ils sont doués de raison et de conscience et doivent agir les uns envers les autres dans un esprit de fraternité."

Perfect transcription with accents.

E2B 4.1s

"ils doivent raison et de conscience et droits humains libres et d'esprit de fraternité"

Fragmented, missing most of the text.

🇸🇦

Arabic ASR

Ground truth: "مرحباً، أنا نموذج ذكاء اصطناعي. اليوم سنختبر التعرف على الكلام باللغة العربية."

E4B 6.0s

"مرحبًا، أنا نموذج ذكاء اصطناعي. اليوم سنختبر التعرف على الكلام باللغة العربية. التكنولوجيا تتطور بسرعة والنماذج اللغوية أصبحت أكثر كفاءة."

Perfect Arabic transcription.

E2B 6.0s

"اكثر ركفاء نماذج لغويه اصبحت عبري علم الكلام..."

Garbled — wrong words, disordered.

🌐

Speech Translation (E4B)

French → English

"All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience, and must act toward one another in a spirit of fraternity."

Arabic → English

"Hello, I am an artificial intelligence model. Today we will test speech recognition in the Arabic language. Technology is developing quickly, and language models have become more efficient."

Image Understanding

Tested via Ollama /api/chat endpoint. All 4 models support vision.

Test 1: Landmark Identification

Prompt: "What country and city? Name the landmark."

E4B 54.4 tok/s

"Country: Thailand. City: Bangkok. Landmark: Wat Phra Kaew (Temple of the Emerald Buddha) within the Grand Palace complex."

E2B 88.1 tok/s

"Country: Thailand. City: Bangkok. Landmark: Grand Palace complex."

Correct but less specific — didn't name Wat Phra Kaew.

Test 2: AI-Generated Image + Japanese OCR

AI-Generated with nano-banana / Gemini

Prompt: "Describe. Read any Japanese text. Is this AI-generated or real?"

E4B

"A photograph of a bustling, wet street scene in a commercial district in Japan, likely at dusk or night... Correctly identified: Shinjuku Ramen Dori (新宿ラーメン通り)"

E2B

"City: Tokyo, Shinjuku. Japanese text: 新宿ラーメン通り (Shinjuku Ramen Street), ひかりラーメン (Hikari Ramen)"

Both models correctly read Japanese kanji from an AI-generated image.

Test 3: Detailed Captioning

Prompt: "Detailed caption. Identify the city and any visible text."

E4B

"A magnificent seagull perches watchfully atop a sculpted pedestal, dominating the foreground. The backdrop is a rich study in contrasting architectural styles: to the right stands an immense, richly detailed classical facade..."

Full-Stack App Generation

Each model was asked to generate a complete React + Tailwind CSS Task Manager as a single HTML file.

E4B: Task Manager WORKS

52.1 tok/s · 2,073 tokens · 40.7s

React CDN

Tailwind CSS

useState Hooks

Add/Delete/Toggle

Hover Effects

155 lines

E2B: FAILED

Generated code fragments instead of a single HTML file. The 2B model couldn't follow the "single file" constraint.

Step 1: Initial Load

Step 2: Typing a Task

Step 3: Task Added

Step 4: Task Completed

Agentic Multi-Step Reasoning

6-step task: design a blog platform with DB schema, SQLAlchemy models, FastAPI endpoints, React frontend, Dockerfile, and deployment checklist.

E2B

6/6 steps · 7 code blocks · Python, TSX, Dockerfile

9,258 chars · 78 tok/s · 43s

E4B

6/6 steps · 5 code blocks · Python, TSX, Dockerfile

14,562 chars · 49 tok/s · 85s

Coding: Compile & Run

Generated Python scripts and executed them. Both pass all 4 tests.

Script	E2B	E4B
Fibonacci (first 20)	PASS	PASS
Sieve of Eratosthenes	PASS	PASS
JSON nested processor	PASS	PASS
HTTP request (urllib)	PASS	PASS
React+Tailwind single file	FAIL	PASS

Why 26B Fails on 24GB

From community testing (r/LocalLLaMA): Gemma 4 has a KV cache memory problem.

31B at full 262K context: ~22GB just for KV cache (on top of model)
People with 64GB RAM + 24GB VRAM are getting OOM errors
Google did NOT adopt KV-reducing techniques (Lightning Attention, Mamba-2) that Qwen 3.5 uses
llama.cpp sliding window implementation may still have bugs

Workaround: --ctx-size 8192 --cache-type-k q4_0 --parallel 1