Gemma 4 Audio ASR Tests

Play each audio sample and compare E2B vs E4B transcription results.

English ASR

TTS-generated test speech
Ground truth: "Hello, I am an artificial intelligence model. Today we will test speech recognition in English. Technology is evolving rapidly and language models are becoming more capable every day."
E4B (1.0s): Perfect transcription. Every word correct.
E2B (2.8s): Garbled -- missing words, no punctuation.
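The results above are judged qualitatively ("perfect", "garbled"). A common way to put a number on the gap is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The following is a minimal sketch that assumes a simple normalization (lowercase, punctuation stripped). The function names are illustrative and are not part of the Gemma 4 Benchmark Suite.

```python
import re


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, replace punctuation with spaces, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # \w is Unicode-aware, so accented letters survive
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)
```

For example, a hypothetical hypothesis that drops one word from the six-word reference "today we will test speech recognition" scores 1/6 (about 0.167). An exact match, as reported for E4B, scores 0.0.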

French ASR

Universal Declaration of Human Rights, Article 1
Ground truth: "Tous les êtres humains naissent libres et égaux en dignité et en droits..." (English: "All human beings are born free and equal in dignity and rights...")
E4B (1.6s): Perfect transcription, with all French accents preserved.
E2B (4.1s): Fragmented, missing most of the sentence.
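Whether a missing accent counts as an error depends on how transcripts are normalized before they are compared. The following minimal sketch compares word sequences either accent-sensitively or with diacritics folded away. Both modes apply Unicode NFC first, so that a precomposed "é" and "e" followed by a combining acute accent compare as equal. The helper names are hypothetical.

```python
import unicodedata


def nfc_words(text: str) -> list[str]:
    # NFC makes "\u00e9" and "e\u0301" the same code point sequence before comparing
    return unicodedata.normalize("NFC", text).lower().split()


def fold_accents(text: str) -> str:
    # NFD splits "é" into "e" + U+0301; dropping combining marks leaves the base letter
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def words_match(reference: str, hypothesis: str, accent_sensitive: bool = True) -> bool:
    """Compare transcripts word by word, optionally ignoring diacritics."""
    if not accent_sensitive:
        reference, hypothesis = fold_accents(reference), fold_accents(hypothesis)
    return nfc_words(reference) == nfc_words(hypothesis)
```

Under accent-sensitive scoring, "egaux" is an error against "égaux". With folding enabled, the two forms match, so "with all French accents" is only measurable in the first mode.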

Arabic ASR

TTS-generated Arabic test speech
E4B (6.0s): Perfect Arabic transcription.
E2B (6.0s): Garbled -- wrong words, out of order.
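Arabic transcripts are often scored after removing the optional short-vowel and related marks (harakat, tanwin, shadda, sukun, and superscript alef) and the tatweel elongation character, because a model may or may not emit them. The following minimal sketch assumes that normalization policy. The optional alef unification (أ, إ, آ → ا) is a further, commonly used but debatable assumption. No Arabic ground truth is listed on this page, so the sketch covers only the preprocessing step, and the function name is hypothetical.

```python
import re

# U+064B..U+0652: tanwin, short vowels, shadda, sukun; U+0670: superscript alef; U+0640: tatweel
_ARABIC_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")
# Alef with madda (U+0622), hamza above (U+0623), and hamza below (U+0625) -> bare alef (U+0627)
_ALEF_VARIANTS = str.maketrans({"\u0622": "\u0627", "\u0623": "\u0627", "\u0625": "\u0627"})


def normalize_arabic(text: str, unify_alef: bool = False) -> str:
    """Strip optional marks and tatweel, optionally unify alef forms, collapse whitespace."""
    text = _ARABIC_MARKS.sub("", text)
    if unify_alef:
        text = text.translate(_ALEF_VARIANTS)
    return " ".join(text.split())
```

Both the reference and the hypothesis would pass through the same function before computing an error rate. Otherwise, a fully vowelled reference would penalize an unvowelled but otherwise correct transcription.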

Obama Farewell Speech (via HuggingFace Transformers)

Real (non-TTS) speech -- transcribed with E2B via the Transformers pipeline
E2B transcription: "This week I traveled to Chicago to deliver my final farewell address to the nation, following in the tradition of presidents before me. It was an opportunity to say thank you..."

Part of the Gemma 4 Benchmark Suite. Tested April 3, 2026 on a MacBook Pro (M4 Pro, 24 GB).