Google Gemma 4: The Most Capable Open AI Model You Can Run Yourself
On April 2, Google DeepMind released Gemma 4: four open models under the Apache 2.0 license, ranging from Raspberry Pi to datacenter scale. The 2.3B model beats its 27B predecessor. Here is what matters for developers and businesses.

Introduction
A 2.3 billion parameter model that outperforms its 27 billion parameter predecessor. That is the headline number from Google Gemma 4, released on April 2, 2026. But the real story is not a single benchmark. It is that Google just open-sourced a family of four AI models, from Raspberry Pi scale to datacenter scale, under Apache 2.0. No restrictions on commercial use. No special agreements. Download and deploy.
The Gemma 4 family comes from the same research and technology stack as Gemini 3, Google's flagship closed model. That makes this the closest Google has come to giving away its best work. For businesses exploring local AI deployment, self-hosted inference, or agentic workflows that need to run on-premise, this release changes the math.
Four Models, From Phone to Server Rack
Gemma 4 ships as four distinct models, each targeting a different hardware profile. E2B has 2.3 billion effective parameters, supports 128K token context, and handles text, images, and audio. It runs on smartphones, IoT devices, and Raspberry Pis. E4B doubles the parameters to 4.5 billion with the same 128K context and multimodal support, targeting edge devices and laptops.
The 26B model uses a Mixture-of-Experts (MoE) architecture with only 3.8 billion active parameters at any given time, despite its 26 billion total. This gives it the intelligence of a much larger model at the inference cost of a small one. It supports 256K token context. The flagship 31B dense model packs 30.7 billion parameters with 256K context and ranks third among all open models on the LMArena leaderboard with a score of 2150.
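To make the MoE economics concrete, here is a rough per-token compute sketch using the parameter counts above. The 2-FLOPs-per-active-parameter rule of thumb ignores attention and routing overhead, so treat it as an order-of-magnitude estimate, not a benchmark:

```python
# Rough per-token compute comparison: 26B MoE vs 31B dense.
# For a decoder transformer, per-token forward-pass FLOPs scale with the
# parameters actually used, roughly 2 * active_params (a simplification
# that ignores attention and router overhead).

def per_token_gflops(active_params_billions: float) -> float:
    """Approximate forward-pass GFLOPs per generated token."""
    return 2 * active_params_billions  # 2 FLOPs per active parameter

moe_active = 3.8     # billions of active params (26B MoE)
dense_active = 30.7  # billions of active params (31B dense)

print(f"26B MoE:   ~{per_token_gflops(moe_active):.1f} GFLOPs/token")
print(f"31B dense: ~{per_token_gflops(dense_active):.1f} GFLOPs/token")
print(f"MoE uses ~{dense_active / moe_active:.1f}x less compute per token")
```

Under this simplification, the MoE model does roughly an eighth of the dense model's per-token work, which is why it can run on far more modest hardware.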
All four models handle text, images, video, and audio inputs natively. All support function-calling for agentic workflows out of the box. And all are released under Apache 2.0, which means you can modify them, fine-tune them, and ship them in commercial products without licensing fees or usage restrictions.
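What function-calling actually looks like can be sketched in a few lines. The JSON tool-call format and the model stub below are illustrative placeholders, not the real Gemma 4 chat template; the part that carries over is the loop structure, where the model emits a call, the host executes it, and the result goes back into context:

```python
import json

# Minimal sketch of a function-calling loop. The tool-call schema and
# fake_model are hypothetical stand-ins for a local Gemma 4 generation
# call; consult the published chat template for the real format.

TOOLS = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def fake_model(messages):
    """Stand-in for a local model generation call."""
    if messages[-1]["role"] == "user":
        # Model decides to call a tool, expressed as JSON.
        return json.dumps({"tool": "get_order_status", "args": {"order_id": "A-17"}})
    return "Your order A-17 has shipped."

def run_agent(user_msg):
    messages = [{"role": "user", "content": user_msg}]
    reply = fake_model(messages)
    call = json.loads(reply)                      # parse the tool call
    result = TOOLS[call["tool"]](**call["args"])  # execute it host-side
    messages.append({"role": "tool", "content": json.dumps(result)})
    return fake_model(messages)                   # model sees the result

print(run_agent("Where is my order A-17?"))
```

Chaining several such iterations is what turns a chat model into an agent, which is why the tool-use benchmarks below matter.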
The Benchmarks That Matter
Numbers without context are noise. So here are the comparisons that tell an actual story. On GPQA Diamond, a graduate-level science reasoning benchmark, the 31B model scores 85.7% and the 26B model scores 79.2%. On AIME 2026 math, the 31B scores 89.2% and the 26B hits 88.3%. Compare that to Gemma 3 27B at 20.8% on the same test. The generational improvement is not incremental. It is a category shift.
Tool use tells a similar story. On the retail benchmark from the tau-2 suite, the 31B model scores 86.4%. Gemma 3 27B scored 6.6% on the same test. This matters because tool use is the core capability of agentic AI: a model that can call functions, query APIs, and chain actions together to solve multi-step problems.
The E2B model deserves its own highlight. At 2.3 billion effective parameters, it beats Gemma 3 27B on most benchmarks despite being roughly one-tenth the size. Google CEO Sundar Pichai described it as packing "an incredible amount of intelligence per parameter." In multilingual performance, the models outperform Qwen 3.5 in German, Arabic, Vietnamese, and French, which is relevant for businesses operating across Europe and beyond.
What the Community Found After 24 Hours
No launch is complete without real-world testing. Within 24 hours of release, the developer community identified both strengths and limitations. The E2B model's efficiency received widespread praise. Running a capable multimodal model on a basic laptop or Raspberry Pi was previously not feasible. Now it is, and the practical use cases for edge deployment expand significantly.
The concerns centered on the MoE model. Community benchmarks showed it running at roughly 11 tokens per second versus 60-plus for Qwen 3.5's equivalent model. That speed gap matters for interactive applications. The dense 31B model clocked 18 to 25 tokens per second on dual consumer GPUs, acceptable for most use cases but short of the faster closed alternatives.
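A quick sanity check shows why that gap matters for interactive UX. Assuming a typical 400-token reply (the reply length is our assumption, the throughput figures are from the community benchmarks above):

```python
# Time to stream a typical reply at the reported throughputs.
response_tokens = 400  # assumed typical chat reply length

for name, tokens_per_sec in [("Gemma 4 26B MoE", 11), ("Qwen 3.5 equivalent", 60)]:
    seconds = response_tokens / tokens_per_sec
    print(f"{name}: {seconds:.1f}s for a {response_tokens}-token reply")
```

A half-minute wait versus under seven seconds is the difference between a chat interface that feels responsive and one that does not.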
VRAM consumption was also flagged as higher than expected, particularly for long context windows. And developers attempting to fine-tune the models with QLoRA reported tooling friction with Google's new training configuration requirements. These are launch-day issues that tend to improve rapidly, but they are worth noting for teams planning immediate deployments.
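The long-context VRAM pressure is easy to see in a back-of-the-envelope estimate: total memory is roughly weights plus KV cache, and the cache grows linearly with context length. The layer count, KV-head count, and head dimension below are assumed values for illustration; substitute the published Gemma 4 config for a real estimate:

```python
# Rough VRAM estimate: model weights plus fp16 KV cache.
# Architecture numbers below are hypothetical placeholders, not the
# actual Gemma 4 31B config.

def vram_gb(params_b, bytes_per_weight, layers, kv_heads, head_dim, ctx_tokens):
    weights = params_b * 1e9 * bytes_per_weight
    # One K and one V tensor per layer, 2 bytes per fp16 element.
    kv_cache = 2 * layers * kv_heads * head_dim * 2 * ctx_tokens
    return (weights + kv_cache) / 1e9

# Hypothetical 31B setup: 4-bit weights, 48 layers, 8 KV heads of dim 128.
estimate = vram_gb(30.7, 0.5, 48, 8, 128, 256_000)
print(f"~{estimate:.1f} GB at the full 256K context")
```

Under these assumptions, the KV cache at full context costs several times more memory than the quantized weights themselves, which is consistent with the community's surprise at long-context VRAM usage.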
Why Apache 2.0 Changes Everything
Previous Gemma versions shipped under a more restrictive license that limited certain commercial applications. Gemma 4 ships under Apache 2.0, the same license used by Kubernetes, Airflow, and most of the modern open-source infrastructure stack.
The practical impact is immediate. You can download Gemma 4, fine-tune it on your proprietary data, embed it in your product, and sell that product without paying Google or signing an agreement. You can modify the model weights, create derivative works, and distribute them. The only requirement is attribution.
For businesses that have been wary of closed AI APIs because of vendor lock-in, data privacy, or unpredictable pricing, this is the strongest alternative yet. Run it on your own servers. Keep your data on-premise. Pay for compute, not per-token API fees. For many workloads, the total-cost-of-ownership math tips in favor of self-hosting once model quality reaches this level.
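That break-even arithmetic is simple enough to sketch. The prices below are placeholder assumptions, not quotes from any provider; the point is the structure of the calculation, not the specific numbers:

```python
# Back-of-the-envelope break-even: per-token API pricing vs a self-hosted
# GPU server. Both prices are assumed placeholders for illustration.

api_price_per_m_tokens = 3.00  # USD per million tokens (assumed)
gpu_cost_per_month = 1500.00   # amortized server + power, USD (assumed)

def monthly_cost_api(tokens_millions):
    """API spend for a given monthly token volume."""
    return tokens_millions * api_price_per_m_tokens

break_even_m = gpu_cost_per_month / api_price_per_m_tokens
print(f"Self-hosting breaks even at ~{break_even_m:.0f}M tokens/month")
```

Above that volume, a fixed-cost server beats metered API pricing; below it, the API stays cheaper, before accounting for ops effort on either side.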
What We See at MG Software
At MG Software, we currently use a mix of cloud API models for different tasks. Gemma 4 does not replace that strategy, but it adds a powerful new option for specific scenarios.
The E2B model is interesting for on-device features in mobile and progressive web apps. Classification, intent detection, and simple summarization tasks that currently require an API call could run locally, eliminating latency and API costs entirely. For progressive web apps that need offline AI capabilities, this was previously not realistic.
The 26B MoE model hits a sweet spot for businesses that want self-hosted AI but cannot justify datacenter-grade hardware. A single consumer GPU running a 256K context window model with function-calling support opens the door to local code assistants, document analysis, and customer-facing chat that never leaves your infrastructure. For clients with strict data residency requirements, especially in healthcare, legal, and government sectors, this is the answer to the question "can we use AI without sending our data to a third party?"
If your team is evaluating whether local or self-hosted AI makes sense for your use case, get in touch. The cost and capability threshold shifted this week.
Conclusion
Google Gemma 4 is not just another open model release. It is the point where open-source AI reaches genuine production quality across multiple scales, from edge devices to server deployments, with no licensing strings attached. The benchmarks speak for themselves. A 2.3B model outperforming last generation's 27B model is the kind of efficiency gain that reshapes what is possible.
For development teams, the takeaway is practical: test Gemma 4 against your current workloads. For classification, function-calling, and multilingual tasks, it may already be good enough to replace API calls. For self-hosted deployment, the Apache 2.0 license removes the last barrier. The open-source AI gap is closing faster than most people expected.

Jordan
Co-Founder