Frequently Asked Questions

How many parameters does ZAYA1-8B have?

ZAYA1-8B has 8.4 billion parameters in total, but only 760 million are active for each token during inference, which keeps per-token compute and latency low.

What makes ZAYA1-8B unique compared to other LLMs?

ZAYA1-8B is the first high-performance mixture-of-experts model trained end-to-end on AMD Instinct MI300X clusters, proving that frontier AI can be developed without relying on NVIDIA hardware.

Is ZAYA1-8B open source?

Yes, ZAYA1-8B is released under the Apache 2.0 license, which allows for both research and unrestricted commercial use.

Where can I download ZAYA1-8B?

You can download the model weights directly from Hugging Face (Zyphra/ZAYA1-8B) or use the free serverless API endpoint available on Zyphra Cloud.

How does ZAYA1-8B perform on benchmarks?

ZAYA1-8B punches far above its weight, scoring 89.1 on AIME'26 and 65.8 on LiveCodeBench-v6, outperforming models with 10x more active parameters.

ZAYA1-8B

The First AMD-Trained AI Model Challenging NVIDIA's Dominance

Sk Jabedul Haque

May 17, 2026


ZAYA1-8B is the first open-source reasoning model trained entirely on AMD Instinct MI300X clusters. With 8.4 billion total parameters and only 760 million active during inference, it matches or exceeds models many times its size on benchmarks like AIME'26 (89.1) and LiveCodeBench-v6 (65.8). Released under the Apache 2.0 license, it is available for free as a serverless endpoint on Zyphra Cloud.

What You'll Learn

✓ The technical specifications of ZAYA1-8B’s MoE++ architecture
✓ Why training on AMD Instinct MI300X challenges NVIDIA’s dominance
✓ Benchmarks against Llama, Mistral, and GPT-5 Pro models
✓ How to access ZAYA1-8B via Zyphra Cloud and Hugging Face

The AI infrastructure landscape shifted significantly on May 6, 2026, with Zyphra’s release of ZAYA1-8B. While the market has been dominated by NVIDIA hardware and CUDA-native software, ZAYA1-8B is the first high-performance mixture-of-experts (MoE) reasoning model trained end-to-end on an AMD-native stack. The launch shows that frontier-level training can escape "NVIDIA’s CUDA gravity" and run at scale on alternative hardware.

For enterprises and researchers, ZAYA1-8B is not just about parameter counts; it’s about "Intelligence Density." Despite its small active footprint, the model punches far above its weight class, competing directly with much larger systems such as DeepSeek V4 and Llama 4 Scout. Because it is released under the Apache 2.0 license, it can be used for both research and commercial integration without the licensing fees attached to proprietary models such as GPT-5.5 and Grok 4.3.

Overview & Objectives

ZAYA1-8B is built on Zyphra’s in-house MoE++ architecture, which optimizes how tokens are routed to specialized "experts." This lets the model keep 8.4 billion total parameters while activating only 760 million per token. The primary objective was "Maximum Intelligence Density per Parameter": a model suited to low-latency, high-accuracy reasoning on edge devices and in cost-efficient cloud clusters.
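The gap between total and active parameters can be made concrete with a little arithmetic. The sketch below uses the article's figures together with a common rule of thumb, assumed here rather than reported by Zyphra, that a forward pass costs roughly 2 FLOPs per active weight (it ignores attention and routing overhead).

```python
# Parameter counts as stated in the article.
TOTAL_PARAMS = 8.4e9
ACTIVE_PARAMS = 760e6


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights that take part in one token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate (an assumption): ~2 FLOPs, one multiply and one add, per active weight."""
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    moe = approx_forward_flops_per_token(ACTIVE_PARAMS)
    dense = approx_forward_flops_per_token(TOTAL_PARAMS)  # if every weight ran per token
    print(f"active fraction: {frac:.1%}")  # 9.0%
    print(f"MoE: ~{moe / 1e9:.2f} GFLOPs/token vs dense: ~{dense / 1e9:.2f} GFLOPs/token")
```

Under this approximation, running all 8.4B weights for every token would cost about 11x more compute per token than the 760M-active configuration.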

The model was trained on a large cluster of 1,024 AMD Instinct MI300X GPUs using AMD Pensando Pollara networking on IBM Cloud infrastructure. This full-stack AMD approach means the model is natively compatible with ROCm and avoids the performance bottlenecks often found when CUDA-optimized codebases are translated to other hardware.

Licensing & Deployment

Because ZAYA1-8B is released under the Apache 2.0 license, there are few restrictions on who can use it or how. Developers can download the weights directly from Hugging Face for local deployment or use the free serverless endpoint on Zyphra Cloud. The model is compatible with standard inference engines, although a "ZAYA-aware" runtime is recommended to take full advantage of its CCA attention state and compressed KV-cache (8x compression relative to full MHA).
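To see why an 8x smaller KV-cache matters, the sketch below estimates cache size for full multi-head attention and then applies the article's 8x ratio. The layer count, head count, and head dimension in `AttnConfig` are hypothetical placeholders, not ZAYA1-8B's published configuration, and the compression is modeled as a flat factor rather than an implementation of CCA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    # Hypothetical dimensions for illustration only; not ZAYA1-8B's actual config.
    n_layers: int = 40
    n_heads: int = 16
    head_dim: int = 128
    bytes_per_elem: int = 2  # bf16


def mha_kv_cache_bytes(cfg: AttnConfig, seq_len: int, batch: int = 1) -> int:
    """Full MHA stores one K and one V vector per head, per layer, per cached token."""
    per_token = 2 * cfg.n_layers * cfg.n_heads * cfg.head_dim * cfg.bytes_per_elem
    return per_token * seq_len * batch


def compressed_kv_cache_bytes(cfg: AttnConfig, seq_len: int, batch: int = 1, compression: int = 8) -> int:
    """Apply the article's stated 8x ratio as a flat divisor (CCA itself is not modeled)."""
    return mha_kv_cache_bytes(cfg, seq_len, batch) // compression


if __name__ == "__main__":
    cfg = AttnConfig()
    gib = 1024 ** 3
    full = mha_kv_cache_bytes(cfg, seq_len=32_768)
    small = compressed_kv_cache_bytes(cfg, seq_len=32_768)
    print(f"MHA: {full / gib:.2f} GiB, 8x-compressed: {small / gib:.2f} GiB")
    # MHA: 10.00 GiB, 8x-compressed: 1.25 GiB
```

With these placeholder dimensions, a single 32K-token context drops from 10 GiB to 1.25 GiB of cache, which is the kind of saving that lets more concurrent requests fit on one GPU.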


Key Benefits & Performance

The primary benefit of ZAYA1-8B is its efficiency. On the AIME'26 mathematics benchmark, it scored 89.1, surpassing models with 10x more active parameters. Similarly, its score of 65.8 on LiveCodeBench-v6 makes it a top-tier assistant for software engineering tasks. This performance is largely attributed to its 16-expert routing system and the inclusion of Markovian RSA (Reasoning Search Augmentation), which allows the model to "think" through multiple traces in parallel for complex queries.
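The article credits Markovian RSA with letting the model think through several reasoning traces in parallel, but the source does not describe its mechanics. The sketch below therefore uses a simpler, well-known stand-in, self-consistency (sample several independent traces, then majority-vote their final answers), only to illustrate the parallel-trace idea; it is not Zyphra's algorithm, and `fake_sampler` is a hypothetical placeholder for a real model call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def self_consistency(sample_answer: Callable[[str, int], str], prompt: str, n_traces: int = 8) -> str:
    """Run n_traces independent traces in parallel and return the most common final answer."""
    with ThreadPoolExecutor(max_workers=n_traces) as pool:
        # Each seed stands for one independently sampled reasoning trace.
        answers = list(pool.map(lambda seed: sample_answer(prompt, seed), range(n_traces)))
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


def fake_sampler(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a model call: most traces reach 42, every fourth goes astray."""
    return "41" if seed % 4 == 3 else "42"


if __name__ == "__main__":
    print(self_consistency(fake_sampler, "What is 6 * 7?"))  # 42
```

Voting over many traces trades extra inference compute for accuracy, which is why a model with a small active footprint is a natural fit for this style of test-time scaling.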

Benchmark / Feature	ZAYA1-8B	Ministral 3 8B
Active Parameters	760 Million	~8 Billion
AIME'26 Score	89.1	~70.0
Pricing / 1M tokens	$0.00 (Serverless)	$0.15
Training Hardware	AMD MI300X	NVIDIA H100
License	Apache 2.0	Apache 2.0
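The "10x more active parameters" claim can be checked directly against the comparison table. The sketch below copies the table's figures (the Ministral values are marked "~" there) and adds a crude, hypothetical "density" proxy, AIME'26 points per billion active parameters; this metric is an illustration, not one Zyphra defines.

```python
# Figures copied from the comparison table; Ministral values are approximate.
MODELS = {
    "ZAYA1-8B": {"active_params_b": 0.76, "aime26": 89.1},
    "Ministral 3 8B": {"active_params_b": 8.0, "aime26": 70.0},
}


def active_param_ratio(larger: str, smaller: str) -> float:
    """How many times more active parameters `larger` uses per token than `smaller`."""
    return MODELS[larger]["active_params_b"] / MODELS[smaller]["active_params_b"]


def score_per_active_billion(name: str) -> float:
    """Hypothetical density proxy: AIME'26 points per billion active parameters."""
    m = MODELS[name]
    return m["aime26"] / m["active_params_b"]


if __name__ == "__main__":
    print(f"{active_param_ratio('Ministral 3 8B', 'ZAYA1-8B'):.1f}x more active parameters")  # 10.5x
    for name in MODELS:
        print(f"{name}: {score_per_active_billion(name):.2f} AIME'26 points per active billion")
```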

How It Works

ZAYA1-8B uses a "router" that sends each input token to the single most relevant expert among its 16 FFN blocks. This "Top-1" routing, with no residual expert, minimizes the compute needed for each forward pass while keeping the full 8.4B-parameter capacity available across tokens. The model also uses the **Gemma 3 tokenizer**, with a vocabulary size of 262,272, for high-fidelity handling of multilingual and technical content. For developers optimizing Vector Database workflows, ZAYA1-8B’s low per-token cost may also make it a practical model to run locally alongside retrieval.
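A toy version of top-1 routing shows why only a fraction of the weights run for each token. This is a minimal sketch under stated assumptions: the linear router, the single-matrix "experts" standing in for FFN blocks, and the class name `Top1MoE` are hypothetical simplifications, not Zyphra's MoE++ implementation. It keeps the two properties described above: 16 experts, and exactly one expert (with no residual expert) evaluated per token.

```python
import math
import random
from typing import List

Vector = List[float]


def matvec(w: List[Vector], x: Vector) -> Vector:
    """Multiply a (rows x len(x)) matrix by the vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class Top1MoE:
    """Toy top-1 MoE layer: each token runs exactly one expert, with no shared/residual expert."""

    def __init__(self, d_model: int, n_experts: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # Hypothetical linear router: one row of weights per expert, producing one logit each.
        self.router = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
        # Hypothetical experts: a single d_model x d_model matrix stands in for each FFN block.
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_model)]
            for _ in range(n_experts)
        ]
        self.calls = [0] * n_experts  # how many tokens each expert processed

    def forward_token(self, x: Vector) -> Vector:
        probs = softmax(matvec(self.router, x))
        k = max(range(len(probs)), key=probs.__getitem__)  # top-1 expert index
        self.calls[k] += 1
        y = matvec(self.experts[k], x)  # only expert k's weights are touched
        return [probs[k] * v for v in y]  # gate the output by the router probability


if __name__ == "__main__":
    layer = Top1MoE(d_model=8)
    rng = random.Random(1)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
    outputs = [layer.forward_token(t) for t in tokens]
    print(layer.calls, sum(layer.calls))  # the calls sum to 5: one expert per token
```

Gating the expert output by its router probability follows the common Switch Transformer-style top-1 design; the source does not say whether ZAYA1 does the same.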

Important Dates & Access

ZAYA1-8B was officially released on **May 6, 2026**, following a successful training demonstration in late 2025; the weights and technical report were published at the same time. Developers can access the model weights on Hugging Face (Zyphra/ZAYA1-8B) or test live response times through the Zyphra Cloud dashboard. Much like Cloudflare Workers AI, Zyphra’s serverless offering gives developers a zero-cost entry point.

Conclusion

ZAYA1-8B is more than just a model; it is a proof of concept for the future of AI hardware diversity. By showing that AMD Instinct MI300X clusters can produce world-class reasoning models, Zyphra has opened the door to a more competitive and cost-effective GPU market.

Key Takeaways:

✓ ZAYA1-8B is among the strongest reasoning models in its weight class, with only 760M active parameters.
✓ Full-stack AMD training shows that high-end models do not require NVIDIA hardware, sidestepping the "NVIDIA premium" and promoting ecosystem diversity.
✓ The Apache 2.0 license and free serverless endpoint make it one of the best-value open-source reasoning options in 2026.



Last Updated: May 18, 2026 | Source: Zyphra PR & Technical Report (arXiv:2605.05365)


Sk Jabedul Haque

Founder & Chief Editor

Building India's most trusted finance education platform — simplifying news, calculators, and market trends so anyone can understand and invest confidently.
