How much does Gemini 3.1 Flash-Lite cost?

Gemini 3.1 Flash-Lite costs $0.25 per million input tokens and $1.50 per million output tokens, making it Google's most cost-effective model for high-volume use.

Is Gemini 3.1 Flash-Lite faster than previous versions?

Yes, Gemini 3.1 Flash-Lite is 2.5x faster in Time to First Token (TTFT) and has a 45% higher throughput (363 tps) than earlier Gemini versions.

Does Gemini 3.1 Flash-Lite support a 1M context window?

Yes, it maintains Google's industry-leading 1 million token context window, supporting deep analysis of long documents, video, and audio.

What is the price for cached tokens on Gemini 3.1 Flash-Lite?

Input cache hits for Gemini 3.1 Flash-Lite are priced at just $0.025 per million tokens, offering a 90% discount for repeated prefixes.

Does Gemini 3.1 Flash-Lite support multimodal inputs?

Gemini 3.1 Flash-Lite natively supports text, images, video (up to 45 mins), and audio (up to 8.4 hours) in a single API prompt.

Gemini 3.1 Flash-Lite

Google's Aggressive Move to Own Cheap AI APIs

May 17, 2026, 16:06 Eastern Daylight Time by

Sk Jabedul Haque

Google Gemini 3.1 Flash-Lite is the most cost-efficient AI model in its category, priced at $0.25 per million input tokens and $1.50 per million output tokens. With a 1M token context window and 2.5x faster response times compared to previous versions, it is designed for high-volume tasks like real-time translation, data extraction, and high-frequency agentic workflows.

What You'll Learn

✓ Detailed pricing breakdown of Gemini 3.1 Flash-Lite vs. competitors
✓ Speed benchmarks: Why 363 tokens per second matters for 2026
✓ Multimodal capabilities: Native support for 45 minutes of video
✓ Decision matrix for budget-conscious enterprise AI integration

The global AI price war intensified on May 7, 2026, when Google Cloud announced the release of Gemini 3.1 Flash-Lite. While competitors like OpenAI and Anthropic have focused on increasing reasoning depth, Google has executed a flanking maneuver by targeting the "efficiency frontier." By slashing input token prices to just $0.25 per million, Google is effectively undercutting the GPT-5.5 vs Grok 4.3 pricing tiers and challenging the ultra-cheap DeepSeek V4 Pro market.

For startups and enterprises scaling agentic workflows, the Gemini 3.1 Flash-Lite review results are clear: it is the new benchmark for speed. Benchmarks from Artificial Analysis show a 2.5x improvement in Time to First Token (TTFT) and a massive 363 tokens per second output speed, making it the ideal candidate for real-time customer support agents and large-scale data processing pipelines.

Current Status & Latest Data

Google’s strategy with the 3.1 generation is centered on "Intelligence at Scale." Gemini 3.1 Flash-Lite maintains the industry-leading 1 million token context window, allowing it to ingest up to 3,000 images or 45 minutes of video in a single prompt. This native multimodality is a significant advantage over ZAYA1-8B and other open-source alternatives that often require separate vision or audio encoders.

The pricing model is strictly pay-as-you-go via Vertex AI and Google AI Studio. Beyond the $0.25 input price, Google has introduced "Aggressive Caching" where input cache hits cost only $0.025 per million tokens—a 90% discount for repeated prefixes. This makes it highly effective for RAG (Retrieval Augmented Generation) applications where users query the same large knowledge base multiple times.

Key Factors Driving the Market

The "Attack" part of this launch is its direct pressure on the low-margin AI market. By providing a 2.5x speed increase over previous versions, Google is solving the latency problem that has plagued large-context models. In tasks like Embeddings API comparison, Flash-Lite acts as a perfect "pre-processor," filtering and structuring data before it is passed to more expensive reasoning models like Claude Mythos.

Related Article
Cloudflare Workers AI: Free Inference 2026

Expert Analysis & Insights

Analytical tests show that while Gemini 3.1 Flash-Lite is optimized for speed, it does not sacrifice core reasoning. It scores 82.2 on GPQA Diamond, which is highly competitive for a model in the "Lite" category. The trade-off is primarily in very complex math and code repair where Pro models still lead. However, for 95% of common business automation tasks, the 2.5x speed boost provides more value than absolute reasoning depth. The following table highlights the "Price-to-Performance" ratio:

Metric	Gemini 3.1 Flash-Lite	DeepSeek V4 Flash	GPT-5 mini
Input Price / 1M	$0.25	$0.14	$0.25
Output Price / 1M	$1.50	$0.28	$2.00
Output Speed	363 tps	~150 tps	~220 tps
Context Window	1M	1M	500K

Future Outlook

The "8:1 price gap" between Google's Pro and Flash tiers indicates that we are moving toward a multi-model architecture standard. Developers are no longer picking one model; they are using ROI measurement tools for AI agents to route simple queries to Flash-Lite and reserve Pro models for high-value reasoning. By late 2026, expect Flash-Lite to become the default "worker bee" of the internet, handling millions of background tasks from translation to email sorting.

Conclusion

Gemini 3.1 Flash-Lite is more than just a pricing update; it is a declaration of war on the "high-cost AI" status quo. By providing near-instant responses at a 95% lower cost than frontier models, Google has made high-volume AI accessible to every developer. Key Takeaways:

Switch to 3.1 Flash-Lite if latency and input token cost are your primary bottlenecks.
Leverage the 1M context window for video and image analysis at a fraction of the previous cost.
Use Google's aggressive caching ($0.025) to build persistent, context-aware AI assistants.

For more on high-velocity feature delivery, see our study on Rakuten’s 79% faster development.

Related Article
Perplexity Spaces Not Syncing Across Devices

Last Updated: May 18, 2026 | Source: Google Cloud & Vertex AI Documentation

Frequently Asked Questions

in Technology

# AI