
PolarQuant + QJL: The Two-Stage Secret Behind TurboQuant's Zero Loss

How PolarQuant's random rotation + polar transform and QJL's 1-bit error correction work together to achieve what single-stage quantization cannot
Apr 28, 2026, 21:53 Eastern Daylight Time

TurboQuant reports zero accuracy loss by combining two complementary algorithms: PolarQuant (random rotation + polar transform) and QJL (1-bit residual correction). Together, they are reported to compress the KV cache 6x with no quality degradation.

We've covered TurboQuant's results — 6x memory reduction, 8x faster attention, zero accuracy loss. But how does it actually work? The secret lies in two algorithms that TurboQuant combines: PolarQuant and QJL (Quantized Johnson-Lindenstrauss).

Understanding these two stages explains why TurboQuant achieves what traditional quantization cannot: compression at aggressive bit-widths with no measurable accuracy loss.

Why Two Stages? The Quantization Problem

Traditional quantization faces a fundamental trade-off: aggressive compression loses accuracy, while conservative compression doesn't save enough memory. At 3 bits, most methods lose 5-10% accuracy because they treat each dimension independently.

TurboQuant solves this with a two-stage approach:

  • Stage 1 — PolarQuant: Rotate and transform the vector so it's easy to compress
  • Stage 2 — QJL: Correct any remaining errors with 1-bit precision

This combination — simple transform, tiny correction — achieves what single-stage methods cannot.

Stage 1: PolarQuant — Making Vectors Easy to Quantize

PolarQuant (to be presented at AISTATS 2026) uses two tricks: a random rotation and a polar transform.

Trick 1: Random Rotation

High-dimensional vectors (like KV cache entries) often have axes with very different scales — some directions have huge values, others tiny. Standard quantization treats all axes equally, wasting bits on meaningless variations.
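To make the scale problem concrete, here is a minimal sketch of a naive 3-bit min-max quantizer applied to a vector with one large coordinate. The function names `quantize_uniform` and `dequantize_uniform` and the sample vector are illustrative assumptions, not part of TurboQuant or any library.

```python
def quantize_uniform(x, bits=3):
    """Naive per-vector min-max quantizer (illustrative baseline, not TurboQuant)."""
    lo, hi = min(x), max(x)
    levels = 2 ** bits - 1                      # 7 steps between 8 codes
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((a - lo) / step) for a in x]
    return codes, lo, step


def dequantize_uniform(codes, lo, step):
    """Map integer codes back to the centers of their quantization levels."""
    return [lo + c * step for c in codes]


# One outlier (8.0) stretches the range, so the step size becomes about 1.15.
x = [0.02, -0.05, 0.01, 8.0, 0.03, -0.01]
codes, lo, step = quantize_uniform(x)
x_hat = dequantize_uniform(codes, lo, step)
print(codes)
```

In this example every small coordinate lands on the same code as its neighbors, so the differences between them are lost while the outlier consumes the whole range.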

PolarQuant applies a random rotation to the vector. In high dimensions, a random rotation spreads the vector's energy roughly evenly across coordinates, so with high probability no single coordinate dominates. The result is a vector whose dimensions all have similar scales.
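The effect of a random rotation can be sketched in a few lines. The example below builds a random orthogonal matrix by Gram-Schmidt on Gaussian rows and applies it to a vector whose energy sits in one coordinate. This construction, the helper names, and the dimension `d = 32` are assumptions chosen for illustration; the actual PolarQuant implementation may generate its rotation differently, and real code would use an array library rather than plain Python lists.

```python
import math
import random


def random_rotation(d, rng):
    """Random orthogonal d x d matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:                           # remove components along earlier rows
            proj = sum(a * b for a, b in zip(u, v))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def rotate(Q, x):
    """Matrix-vector product Q @ x."""
    return [sum(q * a for q, a in zip(row, x)) for row in Q]


def peak_energy_share(x):
    """Fraction of the squared norm held by the largest coordinate."""
    return max(a * a for a in x) / sum(a * a for a in x)


rng = random.Random(0)
d = 32
x = [10.0] + [0.0] * (d - 1)                     # all energy in one coordinate
y = rotate(random_rotation(d, rng), x)

print(peak_energy_share(x), peak_energy_share(y))
```

The rotation preserves the vector's length, so nothing is lost; only the distribution of energy across coordinates changes.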

Trick 2: Polar Transform

After rotation, PolarQuant applies a polar transform — converting the vector into magnitude and direction. Imagine converting rectangular XYZ coordinates to spherical coordinates (radius + angles).

Why does this help? The magnitude (radius) tends to concentrate around a small range, while the angular coordinates have predictable distributions. Both are easier to quantize than the original raw values.
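As a rough illustration of the idea, the sketch below converts consecutive coordinate pairs into a radius and an angle, then quantizes each angle to 3 bits. It is a simplified single-level version with hypothetical helper names; the published PolarQuant transform and its codebooks may differ.

```python
import math


def to_polar_pairs(x):
    """Convert consecutive pairs (a, b) into (radius, angle)."""
    return [(math.hypot(a, b), math.atan2(b, a)) for a, b in zip(x[0::2], x[1::2])]


def from_polar_pairs(pairs):
    """Inverse of to_polar_pairs."""
    out = []
    for r, theta in pairs:
        out.extend((r * math.cos(theta), r * math.sin(theta)))
    return out


def quantize_angle(theta, bits=3):
    """Map an angle in [-pi, pi] to one of 2**bits equal sectors."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return int((theta + math.pi) // step) % levels


def dequantize_angle(code, bits=3):
    """Return the center angle of the given sector."""
    step = 2 * math.pi / 2 ** bits
    return -math.pi + (code + 0.5) * step


x = [3.0, 4.0, -1.0, 0.0]
pairs = to_polar_pairs(x)                        # first pair has radius 5.0
codes = [quantize_angle(theta) for _, theta in pairs]
approx = from_polar_pairs([(r, dequantize_angle(c)) for (r, _), c in zip(pairs, codes)])
```

With 8 sectors, each reconstructed angle is at most pi/8 away from the original, and the radii are kept separately, where they can be quantized on their own narrow range.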

| Stage | What It Does | Result |
|---|---|---|
| Input | Raw KV cache vector | d dimensions, FP16 |
| Step 1 | Random rotation | Energy spread equally |
| Step 2 | Polar transform | Easy-to-quantize format |
| Quantize | 3-bit encoding | ~5x compression |

After PolarQuant's two steps, the vector is much easier to compress. But some small errors remain — this is where QJL comes in.

Stage 2: QJL — Cleaning Up the Residual

QJL (Quantized Johnson-Lindenstrauss) was published at AAAI 2025. It tackles the error that PolarQuant leaves behind.

The Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss lemma is a mathematical result from 1984: a set of high-dimensional points can be mapped by a random linear projection into far fewer dimensions while approximately preserving all pairwise distances.
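For reference, the standard statement of the lemma is given below. The symbols $n$, $\varepsilon$, $k$, $X$ and $f$ are notation introduced here, not taken from the article.

```latex
\text{For any } 0 < \varepsilon < 1 \text{ and any set } X \subset \mathbb{R}^d \text{ of } n \text{ points, there is a linear map }
f : \mathbb{R}^d \to \mathbb{R}^k \text{ with } k = O\!\left(\varepsilon^{-2} \log n\right) \text{ such that}
\[
(1 - \varepsilon)\,\lVert u - v \rVert^2 \;\le\; \lVert f(u) - f(v) \rVert^2 \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert^2
\qquad \text{for all } u, v \in X .
\]
```

Note that the target dimension $k$ depends on the number of points and the tolerance, not on the original dimension $d$.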

QJL applies a quantized version of this projection to the residual for error correction. Here's how:

  1. Extract the residual: After 3-bit quantization, calculate the reconstruction error (the difference between the original and quantized vectors)
  2. Random project: Multiply the residual by a random projection matrix
  3. Quantize the projection: Keep only the sign of each projected coordinate, which costs 1 bit per value
  4. Reconstruct: During decoding, use the stored sign bits to add back an estimate of the correction

The result: a correction costing 1 bit per value nearly eliminates the effect of the first stage's residual error. Total bits used: 3 + 1 = 4 bits per value, yet the reported quality matches the 16-bit baseline.
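The steps above can be sketched as a sign-quantized Gaussian projection. The code below is a minimal illustration under stated assumptions: the helper names are hypothetical, the residual `r` is synthetic, the residual's norm is stored alongside the sign bits, and `m = 4000` projections are used only to make the estimate visibly accurate (a budget of 1 bit per value would correspond to `m = d`). It is not the reference implementation.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gaussian_matrix(m, d, rng):
    """m x d random projection matrix with standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def qjl_encode(S, r):
    """Keep one sign bit per projected coordinate, plus the residual's norm."""
    signs = [1 if dot(row, r) >= 0 else -1 for row in S]
    return signs, math.sqrt(dot(r, r))


def qjl_inner_product(S, signs, norm, q):
    """Estimate <q, r> from the sign bits and an unquantized query q."""
    acc = sum(dot(row, q) * s for row, s in zip(S, signs))
    return math.sqrt(math.pi / 2) * norm * acc / len(S)


rng = random.Random(0)
d, m = 16, 4000
r = [rng.gauss(0.0, 0.1) for _ in range(d)]      # stand-in for a quantization residual
q = [rng.gauss(0.0, 1.0) for _ in range(d)]      # stand-in for an attention query
S = gaussian_matrix(m, d, rng)

signs, norm = qjl_encode(S, r)
estimate = qjl_inner_product(S, signs, norm, q)
exact = dot(q, r)
print(estimate, exact)
```

Only the sign bits and one norm are stored for the residual; the query side stays in full precision, which is what lets the estimate recover the inner product on average.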

| Component | Input | Output | Purpose |
|---|---|---|---|
| PolarQuant | 16-bit vector | 3-bit code | Main compression |
| QJL | Reconstruction error | 1-bit code | Error correction |
| Combined | 16-bit vector | 4 bits total | Zero-loss output |

Why This Works: The Mathematical Insight

Traditional quantization treats each dimension independently — but high-dimensional vectors have structure that compression can exploit. PolarQuant and QJL work together because:

  • Rotation removes worst-case axes: Random rotation makes outlier coordinates very unlikely
  • Polar transform concentrates: After the transform, the radius and angles follow narrow, predictable distributions
  • JL correction is efficient: One bit per value goes far because the sign of a random projection still carries enough information to estimate the residual's contribution

The key insight: you don't need many bits to correct errors if you project those errors into the right space. QJL does exactly that.
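A short derivation shows why sign bits of a random projection are enough. The notation is introduced here as an assumption: $r$ is the residual, $q$ is a query vector, and $s \sim \mathcal{N}(0, I_d)$ is one row of the projection matrix. This is the standard Gaussian identity behind sign-based estimators, not a formula quoted from the TurboQuant paper.

```latex
\begin{aligned}
q &= \frac{\langle q, r\rangle}{\lVert r\rVert^{2}}\, r + q_{\perp},
  \qquad \langle q_{\perp}, r\rangle = 0, \\[4pt]
\mathbb{E}\big[\langle s, q\rangle \operatorname{sign}(\langle s, r\rangle)\big]
  &= \frac{\langle q, r\rangle}{\lVert r\rVert^{2}}\,
     \mathbb{E}\big[\lvert\langle s, r\rangle\rvert\big]
   + \mathbb{E}\big[\langle s, q_{\perp}\rangle\big]\,
     \mathbb{E}\big[\operatorname{sign}(\langle s, r\rangle)\big] \\[4pt]
  &= \sqrt{\tfrac{2}{\pi}}\;\frac{\langle q, r\rangle}{\lVert r\rVert}.
\end{aligned}
```

The second line uses the independence of $\langle s, q_{\perp}\rangle$ and $\langle s, r\rangle$ (jointly Gaussian with zero covariance), and the last line uses $\mathbb{E}\lvert\langle s, r\rangle\rvert = \lVert r\rVert\sqrt{2/\pi}$. Multiplying the average over many rows by $\sqrt{\pi/2}\,\lVert r\rVert$ therefore gives an unbiased estimate of $\langle q, r\rangle$.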

How It Compares to Alternative Approaches

| Method | Bits | Accuracy | Notes |
|---|---|---|---|
| FP16 baseline | 16 | 100% | Standard full precision |
| INT8 quantization | 8 | 100% | Standard approach |
| INT4 quantization | 4 | 95-98% | 2-5% loss typical |
| Standard 3-bit | 3 | 90-93% | Poor quality |
| TurboQuant (3+1) | 4 | 100% | Zero loss achieved |

This comparison explains why TurboQuant's paper generated excitement. At 4 bits total (3 for PolarQuant + 1 for QJL), it matches the 16-bit baseline, which neither the standard 3-bit nor the INT4 row above achieves.

Implementation Status

Both algorithms are available as open source:

  • turbo-quant (Rust): Production-ready implementation from RecursiveIntell — supports both TurboQuant and separate PolarQuant/QJL
  • llama.cpp: Community integration available
  • vLLM: Integration in progress
  • PyTorch: Reference implementation

According to Google Research's official blog, TurboQuant was accepted to ICLR 2026 and PolarQuant to AISTATS 2026. QJL was published earlier, at AAAI 2025.

PolarQuant + QJL FAQ

Why does PolarQuant need random rotation?
Raw high-dimensional vectors often have axes with wildly different scales. Random rotation spreads the "energy" evenly across all coordinates, making each equally compressible. Without rotation, some axes dominate and waste quantization bits.
What exactly does QJL correct?
After 3-bit quantization, there's a small difference (residual error) between the original vector and its reconstructed version. QJL captures this error with just 1 bit per value: the sign of each coordinate of a random projection of the residual, which the decoder uses to apply a small correction.
Is 1 bit enough for error correction?
Surprisingly, yes. The Johnson-Lindenstrauss lemma shows that random projections preserve distance relationships with far fewer dimensions. QJL builds on this insight: one sign bit per projected coordinate corrects most of the residual error because, on average, the signs recover the residual's contribution to each inner product.
Can I use just PolarQuant without QJL?
Yes, but you'll lose some accuracy. PolarQuant alone achieves about 5x compression with small degradation. Adding QJL recovers that last bit of quality to achieve true zero-loss at 4-bit total.
Does this work for model weights too?
TurboQuant targets KV cache specifically. For model weights, different techniques work better. But the same principles (rotation + JL correction) could potentially apply; research is ongoing.
What's the performance overhead?
Near zero. The random rotation and projection matrices are fixed: they are generated once and reused for every vector. QJL uses lookup tables. Google reports "zero-overhead" in their paper, and the encoding/decoding cost is negligible compared to attention computation.
How does this compare to KV cache pruning?
Pruning removes cache entries entirely — losing information. TurboQuant preserves all entries but compresses them. They're complementary: you could prune first (remove least important entries), then TurboQuant compress what remains.
Is this available in Hugging Face Transformers?
Not directly yet. The turbo-quant Rust library is the main production option. Community ports are in progress for llama.cpp and vLLM. PyTorch has reference implementations you can integrate.

For more on TurboQuant, explore our articles on TurboQuant Explained (FP4), TurboQuant 3-Bit Explained, DeepSeek Engram Memory, and Context Engineering Guide.


Last Updated: April 29, 2026 | Source: Google Research, GitHub, TurboQuant.net