TurboQuant achieves zero accuracy loss through two complementary algorithms: PolarQuant (random rotation + polar transform) and QJL (1-bit residual correction). Together, they compress the KV cache 6x with no quality degradation.
We've covered TurboQuant's results — 6x memory reduction, 8x faster attention, zero accuracy loss. But how does it actually work? The secret lies in two algorithms that TurboQuant combines: PolarQuant and QJL (Quantized Johnson-Lindenstrauss).
Understanding these two stages explains why TurboQuant achieves what traditional quantization cannot: aggressive bit-widths with no measurable accuracy loss.
Why Two Stages? The Quantization Problem
Traditional quantization faces a fundamental trade-off: aggressive compression loses accuracy, conservative compression doesn't save enough memory. At 3-bit, most methods lose 5-10% accuracy because they treat each dimension independently.
TurboQuant solves this with a two-stage approach:
- Stage 1 — PolarQuant: Rotate and transform the vector so it's easy to compress
- Stage 2 — QJL: Correct any remaining errors with 1-bit precision
This combination — simple transform, tiny correction — achieves what single-stage methods cannot.
Stage 1: PolarQuant — Making Vectors Easy to Quantize
PolarQuant (to be presented at AISTATS 2026) uses two tricks: random rotation and polar transform.
Trick 1: Random Rotation
High-dimensional vectors (like KV cache entries) often have axes with very different scales — some directions have huge values, others tiny. Standard quantization treats all axes equally, wasting bits on meaningless variations.
PolarQuant applies a random rotation to the vector. In high dimensions, a random rotation "spreads out" the energy, so no single coordinate is likely to dominate. The result is a vector whose coordinates all have similar scales.
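To make the effect concrete, here is a minimal NumPy sketch. The function name `random_rotation`, the QR-based construction, the dimension 128, and the toy input are illustrative assumptions, not PolarQuant's actual code. It rotates a vector with one dominant coordinate and compares how much of the norm the largest coordinate carries before and after.

```python
import numpy as np

def random_rotation(d, seed=0):
    """Hypothetical helper: a random orthogonal d x d matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs using diag(r) so the result is not biased by QR's sign convention.
    return q * np.sign(np.diag(r))

d = 128
x = np.full(d, 0.1)
x[0] = 10.0                      # one coordinate dominates the raw vector

R = random_rotation(d)
y = R @ x                        # rotated vector, same length as x

# Share of the norm carried by the single largest coordinate.
peak_before = np.max(np.abs(x)) / np.linalg.norm(x)
peak_after = np.max(np.abs(y)) / np.linalg.norm(y)
print(f"peak share before: {peak_before:.3f}, after: {peak_after:.3f}")
```

Because the rotation is orthogonal, it can be undone exactly with `R.T`, so nothing is lost at this step.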
Trick 2: Polar Transform
After rotation, PolarQuant applies a polar transform — converting the vector into magnitude and direction. Imagine converting rectangular XYZ coordinates to spherical coordinates (radius + angles).
Why does this help? The magnitude (radius) tends to concentrate around a small range, while the angular coordinates have predictable distributions. Both are easier to quantize than the original raw values.
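A small sketch of the idea, under stated assumptions: the version below converts each pair of coordinates into a radius and an angle, which is one simple way to realize a polar transform. The pairing scheme and the helper names `to_polar_pairs` and `from_polar_pairs` are hypothetical and may differ from the construction in the PolarQuant paper.

```python
import numpy as np

def to_polar_pairs(x):
    """Treat consecutive coordinate pairs as 2-D points; return their radii and angles."""
    pairs = x.reshape(-1, 2)
    radius = np.hypot(pairs[:, 0], pairs[:, 1])
    angle = np.arctan2(pairs[:, 1], pairs[:, 0])   # in [-pi, pi]
    return radius, angle

def from_polar_pairs(radius, angle):
    """Inverse transform: rebuild the original coordinates from radii and angles."""
    pairs = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
    return pairs.reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)      # stand-in for a rotated KV cache vector

radius, angle = to_polar_pairs(x)
x_back = from_polar_pairs(radius, angle)
print("round-trip error:", np.max(np.abs(x - x_back)))
```

The transform itself is invertible; any loss comes only from how coarsely the radii and angles are stored afterwards.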
| Stage | What It Does | Result |
|---|---|---|
| Input | Raw KV cache vector | d dimensions, FP16 |
| Step 1 | Random rotation | Energy spread equally |
| Step 2 | Polar transform | Easy-to-quantize format |
| Quantize | 3-bit encoding | ~5x compression |
After PolarQuant's two steps, the vector is much easier to compress. But some small errors remain — this is where QJL comes in.
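The leftover error is easy to see in a toy example. The sketch below assumes a plain uniform 3-bit grid over angles in [-pi, pi); the actual PolarQuant codebook may be built differently, and the function names are hypothetical.

```python
import numpy as np

BITS = 3
LEVELS = 2 ** BITS                    # 8 codes per angle
STEP = 2 * np.pi / LEVELS             # width of one quantization bin

def quantize_angles(angle):
    """Map angles in [-pi, pi) to integer codes 0..7."""
    codes = np.floor((angle + np.pi) / STEP).astype(np.int64) % LEVELS
    return codes.astype(np.uint8)

def dequantize_angles(codes):
    """Decode each code to the centre of its bin."""
    return (codes.astype(np.float64) + 0.5) * STEP - np.pi

rng = np.random.default_rng(0)
angle = rng.uniform(-np.pi, np.pi, size=64)

codes = quantize_angles(angle)
angle_hat = dequantize_angles(codes)
max_err = np.max(np.abs(angle_hat - angle))   # bounded by half a bin, pi/8

ratio = 16 / BITS                     # FP16 value replaced by a 3-bit code
print(f"max angle error: {max_err:.3f} rad, compression: {ratio:.2f}x")
```

The error is bounded by half a bin width but it is not zero, which is the residual the second stage targets.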
Stage 2: QJL — Cleaning Up the Residual
QJL (Quantized Johnson-Lindenstrauss) was presented at AISTATS 2026. It tackles the error that PolarQuant leaves behind.
The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss lemma is a mathematical result from 1984: you can project high-dimensional points to far fewer dimensions while approximately preserving distances.
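A quick numerical check illustrates the lemma. The sizes used here (20 points, 1024 dimensions projected down to 256) and the Gaussian projection matrix are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, d, k = 20, 1024, 256
points = rng.standard_normal((n_points, d))

# Gaussian projection, scaled by 1/sqrt(k) so squared lengths are preserved in expectation.
P = rng.standard_normal((k, d)) / np.sqrt(k)
projected = points @ P.T

def pairwise_distances(a):
    """Euclidean distance between every pair of rows."""
    diff = a[:, None, :] - a[None, :, :]
    return np.linalg.norm(diff, axis=-1)

upper = np.triu_indices(n_points, k=1)        # each unordered pair once
ratios = pairwise_distances(projected)[upper] / pairwise_distances(points)[upper]
print(f"distance ratios range from {ratios.min():.3f} to {ratios.max():.3f}")
```

Ratios close to 1 mean the distances survive the projection even though three quarters of the dimensions are gone.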
QJL applies this idea to error correction. Here's how:
- Extract the residual: After 3-bit quantization, calculate the reconstruction error (the difference between the original and the quantized vector)
- Random project: Apply a random projection matrix to the residual
- Quantize the projection: Keep only the sign of each projected coordinate, so each stored value costs just 1 bit
- Reconstruct: During decoding, add back the 1-bit correction
The result: a 1-bit correction nearly eliminates the residual error from the first stage. Total bits used: 3 + 1 = 4 bits per value, while quality matches the 16-bit baseline.
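The steps above can be sketched in NumPy. This is a simplified illustration, not the reference implementation, and it rests on several assumptions: a Gaussian projection matrix `S`, one sign bit per projected coordinate plus the residual norm stored as a single extra scalar, a crude rounding step standing in for the first stage, and the correction applied to the query-key inner product instead of the stored vector. The scaling constant sqrt(pi/2) comes from the fact that E|g| = sqrt(2/pi) for a standard normal g.

```python
import numpy as np

def qjl_encode(residual, S):
    """Keep the sign of each projected coordinate (1 bit each) and the residual norm."""
    signs = np.sign(S @ residual)
    signs[signs == 0] = 1.0
    return signs, np.linalg.norm(residual)

def qjl_inner_product(query, signs, norm, S):
    """Estimate <query, residual> from the sign bits; the query stays in full precision."""
    m = S.shape[0]
    return np.sqrt(np.pi / 2) * norm / m * np.dot(S @ query, signs)

rng = np.random.default_rng(0)
d = 128
x = rng.standard_normal(d)            # original key vector
x_hat = np.round(x * 2) / 2           # crude stand-in for the stage 1 quantizer
residual = x - x_hat
query = rng.standard_normal(d)

true_score = query @ x
coarse_score = query @ x_hat          # score without any correction

# A single draw with m = d sign bits is noisy. Averaging over independent
# projections shows that the corrected score is centred on the true score.
estimates = []
for _ in range(1000):
    S = rng.standard_normal((d, d))
    signs, norm = qjl_encode(residual, S)
    estimates.append(coarse_score + qjl_inner_product(query, signs, norm, S))
mean_corrected = float(np.mean(estimates))

print(f"true {true_score:.3f}, coarse {coarse_score:.3f}, corrected mean {mean_corrected:.3f}")
```

In this sketch the sign bits do not reconstruct the residual itself. They give an unbiased estimate of its contribution to an attention score, which is the quantity attention actually consumes.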
| Component | Input | Output | Purpose |
|---|---|---|---|
| PolarQuant | 16-bit vector | 3-bit code | Main compression |
| QJL | Reconstruction error | 1-bit code | Error correction |
| Combined | 16-bit vector | 4 bits total | Zero-loss output |
Why This Works: The Mathematical Insight
Traditional quantization treats each dimension independently — but high-dimensional vectors have structure that compression can exploit. PolarQuant and QJL work together because:
- Rotation removes worst-case axes: Random rotation makes outlier coordinates very unlikely
- Polar transform concentrates: After transform, most information is in predictable places
- JL correction is efficient: One bit of error correction goes far because it targets the right error
The key insight: you don't need many bits to correct errors if you project those errors into the right space. QJL does exactly that.
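The first point can be checked numerically. The sketch below assumes a simple uniform 3-bit quantizer with one per-vector scale (the hypothetical helper `quantize_uniform`) and a toy vector with a single outlier. It compares the reconstruction error with and without a random rotation.

```python
import numpy as np

def quantize_uniform(v, bits=3):
    """Round v onto 2**bits evenly spaced levels spanning [-max|v|, +max|v|]."""
    levels = 2 ** bits
    vmax = np.max(np.abs(v))
    step = 2 * vmax / (levels - 1)
    codes = np.clip(np.round((v + vmax) / step), 0, levels - 1)
    return codes * step - vmax

rng = np.random.default_rng(0)
d = 128
x = 0.1 * rng.standard_normal(d)
x[0] = 10.0                           # one outlier stretches the quantization range

# Random orthogonal matrix via QR of a Gaussian matrix.
q, r = np.linalg.qr(rng.standard_normal((d, d)))
R = q * np.sign(np.diag(r))

# Quantize directly, then quantize in the rotated basis and rotate back.
mse_raw = np.mean((x - quantize_uniform(x)) ** 2)
x_rot_hat = R.T @ quantize_uniform(R @ x)
mse_rotated = np.mean((x - x_rot_hat) ** 2)

print(f"MSE without rotation: {mse_raw:.4f}, with rotation: {mse_rotated:.4f}")
```

Without rotation, the outlier sets the grid spacing and the small coordinates all land far from their nearest level. After rotation the same 8 levels cover a much narrower range.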
How It Compares to Alternative Approaches
| Method | Bits | Accuracy | Notes |
|---|---|---|---|
| FP16 baseline | 16 | 100% | Standard full precision |
| INT8 quantization | 8 | 100% | Standard approach |
| INT4 quantization | 4 | 95-98% | 2-5% loss typical |
| Standard 3-bit | 3 | 90-93% | Poor quality |
| TurboQuant (3+1) | 4 | 100% | Zero loss achieved |
This comparison explains why TurboQuant's paper generated excitement. At 4 bits total (3 for PolarQuant + 1 for QJL), it matches the 16-bit baseline, something the standard 3-bit and 4-bit methods in the table do not achieve.
Implementation Status
Both algorithms are available as open source:
- turbo-quant (Rust): Production-ready implementation from RecursiveIntell — supports both TurboQuant and separate PolarQuant/QJL
- llama.cpp: Community integration available
- vLLM: Integration in progress
- PyTorch: Reference implementation
According to Google Research's official blog, TurboQuant was presented at ICLR 2026, PolarQuant at AISTATS 2026, and QJL at AISTATS 2026.
PolarQuant + QJL FAQ
Why does PolarQuant need random rotation?
What exactly does QJL correct?
Is 1 bit enough for error correction?
Can I use just PolarQuant without QJL?
Does this work for model weights too?
What's the performance overhead?
How does this compare to KV cache pruning?
Is this available in Hugging Face Transformers?
For more on TurboQuant, explore our articles on TurboQuant Explained (FP4), TurboQuant 3-Bit Explained, DeepSeek Engram Memory, and Context Engineering Guide.
Last Updated: April 29, 2026 | Source: Google Research, GitHub, TurboQuant.net