How a 2021 Quantization Algorithm Quietly Outperforms Its 2026 Successor

By team_scrolltonic · May 2, 2026 · 8 Mins Read

TurboQuant [3], an online vector quantization method, drew wide public attention at ICLR 2026. To me, it looked very familiar: it overlaps heavily with EDEN, a quantization method first introduced as the 1-bit method DRIVE at NeurIPS 2021 [1] and generalized to arbitrary bit-widths at ICML 2022 [2], both of which I co-authored with Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, and Shay Vargaftik.

The TurboQuant paper presents two variants: TurboQuant-mse and TurboQuant-prod. In a detailed new comparison [5] we show that TurboQuant-mse is a degenerate case of EDEN, and that the EDEN variants consistently outperform their counterparts.

How EDEN quantizes a vector

Suppose you need to compress a d-dimensional vector x (a gradient update, an embedding, a KV-cache entry) down to a few bits per coordinate. EDEN proceeds in four steps:

  1. Random rotation — Multiply by a random orthogonal matrix Π. After rotation the coordinates are identically distributed and, for large d, approximately Gaussian.
  2. Scalar quantization — Round each rotated coordinate to one of 2^b levels from a Lloyd–Max codebook trained on the known rotated coordinate distribution (b is the target number of bits per coordinate).
  3. Scale — Multiply by a scale factor S.
  4. Inverse rotation — Apply Πᵀ to recover an approximation x̂ of the original vector.
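The four steps can be written as a minimal NumPy sketch. This is my own simplification, not the authors' reference code: the 2-bit codebook uses the classical Lloyd–Max output levels for a unit Gaussian (approximately ±0.4528 and ±1.5104), the normalization is a shortcut, and the scale S is left as a parameter since its choice is exactly what distinguishes the variants discussed below.

```python
import numpy as np

# Classical 2-bit Lloyd-Max output levels for a unit Gaussian
# (approximate values; EDEN trains such codebooks per bit-width b).
LEVELS_2BIT = np.array([-1.5104, -0.4528, 0.4528, 1.5104])

def random_rotation(d, rng):
    # Uniformly random orthogonal matrix via QR of a Gaussian matrix.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix for the Haar distribution

def eden_sketch(x, levels, S, rng):
    d = len(x)
    P = random_rotation(d, rng)
    z = P @ x                                # 1. random rotation
    norm = np.linalg.norm(z)
    zn = z * np.sqrt(d) / norm               # coords ~ N(0, 1) for large d
    # 2. round each coordinate to the nearest codebook level
    q = levels[np.argmin(np.abs(zn[:, None] - levels[None, :]), axis=1)]
    q = q * norm / np.sqrt(d) * S            # 3. scale
    return P.T @ q                           # 4. inverse rotation

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x_hat = eden_sketch(x, LEVELS_2BIT, S=1.0, rng=rng)
err = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)  # vNMSE, roughly 0.12 at 2 bits
```

With S = 1 this reproduces the unscaled pipeline; the article's point is that EDEN's closed-form choices of S do strictly better.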

While earlier works (e.g., Suresh et al. (2017) [6]) used rotation mainly to shrink the coordinates’ dynamic range (the gap between the largest and smallest coordinate value), EDEN [1] was, to the best of our knowledge, the first quantization scheme to exploit a stronger fact about random rotation: the post-rotation coordinates follow a known distribution. This lets us pair a deterministic quantizer with a closed-form scale that, depending on the application, either minimizes MSE or makes the estimate unbiased. Both scales are derived analytically, and the construction yields an asymptotic MSE reduction over the previous approach.

Concretely, EDEN’s two variants differ only in the choice of S:

  • EDEN-biased — sets S to the closed-form value that minimizes the reconstruction MSE.
  • EDEN-unbiased — chooses S so the decompressed output is correct on average (𝔼[x̂] = x), which matters particularly whenever you average many quantized vectors (e.g., distributed training, attention).
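The two scale choices are easiest to see in the 1-bit (DRIVE) case. In the sketch below (my own simplification, not the paper’s reference code), sign quantization of the rotated vector z is scaled either by ‖z‖₁/d, which minimizes ‖z − S·sign(z)‖² (the biased variant), or by ‖x‖²/‖z‖₁, which makes the estimate unbiased over the random rotation:

```python
import numpy as np

def random_rotation(d, rng):
    # Uniformly random orthogonal matrix via QR of a Gaussian matrix.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def drive_1bit(x, rng, unbiased=False):
    d = len(x)
    P = random_rotation(d, rng)
    z = P @ x
    l1 = np.abs(z).sum()
    # Biased:   S = ||z||_1 / d   minimizes ||z - S*sign(z)||^2.
    # Unbiased: S = ||x||^2 / ||z||_1   gives E[x_hat] = x over rotations.
    S = (x @ x) / l1 if unbiased else l1 / d
    return S * (P.T @ np.sign(z))

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
# Averaging many unbiased estimates recovers x -- the property that matters
# when many quantized vectors are averaged (training, attention).
mean_hat = np.mean([drive_1bit(x, rng, unbiased=True) for _ in range(4000)],
                   axis=0)
```

The average of the unbiased estimates converges to x, while any single estimate of either variant remains a coarse 1-bit approximation.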

Lined up against EDEN, TurboQuant-mse matches at every step except one: where EDEN derives the scale S analytically, TurboQuant-mse skips the optimized scaling even though it targets MSE minimization, effectively fixing S = 1.

The pseudocode below shows the three side by side.

Figure 1: EDEN’s pseudocode instantiated for EDEN-biased, EDEN-unbiased, and TurboQuant-mse. The three are identical except at step 5: the choice of S. Image by author [5].

Why the optimal scale is worth it

The value of applying the proper scale S grows with bit-width. At b = 1 bit, the gap is marginal. At d = 128 and b = 4 bits, EDEN-biased reduces MSE by 2.25% over TurboQuant-mse, and these are the bit-widths practitioners actually use for embeddings and KV caches.

Across dimensions 16 to 4096 and all tested bit-widths b ∈ {1, 2, 3, 4}, EDEN-biased’s vNMSE (vector-normalized MSE, 𝔼[‖x − x̂‖²]/‖x‖²) falls below TurboQuant-mse’s in every case (Figure 2). As the dimension grows very large, the optimal S approaches 1 and the two algorithms converge, but at practical dimensions (128–1024) the gap persists.

Figure 2: vNMSE vs. dimension comparing EDEN-biased and TurboQuant-mse across bit-widths b ∈ {1, 2, 3, 4} (panels left to right). EDEN-biased (which optimizes the scale factor S) achieves lower error than TurboQuant-mse (which fixes S = 1) at every tested dimension. The curves converge at high dimension as the optimal S approaches 1. Image by author [5].
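In code, the single-sample version of the vNMSE metric is a one-liner:

```python
import numpy as np

def vnmse(x, x_hat):
    """Vector-normalized MSE: ||x - x_hat||^2 / ||x||^2 for one sample."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
```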

Unbiased compression: saving more than a full bit

The results above concern the biased (MSE-minimizing) variants. Now consider the unbiased case, where applications such as distributed training, approximate attention, or inner-product retrieval need 𝔼[x̂] = x because they average many quantized vectors.

EDEN-unbiased uses the same single-pass algorithm as EDEN-biased, just with S chosen for bias correction. TurboQuant’s unbiased variant, TurboQuant-prod, takes a different route: it spends (b − 1) bits on the biased TurboQuant-mse step and reserves 1 bit for a QJL (Quantized Johnson–Lindenstrauss) [4] correction on the residual (QJL is similar to EDEN at b = 1, but with higher variance).
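To see why bit-splitting yields unbiasedness at all, here is a toy two-stage sketch. It is illustrative only: it substitutes an EDEN-style 1-bit unbiased estimate for QJL and uses a 1-bit biased first stage in place of the (b − 1)-bit one, since the mechanics are analogous. Because the residual estimate is unbiased given the first stage, 𝔼[q + r̂] = q + (x − q) = x.

```python
import numpy as np

def random_rotation(d, rng):
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def biased_1bit(x, rng):
    # Stand-in for the (b-1)-bit biased stage: sign quantization with
    # the MSE-minimizing scale ||z||_1 / d.
    d = len(x); P = random_rotation(d, rng); z = P @ x
    return P.T @ (np.sign(z) * np.abs(z).sum() / d)

def unbiased_1bit(x, rng):
    # Stand-in for the 1-bit residual corrector (QJL in TurboQuant-prod;
    # here an EDEN-style unbiased 1-bit estimate for illustration).
    d = len(x); P = random_rotation(d, rng); z = P @ x
    return P.T @ (np.sign(z) * (x @ x) / np.abs(z).sum())

def two_stage(x, rng):
    q = biased_1bit(x, rng)             # biased stage
    r_hat = unbiased_1bit(x - q, rng)   # unbiased estimate of the residual
    return q + r_hat                    # E[q + r_hat] = q + (x - q) = x

rng = np.random.default_rng(2)
x = rng.standard_normal(32)
mean_hat = np.mean([two_stage(x, rng) for _ in range(4000)], axis=0)
```

The sum is unbiased, but as the article argues, splitting the budget this way carries the first stage’s MSE penalty plus the corrector’s variance.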

EDEN-unbiased outperforms TurboQuant-prod in every tested configuration, and by a substantial margin. The gap traces to three structural advantages of EDEN’s single-pass design:

  1. EDEN optimizes the scale. TurboQuant-prod inherits TurboQuant-mse’s S = 1 first stage, so it carries the same MSE penalty.
  2. EDEN’s 1-bit construction has lower variance than QJL. In large dimensions, EDEN’s 1-bit vNMSE converges to π/2 − 1 ≈ 0.57 [1], while QJL’s converges to π/2 ≈ 1.57 [4], roughly 2.75× higher.
  3. EDEN spends the full bit budget on a single unbiased quantizer. TurboQuant-prod splits the budget into (b − 1) biased bits plus 1 residual bit, which empirically underperforms spending all b bits on a single unbiased quantizer [5].
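The 1-bit limit in point 2 is easy to check numerically. Because post-rotation coordinates are approximately i.i.d. Gaussian, the sketch below samples them directly (skipping the rotation) and applies the unbiased scale ‖x‖²/‖z‖₁; the measured vNMSE should land near π/2 − 1 ≈ 0.571 as d grows. This is a numerical check, not the paper’s proof:

```python
import numpy as np

rng = np.random.default_rng(3)
d, trials = 8192, 50
errs = []
for _ in range(trials):
    # Post-rotation coordinates are ~ i.i.d. Gaussian; sample them directly.
    z = rng.standard_normal(d)
    S = (z @ z) / np.abs(z).sum()   # unbiased scale ||x||^2 / ||z||_1
    z_hat = S * np.sign(z)
    errs.append(np.sum((z - z_hat) ** 2) / (z @ z))
vnmse = np.mean(errs)               # -> pi/2 - 1 ~ 0.571 for large d
```

The algebra behind the limit: ‖z − S·sign(z)‖²/‖z‖² = d‖z‖²/‖z‖₁² − 1, and for a unit Gaussian ‖z‖₁/d → √(2/π), giving π/2 − 1.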

These effects compound. The result: 1-bit, 2-bit, and 3-bit EDEN-unbiased are each more accurate than 2-bit, 3-bit, and 4-bit TurboQuant-prod, respectively (Figure 3). By swapping in EDEN you can drop a bit per coordinate and still match TurboQuant-prod’s accuracy.

Figure 3: vNMSE vs. dimension comparing EDEN-unbiased and TurboQuant-prod across bit-widths b ∈ {1, 2, 3, 4} (panels left to right). EDEN-unbiased achieves lower error at every dimension. The gap is large enough that EDEN with b bits often outperforms TurboQuant-prod with b + 1 bits. Image by author [5].

On TurboQuant’s own benchmarks

The same picture holds on the standard ANN benchmarks TurboQuant evaluates on, Stanford’s GloVe pre-trained word vectors (Open Data Commons Public Domain Dedication and License v1.0) and Qdrant’s dbpedia-entities-openai3-text-embedding-3-large embeddings (Apache 2.0), using TurboQuant’s published evaluation code:

EDEN-biased achieves lower MSE than TurboQuant-mse, EDEN-unbiased achieves markedly lower inner-product error than TurboQuant-prod, and nearest-neighbor recall on both datasets favors EDEN (Figure 4).

Figure 4: Nearest-neighbor recall on GloVe and OpenAI3 embeddings at 2 and 4 bits per coordinate. EDEN-unbiased outperforms TurboQuant-prod across all four settings. Image by author [5].

Takeaway: use EDEN; optimal scaling matters

EDEN’s scale connects the known post-rotation distribution to an analytically optimal quantizer. TurboQuant-mse keeps EDEN’s rotation and the codebook but pins S = 1, which is what makes it a strictly weaker special case. TurboQuant-prod adds a 1-bit QJL stage on top of that, where EDEN-unbiased gets the same property, with better accuracy, by just picking a bias-correcting scale.

  • For MSE-targeted compression (model weight quantization, nearest-neighbor search, KV cache): EDEN-biased computes the optimal scale S and consistently beats TurboQuant-mse (which is EDEN with S = 1 fixed).
  • For unbiased estimation (distributed mean estimation, approximate attention, inner-product retrieval): EDEN-unbiased substantially outperforms TurboQuant-prod’s bit-splitting strategy, by margins worth more than a full bit per coordinate.

EDEN was originally developed for distributed mean estimation in federated and distributed training. Subsequent work has, for example, applied it to embedding compression for document re-ranking (SDR, 2022 [8]), generalized it to vector quantization for data-free LLM weight compression (HIGGS, 2025 [9]), which was then used for KV-cache compression (AQUA-KV, 2025 [11]), and adapted it for NVFP4 LLM training (MS-EDEN in Quartet II, 2026 [10]).

EDEN implementations are available in PyTorch and TensorFlow and in Intel’s OpenFL [7]; its 1-bit variant ships in Google’s FedJax, TensorFlow Federated, and TensorFlow Model Optimization.

For the full technical comparison with TurboQuant (all figures and detailed experimental methodology), see our note [5].

For the original derivations, proofs, and further extensions, see our original papers [1] [2].

References

  1. S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, DRIVE: One-bit Distributed Mean Estimation (2021), NeurIPS 2021.
  2. S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning (2022), ICML 2022.
  3. A. Zandieh, M. Daliri, A. Hadian, V. Mirrokni, TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (2026), ICLR 2026.
  4. A. Zandieh, M. Daliri, I. Han, QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead (2024), arXiv:2406.03482.
  5. R. Ben-Basat, Y. Ben-Itzhak, G. Mendelson, M. Mitzenmacher, A. Portnoy, S. Vargaftik, A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work (2026), arXiv:2604.18555.
  6. A. T. Suresh, F. X. Yu, S. Kumar, H. B. McMahan, Distributed Mean Estimation with Limited Communication (2017), ICML 2017.
  7. VMware Open Source Blog, VMware Research Group’s EDEN Becomes Part of OpenFL (November 2022).
  8. N. Cohen, A. Portnoy, B. Fetahu, A. Ingber, SDR: Efficient Neural Re-ranking using Succinct Document Representation (2022), ACL 2022.
  9. V. Malinovskii, A. Panferov, I. Ilin, H. Guo, P. Richtárik, D. Alistarh, HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (2025), NAACL 2025.
  10. A. Panferov, E. Schultheis, S. Tabesh, D. Alistarh, Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation (2026), arXiv:2601.22813.
  11. A. Shutova, V. Malinovskii, V. Egiazarian, D. Kuznedelev, D. Mazur, N. Surkov, I. Ermakov, D. Alistarh, Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models (2025), ICML 2025.