📊 quantvec

project

Jun 2026

Add vectors, search instantly. No training, no native build, no server. A zero-dependency TypeScript library for data-oblivious vector quantization and flat ANN search, implementing Google Research's TurboQuant with a RaBitQ estimator correction and a WASM v128 FastScan kernel. Compresses embeddings 7.9-15.7x and runs on Node, browsers, Bun, Cloudflare Workers, and React Native.

Achievements

Library Release

First public release on npm and GitHub: TurboQuantIndex, IdMapIndex, createCollection, WASM FastScan, and isomorphic serialization.

Jun 9, 2026

Real-Dataset Benchmarks

Validated on SIFT-small (10k x 128-d) and GloVe-200 (100k x 200-d word embeddings). At 4 bits: 0.888 recall@10 on SIFT and 0.880 on GloVe, with FastScan reaching ~2055 QPS single-threaded on 10k vectors.

Jun 9, 2026
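Recall@10 presumably follows the standard definition: for each query, the fraction of the exact 10 nearest neighbors that also appear in the index's top 10, averaged over all queries. A small helper makes that definition concrete; `recallAtK` is a hypothetical name, not a quantvec export, and the ground truth is assumed to be available as id lists.

```typescript
// Hypothetical helper (not part of quantvec): mean recall@k over a query set.
function recallAtK(
  approx: number[][], // approx[q] = ids returned by the index for query q
  exact: number[][], // exact[q] = ground-truth ids for query q, nearest first
  k: number,
): number {
  if (approx.length !== exact.length || exact.length === 0) {
    throw new RangeError("need one result list per query");
  }
  let total = 0;
  for (let q = 0; q < exact.length; q++) {
    const truth = new Set(exact[q].slice(0, k));
    let hits = 0;
    for (const id of approx[q].slice(0, k)) if (truth.has(id)) hits++;
    total += hits / k; // fraction of the true top-k that was found
  }
  return total / exact.length;
}
```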

𝗦𝗶𝘁𝘂𝗮𝘁𝗶𝗼𝗻

In April 2025, Google Research published TurboQuant (Zandieh, Daliri, Hadian, Mirrokni, arXiv:2504.19874), a data-oblivious vector quantization algorithm that removes the training phase entirely. The key insight: after a random rotation, every coordinate of a unit vector follows a known (shifted and scaled) Beta distribution that depends only on the dimension, so the MSE-optimal codebook is fully determined by (dim, bits) without looking at any data. Combined with the unbiased-estimator correction from Gao & Long's RaBitQ (SIGMOD 2024), this approach achieves recall competitive with trained PQ indexes. But no usable implementation existed. The production vector search libraries (FAISS, ScaNN, Milvus) still required training and shipped as native C++/Rust binaries. That is a non-starter in the JavaScript ecosystem, where serverless functions must cold-start in milliseconds, edge runtimes forbid native addons, and React Native apps cannot bundle native builds.
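The Beta marginal can be written out explicitly. For a vector drawn uniformly from the unit sphere in R^d, which is the distribution a random rotation of a fixed unit vector produces, each coordinate x has the density f_d below, with mean 0 and variance 1/d. Equivalently, (x + 1)/2 is Beta((d-1)/2, (d-1)/2). This is the standard spherical marginal that TurboQuant builds on, shown as background rather than as the library's own derivation. The Lloyd-Max conditions then fix the 2^b levels: boundaries sit midway between adjacent levels, and each level is the conditional mean of its cell. Since f_d depends only on d, the resulting codebook depends only on (d, b).

```latex
% Density of one coordinate x of a uniformly random unit vector in R^d
f_d(x) = \frac{\Gamma\!\left(\tfrac{d}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\tfrac{d-1}{2}\right)}
         \left(1 - x^2\right)^{\frac{d-3}{2}}, \qquad x \in [-1, 1],
\qquad \frac{x + 1}{2} \sim \mathrm{Beta}\!\left(\tfrac{d-1}{2}, \tfrac{d-1}{2}\right)

% Lloyd-Max conditions for 2^b levels c_1 < \dots < c_{2^b}, with t_0 = -1 and t_{2^b} = 1
t_i = \frac{c_i + c_{i+1}}{2} \quad (0 < i < 2^b), \qquad
c_i = \frac{\int_{t_{i-1}}^{t_i} x \, f_d(x) \, dx}{\int_{t_{i-1}}^{t_i} f_d(x) \, dx} \quad (1 \le i \le 2^b)
```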

𝗧𝗮𝘀𝗸

Bring Google Research's TurboQuant algorithm to the JavaScript ecosystem as a clean-room, production-grade, zero-dependency TypeScript implementation. Deliver the paper's compression ratios and near-optimal distortion, with recall competitive with trained indexes and instant indexing (no training phase), on Node, browsers, Bun, Cloudflare Workers, and React Native.

𝗔𝗰𝘁𝗶𝗼𝗻

Built a clean-room implementation of the TurboQuant (Google Research) and RaBitQ (Gao & Long) papers as a complete encode/search pipeline. Each vector is normalized to unit length. It is then rotated by a data-independent random rotation: an O(d log d) fast Walsh-Hadamard transform (FWHT) for power-of-two dimensions, and a dense orthogonal matrix from Householder QR otherwise. Next it is scalar-quantized at 2, 3, or 4 bits with a Lloyd-Max codebook that is MSE-optimal for the known Beta marginal. Finally it is bit-packed, so each code occupies exactly its bit width in the serialized form. A per-vector RaBitQ length-renormalization scale then yields unbiased inner-product estimates at query time.
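The encode path described above can be sketched compactly. This is a minimal illustration under stated assumptions, not quantvec's code:

- It handles only power-of-two dimensions.
- It uses a randomized Hadamard rotation (random ±1 signs followed by an FWHT) as the data-independent rotation.
- It fits the Lloyd-Max codebook numerically on a grid.
- It stores one byte per code instead of bit-packing.
- It applies the RaBitQ-style correction <o,q> ≈ <ô,q>/<ô,o> as a per-vector `scale`.

All function names (`randomSigns`, `rotate`, `lloydMaxCodebook`, `encode`, `estimateDot`) are hypothetical.

```typescript
// Sketch of a TurboQuant-style encoder with a RaBitQ-style correction.
// ASSUMPTIONS (not quantvec's API): power-of-two dimension, randomized Hadamard
// rotation, one byte per code (no bit-packing), numeric Lloyd-Max fit.

// Random ±1 signs from a seeded xorshift32, so the rotation is reproducible.
function randomSigns(d: number, seed: number): Float64Array {
  let s = seed >>> 0 || 1; // xorshift32 state must be nonzero
  const out = new Float64Array(d);
  for (let i = 0; i < d; i++) {
    s = (s ^ (s << 13)) >>> 0;
    s = (s ^ (s >>> 17)) >>> 0;
    s = (s ^ (s << 5)) >>> 0;
    out[i] = s & 1 ? 1 : -1;
  }
  return out;
}

// Orthogonal rotation R x = H D x / sqrt(d): D = diag(signs), H applied by an FWHT.
function rotate(x: Float64Array, signs: Float64Array): Float64Array {
  const d = x.length;
  if ((d & (d - 1)) !== 0) throw new RangeError("dimension must be a power of two");
  const v = new Float64Array(d);
  for (let i = 0; i < d; i++) v[i] = x[i] * signs[i];
  for (let h = 1; h < d; h *= 2) {
    for (let i = 0; i < d; i += 2 * h) {
      for (let j = i; j < i + h; j++) {
        const a = v[j], b = v[j + h];
        v[j] = a + b;
        v[j + h] = a - b;
      }
    }
  }
  for (let i = 0; i < d; i++) v[i] /= Math.sqrt(d);
  return v;
}

// Lloyd-Max levels for f(x) ∝ (1 - x^2)^((d-3)/2) on [-1, 1]; depends only on (d, bits).
function lloydMaxCodebook(d: number, bits: number, iters = 100, grid = 20000): Float64Array {
  const levels = 1 << bits;
  const xs = new Float64Array(grid), ws = new Float64Array(grid);
  for (let i = 0; i < grid; i++) {
    xs[i] = -1 + (2 * (i + 0.5)) / grid; // midpoint grid avoids the endpoints
    ws[i] = Math.pow(1 - xs[i] * xs[i], (d - 3) / 2);
  }
  const span = Math.min(3 / Math.sqrt(d), 0.9); // start within ±3 sd (sd = 1/sqrt(d))
  const c = new Float64Array(levels);
  for (let k = 0; k < levels; k++) c[k] = span * (-1 + (2 * (k + 0.5)) / levels);
  for (let it = 0; it < iters; it++) {
    const num = new Float64Array(levels), den = new Float64Array(levels);
    let k = 0;
    for (let i = 0; i < grid; i++) {
      // Cell boundaries are midpoints between adjacent (sorted) levels.
      while (k < levels - 1 && xs[i] > (c[k] + c[k + 1]) / 2) k++;
      num[k] += ws[i] * xs[i];
      den[k] += ws[i];
    }
    for (let m = 0; m < levels; m++) if (den[m] > 0) c[m] = num[m] / den[m]; // conditional means
  }
  return c;
}

interface Encoded { codes: Uint8Array; norm: number; scale: number }

function encode(x: Float64Array, signs: Float64Array, codebook: Float64Array): Encoded {
  let norm = 0;
  for (let i = 0; i < x.length; i++) norm += x[i] * x[i];
  norm = Math.sqrt(norm);
  if (norm === 0) throw new RangeError("cannot encode a zero vector");
  const o = rotate(x.map((v) => v / norm), signs); // rotated unit vector
  const codes = new Uint8Array(o.length);
  let dotHatO = 0; // <ô, o>, where ô is the dequantized vector
  for (let i = 0; i < o.length; i++) {
    let best = 0; // nearest level
    for (let k = 1; k < codebook.length; k++) {
      if (Math.abs(o[i] - codebook[k]) < Math.abs(o[i] - codebook[best])) best = k;
    }
    codes[i] = best;
    dotHatO += codebook[best] * o[i];
  }
  return { codes, norm, scale: 1 / dotHatO }; // RaBitQ-style: <o,q> ≈ <ô,q> / <ô,o>
}

// Estimate <x, q>; the query must be rotated once with the same signs.
function estimateDot(e: Encoded, rotatedQuery: Float64Array, codebook: Float64Array): number {
  let s = 0;
  for (let i = 0; i < e.codes.length; i++) s += codebook[e.codes[i]] * rotatedQuery[i];
  return e.norm * e.scale * s;
}
```

Because `signs` and the codebook depend only on (d, bits, seed), they can be built once per index without seeing any data. In this sketch the renormalization recovers a vector's inner product with itself exactly, while cross inner products carry the quantization error.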

Designed a layered API surface: TurboQuantIndex (raw positional flat index for maximum throughput), IdMapIndex (stable string/number/bigint external ids with O(1) swap-remove and predicate filtering), and createCollection (a Qdrant-inspired ergonomic layer with typed payloads and a structured filter DSL supporting must/should/must_not clauses with match, range, and hasId conditions).
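The id-mapping and filtering mechanics described above can be sketched independently of the quantizer. Everything here is a hypothetical illustration rather than quantvec's API:

- `IdMap` keeps external ids aligned with dense row positions and deletes in O(1) by moving the last row into the freed slot.
- `matchesFilter` evaluates a Qdrant-style filter over a plain payload object: every `must` condition holds, at least one `should` condition holds when any are given, and no `must_not` condition holds.

The `Condition` and `Filter` shapes are assumptions.

```typescript
// Hypothetical sketch: not quantvec's actual IdMapIndex or filter types.
type Id = string | number | bigint;

// External ids mapped to dense row positions; remove() is an O(1) swap-remove.
class IdMap<Row> {
  private ids: Id[] = [];
  private rows: Row[] = [];
  private pos = new Map<Id, number>();

  add(id: Id, row: Row): void {
    if (this.pos.has(id)) throw new Error(`duplicate id ${String(id)}`);
    this.pos.set(id, this.rows.length);
    this.ids.push(id);
    this.rows.push(row);
  }

  remove(id: Id): boolean {
    const i = this.pos.get(id);
    if (i === undefined) return false;
    const last = this.rows.length - 1;
    // Move the last row into slot i, update its position, then drop the tail.
    this.rows[i] = this.rows[last];
    this.ids[i] = this.ids[last];
    this.pos.set(this.ids[i], i);
    this.rows.pop();
    this.ids.pop();
    this.pos.delete(id); // also correct when i === last
    return true;
  }

  get size(): number { return this.rows.length; }
  positionOf(id: Id): number | undefined { return this.pos.get(id); }
}

type Payload = Record<string, unknown>;
type Condition =
  | { key: string; match: { value: string | number | boolean } }
  | { key: string; range: { gt?: number; gte?: number; lt?: number; lte?: number } }
  | { hasId: Id[] };
interface Filter { must?: Condition[]; should?: Condition[]; must_not?: Condition[] }

function matches(c: Condition, id: Id, p: Payload): boolean {
  if ("hasId" in c) return c.hasId.includes(id);
  const v = p[c.key];
  if ("match" in c) return v === c.match.value;
  if (typeof v !== "number") return false; // range only applies to numbers
  const r = c.range;
  return (r.gt === undefined || v > r.gt) && (r.gte === undefined || v >= r.gte) &&
    (r.lt === undefined || v < r.lt) && (r.lte === undefined || v <= r.lte);
}

function matchesFilter(f: Filter, id: Id, p: Payload): boolean {
  const ok = (c: Condition) => matches(c, id, p);
  return (f.must ?? []).every(ok) &&
    (!f.should || f.should.length === 0 || f.should.some(ok)) &&
    !(f.must_not ?? []).some(ok);
}
```

Swap-remove keeps rows contiguous, which suits a flat scan. The cost is that the row that was last changes position, which is why external ids, not positions, serve as the stable handle.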

Engineered two WASM acceleration paths compiled from AssemblyScript. The default is an exact f64 scoring kernel that is bit-identical to the pure-TS oracle and keeps codes resident in linear memory (~1.3x faster). The optional v128 FastScan kernel reorganizes codes into blocked 16-vector tiles, builds a quantized u8 lookup table per query, and uses v128.swizzle to perform 16 table lookups at once for each coordinate. On Apple Silicon it achieves a 5.7x speedup at 50k vectors, and an exact rescore of the candidate pool preserves recall.
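The FastScan data flow can be emulated in plain TypeScript, with a scalar loop standing in for the 16-lane swizzle. This is a sketch under assumptions, not the AssemblyScript kernel. It assumes 4-bit codes (16 centroids, so each coordinate's table fits one 16-byte register), a single u8 quantization step shared by all coordinates with a per-coordinate bias, and u32 accumulators. `swizzle16`, `toTiles`, `buildLut`, and `fastScanSearch` are hypothetical names.

```typescript
// Scalar emulation of a FastScan-style kernel. ASSUMPTIONS: 4-bit codes
// (16 centroids), one u8 step shared by all coordinates plus a per-coordinate
// bias, u32 accumulators. Names and layout are hypothetical, not quantvec's.
const LANES = 16;

// Mirrors WASM i8x16.swizzle: out[l] = table[idx[l]] if idx[l] < 16, else 0.
function swizzle16(table: Uint8Array, idx: Uint8Array, out: Uint8Array): void {
  for (let l = 0; l < LANES; l++) out[l] = idx[l] < LANES ? table[idx[l]] : 0;
}

// Tile t stores d rows of 16 bytes; row j holds coordinate j of vectors
// 16t..16t+15, so a row can be used directly as swizzle indices.
function toTiles(codes: Uint8Array[], d: number): Uint8Array[] {
  const tiles: Uint8Array[] = [];
  for (let base = 0; base < codes.length; base += LANES) {
    const tile = new Uint8Array(d * LANES); // padding lanes keep code 0
    for (let l = 0; l < LANES && base + l < codes.length; l++) {
      for (let j = 0; j < d; j++) tile[j * LANES + l] = codes[base + l][j];
    }
    tiles.push(tile);
  }
  return tiles;
}

// Per-query table: lut[16j + c] ≈ (q[j] * centroids[c] - bias[j]) / delta.
function buildLut(q: Float64Array, centroids: Float64Array) {
  if (centroids.length !== LANES) throw new RangeError("expected 16 centroids");
  const d = q.length;
  const bias = new Float64Array(d);
  let maxSpan = 0, biasSum = 0;
  for (let j = 0; j < d; j++) {
    let lo = Infinity, hi = -Infinity;
    for (let c = 0; c < LANES; c++) {
      const v = q[j] * centroids[c];
      lo = Math.min(lo, v);
      hi = Math.max(hi, v);
    }
    bias[j] = lo;
    biasSum += lo;
    maxSpan = Math.max(maxSpan, hi - lo);
  }
  const delta = maxSpan > 0 ? maxSpan / 255 : 1; // every entry fits in 0..255
  const lut = new Uint8Array(d * LANES);
  for (let j = 0; j < d; j++) {
    for (let c = 0; c < LANES; c++) {
      lut[j * LANES + c] = Math.round((q[j] * centroids[c] - bias[j]) / delta);
    }
  }
  return { lut, delta, biasSum };
}

// Approximate inner-product scores for all n vectors, then exact rescore of the
// best `pool` candidates with a caller-supplied exact scorer; returns top-k indices.
function fastScanSearch(
  q: Float64Array, centroids: Float64Array, tiles: Uint8Array[], n: number,
  exactScore: (i: number) => number, k: number, pool: number,
): number[] {
  const d = q.length;
  const { lut, delta, biasSum } = buildLut(q, centroids);
  const approx = new Float64Array(n);
  const gathered = new Uint8Array(LANES);
  const acc = new Uint32Array(LANES);
  tiles.forEach((tile, t) => {
    acc.fill(0);
    for (let j = 0; j < d; j++) {
      // One swizzle = 16 lookups: coordinate j of every vector in the tile.
      const row = j * LANES;
      swizzle16(lut.subarray(row, row + LANES), tile.subarray(row, row + LANES), gathered);
      for (let l = 0; l < LANES; l++) acc[l] += gathered[l];
    }
    for (let l = 0; l < LANES && t * LANES + l < n; l++) {
      approx[t * LANES + l] = acc[l] * delta + biasSum;
    }
  });
  return Array.from({ length: n }, (_, i) => i)
    .sort((a, b) => approx[b] - approx[a])
    .slice(0, pool)
    .map((i) => ({ i, s: exactScore(i) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((c) => c.i);
}
```

Quantizing the table to u8 makes the approximate scores slightly noisy. That is why the sketch rescores a larger `pool` of candidates with the exact scorer before returning the top `k`. With `pool` equal to the collection size, the result matches an exact scan.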

Shipped isomorphic toBytes/fromBytes persistence with a versioned binary format (QVEC header), typed discriminated error classes at every API boundary, and a quantvec/node subpath with filesystem helpers. The same index serializes to IndexedDB, Cloudflare R2, or disk with zero server-side logic.
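A versioned header plus bit-packed codes is straightforward to sketch with `DataView`. The byte layout below (field order, widths, little-endian, version 1) is invented for illustration and is not the documented QVEC format. A real index would also need to persist per-vector scalars and the rotation seed, which this sketch omits. `encodeQvec`, `decodeQvec`, and `FormatError` are hypothetical names; the error class only illustrates the idea of a typed, discriminated error.

```typescript
// HYPOTHETICAL layout (not quantvec's actual format), little-endian:
// [0..3] "QVEC" | [4..5] u16 version | [6..7] u16 dim | [8] u8 bits | [9..12] u32 count
// [13..] codes, bit-packed LSB-first at `bits` bits per coordinate.
const MAGIC = "QVEC";
const VERSION = 1;
const HEADER_BYTES = 13;

class FormatError extends Error {
  constructor(readonly kind: "bad-magic" | "bad-version" | "truncated", message: string) {
    super(message);
    this.name = "FormatError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

function packCodes(codes: Uint8Array, bits: number): Uint8Array {
  const out = new Uint8Array(Math.ceil((codes.length * bits) / 8));
  let bitPos = 0;
  for (let i = 0; i < codes.length; i++) {
    for (let b = 0; b < bits; b++, bitPos++) {
      if ((codes[i] >> b) & 1) out[bitPos >> 3] |= 1 << (bitPos & 7);
    }
  }
  return out;
}

function unpackCodes(packed: Uint8Array, n: number, bits: number): Uint8Array {
  const out = new Uint8Array(n);
  let bitPos = 0;
  for (let i = 0; i < n; i++) {
    let v = 0;
    for (let b = 0; b < bits; b++, bitPos++) v |= ((packed[bitPos >> 3] >> (bitPos & 7)) & 1) << b;
    out[i] = v;
  }
  return out;
}

// codes: count * dim entries, row-major, each < 2^bits.
function encodeQvec(dim: number, bits: number, codes: Uint8Array): Uint8Array {
  const count = codes.length / dim;
  if (!Number.isInteger(count)) throw new RangeError("codes length must be a multiple of dim");
  const packed = packCodes(codes, bits);
  const buf = new Uint8Array(HEADER_BYTES + packed.length);
  const view = new DataView(buf.buffer);
  for (let i = 0; i < 4; i++) buf[i] = MAGIC.charCodeAt(i);
  view.setUint16(4, VERSION, true);
  view.setUint16(6, dim, true);
  view.setUint8(8, bits);
  view.setUint32(9, count, true);
  buf.set(packed, HEADER_BYTES);
  return buf;
}

function decodeQvec(buf: Uint8Array): { dim: number; bits: number; count: number; codes: Uint8Array } {
  if (buf.length < HEADER_BYTES) throw new FormatError("truncated", "header too short");
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  for (let i = 0; i < 4; i++) {
    if (buf[i] !== MAGIC.charCodeAt(i)) throw new FormatError("bad-magic", "not a QVEC buffer");
  }
  const version = view.getUint16(4, true);
  if (version !== VERSION) throw new FormatError("bad-version", `unsupported version ${version}`);
  const dim = view.getUint16(6, true), bits = view.getUint8(8), count = view.getUint32(9, true);
  const n = dim * count;
  if (buf.length < HEADER_BYTES + Math.ceil((n * bits) / 8)) {
    throw new FormatError("truncated", "code section too short");
  }
  return { dim, bits, count, codes: unpackCodes(buf.subarray(HEADER_BYTES), n, bits) };
}
```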

𝗥𝗲𝘀𝘂𝗹𝘁

The library achieves 0.888 recall@10 on SIFT-small (10k x 128-d) and 0.880 recall@10 on 100k GloVe-200 word embeddings at 4-bit, with 7.1-7.4x compression, zero training time, and zero runtime dependencies. FastScan reaches ~2055 QPS on 10k vectors and ~1350 QPS on 50k vectors single-threaded. Published to npm as v0.0.1 with full CI, benchmarks validated on multiple real datasets, research notes, and an interactive documentation site.
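A simple accounting reproduces the reported 4-bit ratios. It assumes float32 input vectors and two float32 scalars stored per vector alongside the packed codes (for example, a norm and the RaBitQ scale). That per-vector overhead is an assumption, not something stated above, and `compressionRatio` is a hypothetical helper.

```typescript
// Raw float32 bytes vs. bit-packed b-bit codes plus per-vector float32 scalars.
// ASSUMPTION: two f32 scalars per vector (e.g. norm and RaBitQ scale).
function compressionRatio(dim: number, bits: number, perVectorScalars = 2): number {
  const rawBytes = dim * 4; // float32 input
  const codeBytes = Math.ceil((dim * bits) / 8); // bit-packed codes
  return rawBytes / (codeBytes + perVectorScalars * 4);
}

console.log(compressionRatio(128, 4).toFixed(2)); // "7.11": 512 / 72 (SIFT-small)
console.log(compressionRatio(200, 4).toFixed(2)); // "7.41": 800 / 108 (GloVe-200)
```

Under the same assumption, a hypothetical 1536-d embedding would come out at about 7.9x with 4-bit codes and 15.7x with 2-bit codes. Both figures are consistent with the 7.9-15.7x range quoted in the summary.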