Most retrieval systems use a single pipeline for all queries, which either under-serves hard queries or wastes compute on easy ones. This article presents cheap signals, such as score spread and retriever agreement, that detect weak retrieval without an LLM, so that escalation happens only when needed.
Weak retrieval occurs when the needed evidence is missing from the top-k results, even if it appears deeper in the ranking.
Cheap signals such as dense variance (spread), agreement between dense and sparse retrievers, and top-score height can predict weak retrieval with AUC up to 0.76.
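The three signals named above can be computed from the hit lists alone. The sketch below is one possible reading, not the article's code: the function name, the use of population variance for spread, and Jaccard overlap for agreement are all assumptions. Which direction and threshold of each signal indicates weak retrieval would have to be calibrated on labeled queries.

```python
from statistics import pvariance


def retrieval_signals(dense_hits, sparse_hits, k=10):
    """Compute cheap retrieval-quality signals from two ranked hit lists.

    Each hit list is a sequence of (doc_id, score) pairs, best first.
    All names here are hypothetical and chosen for illustration.
    """
    dense_top = dense_hits[:k]
    sparse_top = sparse_hits[:k]
    dense_scores = [score for _, score in dense_top]

    # Spread: variance of the dense scores within the top-k.
    spread = pvariance(dense_scores) if len(dense_scores) > 1 else 0.0

    # Agreement: Jaccard overlap of the dense and sparse top-k document ids.
    dense_ids = {doc_id for doc_id, _ in dense_top}
    sparse_ids = {doc_id for doc_id, _ in sparse_top}
    union = dense_ids | sparse_ids
    agreement = len(dense_ids & sparse_ids) / len(union) if union else 0.0

    # Height: the best dense score.
    top_score = dense_scores[0] if dense_scores else 0.0

    return {"spread": spread, "agreement": agreement, "top_score": top_score}
```

These values could feed a small classifier or a set of thresholds that decides whether a query is escalated to a more expensive pipeline.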
Qdrant 1.18 ships TurboQuant, a new rotation-based vector quantization method from Google Research, with extensions for production embeddings. It offers 4-bit, 2-bit, 1.5-bit, and 1-bit options, outperforming or matching Scalar Quantization (SQ) and Binary Quantization (BQ) in compression and recall. The article explains the algorithm, Qdrant's enhancements (length renormalization and per-coordinate calibration), and benchmark results.
TurboQuant is a new rotation-based vector quantization algorithm that matches SQ recall at 4x compression and outperforms BQ by 9-24 percentage points at 2-bit and 1-bit modes.
Qdrant enhances TurboQuant with length renormalization and per-coordinate calibration (anisotropy compensation) to handle real-world embeddings.
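To make the idea of rotation-based quantization concrete, here is a toy sketch. It is not Qdrant's or Google's implementation, and it rests on several assumptions: a randomized Hadamard transform stands in for the random rotation, a uniform grid stands in for whatever codebook the real method uses, the "length renormalization" step is only one plausible interpretation (rescaling the decoded vector to unit length before restoring the stored norm), and per-coordinate calibration is omitted. The vector dimension must be a power of two.

```python
import math
import random


def fwht(values):
    """Normalized fast Walsh-Hadamard transform (orthogonal and self-inverse)."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


def make_signs(dim, seed=0):
    """Random +/-1 signs that randomize the Hadamard rotation."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def rotate(vector, signs):
    return fwht([s * x for s, x in zip(signs, vector)])


def unrotate(rotated, signs):
    return [s * x for s, x in zip(signs, fwht(rotated))]


def _grid(dim, bits):
    # After rotation, unit-vector coordinates have a spread of roughly
    # 1/sqrt(dim), so clip at about three times that (an assumed choice).
    clip = 3.0 / math.sqrt(dim)
    step = 2 * clip / (2 ** bits)
    return clip, step


def encode(vector, signs, bits=4):
    """Return (codes, norm): one small integer per coordinate plus the length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return [0] * len(vector), 0.0
    clip, step = _grid(len(vector), bits)
    rotated = rotate([x / norm for x in vector], signs)
    top = 2 ** bits - 1
    codes = [min(top, max(0, int((r + clip) / step))) for r in rotated]
    return codes, norm


def decode(codes, norm, signs, bits=4):
    clip, step = _grid(len(codes), bits)
    rotated = [-clip + (c + 0.5) * step for c in codes]
    # Assumed form of length renormalization: force unit length, then
    # restore the stored norm. Cell centers are never zero, so length > 0.
    length = math.sqrt(sum(r * r for r in rotated))
    rotated = [r / length for r in rotated]
    return [norm * x for x in unrotate(rotated, signs)]
```

The rotation spreads a vector's energy evenly across coordinates, which is why a single fixed grid can serve every coordinate in this sketch.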
This is Part 2 of a 5-part series on fine-tuning sparse embeddings for e-commerce search. It covers training a SPLADE model on Modal's serverless GPUs using the Amazon ESCI dataset, including data loading, product text formatting, Modal setup, model creation, training function, SpladeLoss, YAML configuration, parallel hyperparameter sweeps, and a pitfall to avoid.
Use Amazon ESCI dataset with Exact and Substitute pairs as positives for training SPLADE.
Product text formatting is critical for sparse embeddings; use bracket notation for brands and pipe separators for sections.
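The two takeaways above can be sketched as a small data-preparation step. The exact text layout and the helper names are assumptions made for illustration, and the column names (`query`, `product_title`, `product_brand`, `product_description`, `esci_label`) and label codes (`E`, `S`) are assumed to follow the published ESCI files, so check them against the copy you load.

```python
# Labels treated as positives: Exact and Substitute (assumed codes).
POSITIVE_LABELS = {"E", "S"}


def format_product(title, brand=None, description=None):
    """Render a product as '[Brand] Title | Description' (hypothetical layout)."""
    head = f"[{brand.strip()}] {title.strip()}" if brand else title.strip()
    sections = [head]
    if description and description.strip():
        sections.append(description.strip())
    return " | ".join(sections)


def build_training_pairs(rows):
    """Keep (query, product_text) pairs whose label is Exact or Substitute."""
    pairs = []
    for row in rows:
        if row["esci_label"] not in POSITIVE_LABELS:
            continue
        text = format_product(
            row["product_title"],
            row.get("product_brand"),
            row.get("product_description"),
        )
        pairs.append((row["query"], text))
    return pairs
```

Keeping the brand in a fixed, visually distinct position gives a sparse model a consistent place to find it as its own token.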
This article is Part 1 of a series on fine-tuning sparse embeddings for e-commerce search. It explains why dense embeddings fail in product search due to blurred exact matches, and how sparse embeddings preserve critical details. It introduces the SPLADE model, query expansion, and Qdrant's native sparse vector support. The fine-tuned system achieves a 29% improvement over BM25 on the Amazon ESCI dataset.
This article explores how Qdrant's Distance Matrix API enables efficient data exploration through dimensionality reduction, clustering, and graph-based visualization, helping to uncover hidden structures in large unstructured datasets.
In this article, Huong (Celine) Hoang shares her experience integrating ONNX cross-encoders into the FastEmbed library during Qdrant's Summer of Code 2024. The project enables re-ranking search results using relevance scores, enhancing context-aware search applications. Key challenges included building a new input-output scheme, tokenization, model loading, and testing. The functionality is available in FastEmbed 0.4.0.
Cross-encoders have been integrated into FastEmbed for re-ranking tasks.
The project used ONNX models to avoid heavy dependencies like PyTorch.
An introduction to vector databases, covering how they represent unstructured data as vectors, their architecture (collections, distance metrics, storage), core operations (indexing, searching, updating, deleting), advanced features (dense/sparse vectors, hybrid search, quantization, sharding, replication, multitenancy, security), and practical use cases.
Vector databases store and search unstructured data as high-dimensional vectors, enabling similarity search.
Key components of each stored point: a unique ID, a vector (the numerical representation, with a fixed number of dimensions), and a payload (metadata).
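The components listed above can be pictured as a small record, with similarity search as a ranking over those records. The sketch below is a brute-force illustration with hypothetical names, not how a vector database indexes data internally.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    id: int                                       # unique ID
    vector: list                                  # numerical representation
    payload: dict = field(default_factory=dict)   # metadata


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(points, query, limit=3):
    """Return the `limit` points most similar to `query`, best first."""
    ranked = sorted(points, key=lambda p: cosine(p.vector, query), reverse=True)
    return ranked[:limit]
```

A real system replaces the full sort with an approximate index, but the inputs and outputs have the same shape.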
Vector quantization compresses high-dimensional vectors to reduce memory usage and speed up search operations in large datasets. This article covers scalar, binary, and product quantization methods, along with techniques like oversampling, rescoring, and io_uring to balance accuracy and performance.
Vector quantization reduces memory footprint and search latency, crucial for scaling to millions of vectors.
Scalar quantization (float32 to int8) offers 75% memory reduction with minimal accuracy loss.
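The 75% figure follows from storing one byte per dimension instead of four. A minimal sketch of the idea, assuming a simple per-vector min/max range (production systems may choose the bounds differently, for example across the whole dataset); the function names are hypothetical.

```python
from array import array


def quantize_int8(vector):
    """Map floats onto 256 evenly spaced int8 levels between min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero step for constant vectors
    codes = array("b", (round((x - lo) / scale) - 128 for x in vector))
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximate the original floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


def memory_saving():
    """Fraction of memory saved by storing int8 instead of float32."""
    return 1 - array("b").itemsize / array("f").itemsize
```

Each reconstructed value differs from the original by at most half a quantization step, which is why the accuracy loss tends to be small.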
Qdrant 1.7.0 introduces native support for sparse vectors, enabling keyword-based search and hybrid search; a new Discovery API for more precise vector search, including discovery search and context search; user-defined sharding for flexible data distribution; snapshot-based shard transfer for efficient cluster scaling; and various performance improvements.
Native sparse vector support enables hybrid keyword-semantic search.
Discovery API offers discovery search and context search modes.
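A sparse vector stores only its non-zero dimensions as parallel lists of indices and values, and two such vectors are compared by a dot product over the indices they share. The sketch below illustrates that scoring with a hypothetical helper; it is not Qdrant's internal code.

```python
def sparse_dot(a_indices, a_values, b_indices, b_values):
    """Dot product of two sparse vectors given as (indices, values) lists."""
    b_lookup = dict(zip(b_indices, b_values))
    return sum(
        value * b_lookup[index]
        for index, value in zip(a_indices, a_values)
        if index in b_lookup
    )
```

When the indices correspond to vocabulary terms, only terms present in both the query and the document contribute to the score, which is what gives sparse vectors their keyword-matching behavior.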
Qdrant announces $7.5M seed funding led by Unusual Ventures. The article discusses the importance of vector databases in the AI age, the explosive growth of unstructured data, and Qdrant's progress as an open-source vector similarity search solution.
Qdrant raised $7.5M seed funding from Unusual Ventures and others.
Vector databases are foundational to the new AI stack for handling unstructured data.