AI News Hub

Show HN: GalaxDB – an open-source AI-native database (OLTP + vector + versioning)

GalaxDB is an open-source AI-native database that replaces separate relational, vector, embedding, storage, and pipeline services with a single server binary speaking the PostgreSQL wire protocol. It adds semantic search, versioned snapshots, and training-data export to SQL, and its published benchmarks report high write throughput and sub-millisecond vector search.

Source: Hacker News AI · Author: galaxdb


The AI-native database. SQL + vector search + training exports in one system.

What is GalaxDB?

Most AI applications bolt together 3–5 separate services: a relational database, a vector database, an embedding API, an object store, and a data pipeline. GalaxDB replaces all of them with a single server binary that speaks the PostgreSQL wire protocol and spawns its own embedding sidecar. The release artifacts are small: galaxdb-server is 7.9 MB and galaxdb-sidecar is 7.6 MB.

Before GalaxDB: PostgreSQL + pgvector + Pinecone + OpenAI API + S3 + Airflow

After GalaxDB: galaxdb-server

One connection string. One backup. One monitoring endpoint. Your existing psycopg2, SQLAlchemy, and pg code works unchanged.

Quick Start

Python — embedded mode (no server, like SQLite)

pip install galaxdb-client

import galaxdb

db = galaxdb.Database("./mydata")

Create a table with an embedding column

db.execute("""
    CREATE TABLE docs (
        id   INT PRIMARY KEY,
        text TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")

Insert — embeddings computed automatically by the local sidecar

db.execute("INSERT INTO docs (id, text) VALUES (1, 'machine learning is great')")
db.execute("INSERT INTO docs (id, text) VALUES (2, 'rust programming language')")
db.execute("INSERT INTO docs (id, text) VALUES (3, 'deep neural networks')")

Semantic search — no external API, no separate vector DB

results = db.execute(
    "SELECT id, text FROM docs WHERE SEMANTIC_MATCH(text, 'AI and neural nets', 0.7)"
)

Export a training dataset — one SQL command, Lance format, PyTorch-ready

db.execute("CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32'")
path = db.training_dataset("v1")

import lance
dataset = lance.dataset(path).to_pytorch()  # zero-copy, memory-mapped
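The README does not spell out how the third argument of SEMANTIC_MATCH is interpreted. The sketch below assumes it is a cosine-similarity cutoff applied to the stored embeddings; the `semantic_match` helper and the toy 3-dimensional vectors are hypothetical stand-ins, not GalaxDB's API or its real model output.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_match(rows, query_vec, threshold):
    """Hypothetical stand-in for SEMANTIC_MATCH: keep rows whose embedding
    has cosine similarity >= threshold with the query embedding."""
    return [(row_id, text) for row_id, text, emb in rows
            if cosine(emb, query_vec) >= threshold]


# Toy 3-d "embeddings"; all-MiniLM-L6-v2 actually produces 384 dimensions.
rows = [
    (1, "machine learning is great", [0.9, 0.1, 0.0]),
    (2, "rust programming language", [0.0, 0.2, 0.9]),
    (3, "deep neural networks", [0.8, 0.3, 0.1]),
]
query = [0.85, 0.2, 0.05]  # pretend embedding of 'AI and neural nets'
print(semantic_match(rows, query, 0.7))
# → [(1, 'machine learning is great'), (3, 'deep neural networks')]
```

Raising the cutoff toward 1.0 returns fewer, closer matches; the Quick Start uses 0.7 and the server example 0.8.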

Server mode — multi-client, like PostgreSQL

macOS

brew tap zentrix-innovative-labs/tap && brew install galaxdb

Linux / macOS (direct install)

curl -fsSL https://raw.githubusercontent.com/zentrix-innovative-labs/galaxdb/main/install.sh | bash

Docker

docker run -p 5433:5433 -p 9090:9090 -v /data:/data \
  harbi256/galaxdb:latest --data-dir /data

import galaxdb

conn = galaxdb.connect("host=localhost port=5433 dbname=galaxdb sslmode=disable")
conn.execute("SELECT id, text FROM docs WHERE SEMANTIC_MATCH(text, 'AI', 0.8)")

Any PostgreSQL client works — psycopg2, SQLAlchemy, tokio-postgres, pg (Node.js), JDBC.

AuroraSQL — SQL Extensions for AI

GalaxDB extends standard SQL with AI-native primitives:

-- Semantic search with similarity threshold
SELECT id, title FROM articles
WHERE SEMANTIC_MATCH(title, 'climate change policy', 0.75)
  AND published_at > '2024-01-01';

-- Time-travel query: reproduce exactly what data existed at a point in time
SELECT * FROM docs AT VERSION 'training-v1';

-- Near-duplicate deduplication: cut training set size by 15–30%
SELECT * FROM docs WHERE NOT DUPLICATE;

-- Create a versioned training snapshot
CREATE VERSION TAG 'train-v2' FOR TRAINING
  WITH TRAINING PRECISION 'sq8' TRAINING SEED 42;

-- Bulk insert
BULK INSERT INTO docs (id, text) VALUES
  (1, 'first document'),
  (2, 'second document');

-- Backup and restore
BACKUP TO '/path/to/backup';
RESTORE FROM '/path/to/backup';
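The comparison table lists MinHash LSH as the technique behind near-duplicate removal. The sketch below shows only the MinHash half of that idea. It assumes word 3-gram shingles, 64 seeded hash functions derived from hashlib.blake2b, and a 0.8 estimated-Jaccard cutoff, all chosen for illustration; GalaxDB's real parameters are not documented here, and the LSH banding step is left out.

```python
import hashlib

NUM_HASHES = 64  # illustrative signature length, not GalaxDB's setting


def shingles(text, k=3):
    """Lower-cased word k-grams; texts shorter than k become one shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(text):
    """Signature = minimum of each seeded hash function over the shingles."""
    grams = shingles(text)
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(), "big"
            )
            for g in grams
        )
        for seed in range(NUM_HASHES)
    ]


def estimated_jaccard(a, b):
    """Share of equal signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedup(docs, threshold=0.8):
    """Keep each doc unless it is a near-duplicate of one already kept.

    This compares against every kept signature (O(n^2)); LSH banding
    exists to avoid exactly this scan and is omitted for brevity.
    """
    kept, sigs = [], []
    for doc in docs:
        sig = minhash(doc)
        if all(estimated_jaccard(sig, s) < threshold for s in sigs):
            kept.append(doc)
            sigs.append(sig)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",  # same text, different case
    "rust programming language for systems work",
]
print(dedup(docs))  # keeps the first and third documents
```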

Performance

Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GB NVMe), release build.

HNSW Vector Search — SIFT-1M

| ef_search | recall@10 | mean latency | p99 latency |
|---|---|---|---|
| 50 | 0.959 | 158 µs | 228 µs |
| 100 | 0.983 | 267 µs | 364 µs |
| 200 | 0.990 | 459 µs | 616 µs |

For methodology and the full SIFT-1M run, see the GalaxDB paper on Zenodo.
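Recall@10 in the table above is the fraction of each query's true 10 nearest neighbours that the HNSW search actually returns, averaged over all queries. The sketch below shows that computation. It assumes you already have exact ground-truth neighbour IDs (the SIFT-1M corpus ships them) and the approximate results as lists of IDs; the toy data is made up.

```python
def recall_at_k(approx, truth, k=10):
    """Mean over queries of |returned top-k ∩ true top-k| / k."""
    if len(approx) != len(truth) or not truth:
        raise ValueError("need one result list per query")
    hits = 0.0
    for got, expected in zip(approx, truth):
        hits += len(set(got[:k]) & set(expected[:k])) / k
    return hits / len(truth)


truth = [list(range(10)), list(range(10, 20))]           # exact top-10 IDs per query
approx = [list(range(10)), list(range(10, 19)) + [99]]  # ANN missed one neighbour on query 2
print(recall_at_k(approx, truth))  # → 0.95
```

Raising ef_search widens the candidate list HNSW explores per query, which is why recall and latency both climb down the table.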

Storage Engine

| Metric | GalaxDB | PostgreSQL 16 | RocksDB |
|---|---|---|---|
| Write TPS | 258,555 | ~3,200 | ~80,000 |
| Read p50 | 3 µs | ~95 µs | ~180 µs |
| Read p99 | 47 µs | ~300 µs | ~500 µs |
| Scan throughput | 4.49 GB/s | ~0.9 GB/s | n/a |

740 Rust tests pass, and 7 chaos scenarios complete in 10.9 s. See BENCHMARKS.md.
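The p50 and p99 rows are latency percentiles computed over many individual reads. The README does not say which percentile definition its harness uses, so the sketch below assumes the nearest-rank method and runs it on made-up sample latencies.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_us = [4] * 90 + [9] * 9 + [60]  # 100 made-up read latencies
print(percentile(latencies_us, 50), percentile(latencies_us, 99))  # → 4 9
```

With only 100 samples, p99 is just the 99th-smallest value, so a single slow outlier (the 60 µs read here) does not show up until p100. Stable tail numbers need many more samples.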

How It Compares

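| Feature | GalaxDB | PostgreSQL + pgvector | Pinecone | Qdrant | Weaviate | LanceDB | ChromaDB | Milvus | DuckDB |
|---|---|---|---|---|---|---|---|---|---|
| SQL queries | ✅ Full | ✅ Full | ❌ | ❌ | Partial | Partial¹ | ❌ | ❌ | ✅ Full |
| Vector search | ✅ recall=0.990 | ⚠️ ~0.95 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Local embeddings | ✅ no API cost | ❌ | ❌ | ⚠️ FastEmbed | ✅ modules | ✅ | ✅ | ❌ | ❌ |
| Time-travel | ✅ AT VERSION | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Training export | ✅ Lance format | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Near-dedup | ✅ MinHash LSH | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Embedded mode | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |
| PostgreSQL wire | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Self-hosted | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encryption at rest | ✅ AES-256-GCM | ✅ OS-level | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| MVCC / snapshots | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Single binary | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |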

¹ LanceDB OSS uses a Python/Arrow API; SQL is available via DuckDB bridge or Enterprise tier only.

→ Full comparison with benchmarks, pricing, and use-case guidance

Architecture

Your application
      │
      │  PostgreSQL wire protocol (port 5433)
      │  or Python embedded API
      ▼
┌────────────────────────────────────────────────────────┐
│                     galaxdb-server                     │
│                                                        │
│   SQL Parser → Query Planner → Executor                │
│   │               │                │                   │
│   ART index       HNSW graph       LSM storage engine  │
│   (point reads)   (vector search)  (WAL + PAX blocks)  │
│                                                        │
│   ┌───────────────────┐   HTTP :9090                   │
│   │ galaxdb-sidecar   │   /health /metrics             │
│   │ (child process)   │                                │
│   │ ONNX/Candle model │                                │
│   └───────────────────┘                                │
└────────────────────────────────────────────────────────┘

The sidecar is spawned automatically — you don't manage it separately.
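The diagram places an LSM storage engine with a write-ahead log under the executor. The sketch below is a deliberately tiny model of that write and read path. It assumes an in-memory list standing in for the on-disk WAL and sorted Python lists standing in for flushed runs. It leaves out compaction, PAX block layout, encryption, and real file I/O, so it illustrates the general LSM idea rather than GalaxDB's engine.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: WAL append, then memtable, then immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.wal = []        # (key, value) records; stands in for an on-disk log
        self.memtable = {}   # recent writes, mutable
        self.runs = []       # flushed (sorted_keys, values) pairs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log first so the write survives a crash
        self.memtable[key] = value     # 2. then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a sorted run; its WAL records are then redundant."""
        if self.memtable:
            items = sorted(self.memtable.items())
            self.runs.append(([k for k, _ in items], [v for _, v in items]))
            self.memtable = {}
            self.wal = []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(keys, key)     # binary search in the sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def recover(self):
        """Rebuild the memtable by replaying the WAL, e.g. after losing memory."""
        self.memtable = dict(self.wal)


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")       # second put reaches the limit and triggers a flush
db.put("a", "3")       # the newer value lives only in the memtable and WAL
db.memtable = {}       # simulate a crash that loses in-memory state
db.recover()
print(db.get("a"), db.get("b"))  # → 3 2
```

Writes only ever append (to the log and to new runs), which is the property LSM designs rely on for high write throughput; reads pay for it by checking several places, newest first.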

Use Cases

RAG applications — store documents, compute embeddings locally, query with SEMANTIC_MATCH filtered by metadata. No Pinecone, no OpenAI embeddings API.

ML training pipelines — CREATE VERSION TAG ... FOR TRAINING snapshots your data and exports it as a Lance dataset. Load directly into PyTorch with zero-copy memory mapping.

Hybrid search — combine SQL filters with vector similarity in a single query. No application-side join between two systems.

Audit-safe AI — AT VERSION queries let you reproduce exactly what data a model was trained on. EU AI Act compliance built in.

Time-series + semantic — store sensor readings with text descriptions, query by time range AND semantic similarity in one SQL statement.
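The audit-safe and training-pipeline use cases above depend on AT VERSION reads, which in turn depend on keeping old row versions instead of overwriting them. The sketch below is a toy model of that idea. It assumes a global commit counter and a tag that simply records the counter value at tag time; `VersionedTable` and its methods are hypothetical names, and the README does not describe GalaxDB's actual MVCC design.

```python
import bisect


class VersionedTable:
    """Toy multi-version table: writes append versions instead of overwriting."""

    def __init__(self):
        self.commit = 0      # global commit counter
        self.history = {}    # row_id -> [(commit, value)], commits ascending
        self.tags = {}       # tag name -> commit number at tag time

    def upsert(self, row_id, value):
        self.commit += 1
        self.history.setdefault(row_id, []).append((self.commit, value))

    def delete(self, row_id):
        self.upsert(row_id, None)  # a tombstone version hides the row from later reads

    def create_tag(self, name):
        """Analogue of CREATE VERSION TAG: pin the current commit number."""
        self.tags[name] = self.commit

    def select_all(self, at_version=None):
        """SELECT * [AT VERSION tag]: newest visible value of each row."""
        limit = self.commit if at_version is None else self.tags[at_version]
        rows = {}
        for row_id, versions in self.history.items():
            commits = [c for c, _ in versions]
            i = bisect.bisect_right(commits, limit)  # number of versions with commit <= limit
            if i and versions[i - 1][1] is not None:
                rows[row_id] = versions[i - 1][1]
        return rows


t = VersionedTable()
t.upsert(1, "machine learning is great")
t.upsert(2, "rust programming language")
t.create_tag("training-v1")
t.upsert(1, "machine learning is overrated")
t.delete(2)
t.upsert(3, "deep neural networks")
print(t.select_all(at_version="training-v1"))  # rows as they were when tagged
print(t.select_all())                          # current rows
```

A real engine also needs garbage collection for versions that no tag or open transaction can still see; this sketch keeps everything.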

Installation

Python (embedded + remote)

pip install galaxdb-client

Requires Python 3.9+. Pre-built wheels for Linux x86-64, macOS Intel, macOS Apple Silicon, and Windows x86-64.

macOS (Homebrew)

brew tap zentrix-innovative-labs/tap
brew install galaxdb

Linux / macOS (direct install)

curl -fsSL https://raw.githubusercontent.com/zentrix-innovative-labs/galaxdb/main/install.sh | bash

Docker

docker run -p 5433:5433 -p 9090:9090 -v /data:/data \
  harbi256/galaxdb:latest --data-dir /data

GitHub Releases

Download pre-built binaries for Linux x86-64 and macOS x86-64 from the Releases page.

Rust (embed in your application)

[dependencies]
galaxdb-embedded = "1.0.0-beta"

Observability

Every server instance exposes:

Health check — reflects real subsystem state

curl http://localhost:9090/health

{"status":"ok","version":"1.0.0-beta.1","subsystems":{"disk_full":false,"sidecar_healthy":true,"connections_active":3}}
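For scripted monitoring, the /health response can be reduced to a single pass/fail decision. The stdlib sketch below assumes the field names in the sample response above are stable. The pass rule (status "ok", disk not full, sidecar healthy) and the default URL are our own choices, not a documented contract.

```python
import json
import urllib.request


def is_healthy(body):
    """Pass/fail rule over a /health body: status ok, disk not full, sidecar up."""
    doc = json.loads(body)
    sub = doc.get("subsystems", {})
    return bool(
        doc.get("status") == "ok"
        and sub.get("disk_full") is False
        and sub.get("sidecar_healthy") is True
    )


def check(url="http://localhost:9090/health", timeout=2.0):
    """Fetch /health; connection errors or malformed JSON count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return False
```

Calling check() from a cron job or container probe returns False both when the server is unreachable and when it reports a degraded subsystem.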

Prometheus metrics

curl http://localhost:9090/metrics

galaxdb_connections_active 3

galaxdb_wal_write_latency_us 42

galaxdb_hnsw_recall_estimate_bp 9902

galaxdb_embedding_queue_depth 0

...
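The metrics endpoint uses the Prometheus text format, in which an unlabeled sample is a metric name followed by its value on one line. The minimal parser below handles the samples shown above and skips comments and labeled series. It assumes the _bp suffix on galaxdb_hnsw_recall_estimate_bp means basis points (9902 → 0.9902, consistent with the ~0.99 recall in the benchmark table); the README does not state this.

```python
def parse_metrics(text):
    """Parse unlabeled 'name value' lines from Prometheus text exposition."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue  # skip blanks, HELP/TYPE comments, and labeled samples
        parts = line.split()
        if len(parts) >= 2:
            try:
                metrics[parts[0]] = float(parts[1])
            except ValueError:
                continue  # not a numeric sample line
    return metrics


sample = """galaxdb_connections_active 3
galaxdb_wal_write_latency_us 42
galaxdb_hnsw_recall_estimate_bp 9902
galaxdb_embedding_queue_depth 0"""

m = parse_metrics(sample)
recall = m["galaxdb_hnsw_recall_estimate_bp"] / 10_000  # assumption: basis points
print(recall)  # → 0.9902
```

For anything beyond a quick script, a Prometheus server scraping the endpoint directly is the usual route.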

Key Management

GalaxDB supports pluggable encryption key management with no vendor lock-in:

Local key file

GALAXDB_KEY_PROVIDER=local:/path/to/key.bin galaxdb-server ...

Environment variable

GALAXDB_KEY_PROVIDER=env:GALAXDB_MASTER_KEY galaxdb-server ...

Any KMS via shell command (AWS CLI, gcloud, az, vault, custom HSM)

GALAXDB_KEY_PROVIDER="command:aws kms decrypt ..." galaxdb-server ...

HashiCorp Vault Transit

GALAXDB_KEY_PROVIDER=vault:transit/galaxdb-prod galaxdb-server ...
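Each GALAXDB_KEY_PROVIDER value above has the form scheme:argument. The sketch below shows how such a spec might be resolved for the two schemes that need no external service (local and env). It assumes a raw 32-byte key file for local and a hex-encoded key for env. The `resolve_key` helper and both encodings are hypothetical, and the command and vault schemes are deliberately left unimplemented.

```python
import os

KEY_LEN = 32  # AES-256 uses a 256-bit (32-byte) key


def resolve_key(spec):
    """Resolve a hypothetical 'scheme:argument' provider spec to raw key bytes."""
    scheme, sep, arg = spec.partition(":")
    if not sep or not arg:
        raise ValueError(f"expected scheme:argument, got {spec!r}")
    if scheme == "local":
        with open(arg, "rb") as fh:          # assumption: raw binary key file
            key = fh.read()
    elif scheme == "env":
        key = bytes.fromhex(os.environ[arg])  # assumption: hex-encoded value
    elif scheme in ("command", "vault"):
        raise NotImplementedError(f"{scheme}: needs an external KMS, not sketched here")
    else:
        raise ValueError(f"unknown key provider scheme {scheme!r}")
    if len(key) != KEY_LEN:
        raise ValueError(f"expected a {KEY_LEN}-byte key, got {len(key)} bytes")
    return key
```

Splitting on the first colon only keeps arguments such as the command string or a Vault path intact.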

Security status

GalaxDB encrypts data at rest today (AES-256-GCM on every PAX block and WAL record, with the pluggable key management described above). Network security is still in active development:

| Capability | Status |
|---|---|
| Encryption at rest (AES-256-GCM, pluggable KMS) | ✅ Available now |
| Wire authentication (SCRAM-SHA-256) | 🚧 In progress |
| TLS transport encryption | 🚧 In progress |
| Roles, privileges, GRANT/REVOKE | 🚧 In progress |
| SSO / fine-grained RBAC / audit logging | Enterprise edition |

Until wire authentication and TLS land, run galaxdb-server only on a trusted network or loopback interface (the connection examples above use sslmode=disable accordingly). See ROADMAP.md for what is shipping next.
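Until SCRAM authentication and TLS ship, a deployment script can refuse to proceed unless the intended listen address is a loopback address. The stdlib guard below assumes you already know the address you plan to bind; how galaxdb-server itself is configured to bind an address is not covered in this README, and `is_loopback_only` is a hypothetical helper.

```python
import ipaddress


def is_loopback_only(host):
    """True if host is 'localhost' or a loopback IP (127.0.0.0/8 or ::1)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames could resolve anywhere, so treat them as untrusted
```

Note that 0.0.0.0 is rejected: it means "all interfaces", which is exactly what to avoid while the wire protocol is unauthenticated.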

Documentation

Getting Started — installation, all features, Docker Compose, troubleshooting

Roadmap — shipped capabilities, in-progress hardening, and planned features (OSS vs Enterprise)

SQL Reference — full AuroraSQL syntax

Storage Engine — LSM tree, WAL, PAX blocks, HNSW

Benchmarks — SIFT-1M recall, write throughput, latency

Database Comparison — GalaxDB vs PostgreSQL, Pinecone, Qdrant, LanceDB, ChromaDB, Milvus, DuckDB, Weaviate

Research Paper — GalaxDB: A Unified AI-Native Storage Engine for Transactional, Analytical, and Vector Workloads

Contributing

See CONTRIBUTING.md. Open an issue first for large changes. All PRs must pass the full test suite and three CI gates (no mocks, no vendor SDKs, task tracker).

License

Apache 2.0 — see LICENSE.

Built by Zentrix Innovative Labs

About

GalaxDB is designed for AI and ML workloads that need more than a traditional database. Instead of stitching together a relational database, a vector store, an embedding API, and a training pipeline, GalaxDB provides all of these capabilities in one binary with a single SQL interface.

galaxdb.com

Topics: database, ai, vector-search, vector-database


Releases (2): GalaxDB v0.2.0 (latest, Jun 17, 2026), plus 1 earlier release


Languages: Rust 95.0%, Python 2.6%, Shell 2.2%, Other 0.2%
