Show HN: GalaxDB – an open-source AI-native database (OLTP + vector + versioning)
GalaxDB is an open-source AI-native database that replaces separate relational, vector, embedding, storage, and pipeline services with a single binary speaking PostgreSQL wire protocol. It features semantic search, version snapshots, training data export, and high performance.
The AI-native database. SQL + vector search + training exports in one system.
What is GalaxDB?
Most AI applications bolt together 3–5 separate services: a relational database, a vector database, an embedding API, an object store, and a data pipeline. GalaxDB replaces all of them with a single binary that speaks PostgreSQL wire protocol. The release artifacts stay lightweight: galaxdb-server is 7.9 MB and galaxdb-sidecar is 7.6 MB.
Before GalaxDB: PostgreSQL + pgvector + Pinecone + OpenAI API + S3 + Airflow
After GalaxDB: galaxdb-server
One connection string. One backup. One monitoring endpoint. Your existing psycopg2, SQLAlchemy, and pg code works unchanged.
Quick Start
Python — embedded mode (no server, like SQLite)
pip install galaxdb-client
import galaxdb
db = galaxdb.Database("./mydata")
Create a table with an embedding column
db.execute("""
    CREATE TABLE docs (
        id INT PRIMARY KEY,
        text TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")
Insert — embeddings computed automatically by the local sidecar
db.execute("INSERT INTO docs (id, text) VALUES (1, 'machine learning is great')")
db.execute("INSERT INTO docs (id, text) VALUES (2, 'rust programming language')")
db.execute("INSERT INTO docs (id, text) VALUES (3, 'deep neural networks')")
Semantic search — no external API, no separate vector DB
results = db.execute(
    "SELECT id, text FROM docs WHERE SEMANTIC_MATCH(text, 'AI and neural nets', 0.7)"
)
Export a training dataset — one SQL command, Lance format, PyTorch-ready
db.execute("CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32'")
path = db.training_dataset("v1")

import lance
dataset = lance.dataset(path).to_pytorch()  # zero-copy, memory-mapped
Server mode — multi-client, like PostgreSQL
macOS
brew tap zentrix-innovative-labs/tap && brew install galaxdb
Linux / macOS (direct install)
curl -fsSL https://raw.githubusercontent.com/zentrix-innovative-labs/galaxdb/main/install.sh | bash
Docker
docker run -p 5433:5433 -p 9090:9090 -v /data:/data \
  harbi256/galaxdb:latest --data-dir /data
import galaxdb
conn = galaxdb.connect("host=localhost port=5433 dbname=galaxdb sslmode=disable")
conn.execute("SELECT id, text FROM docs WHERE SEMANTIC_MATCH(text, 'AI', 0.8)")
Any PostgreSQL client works — psycopg2, SQLAlchemy, tokio-postgres, pg (Node.js), JDBC.
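As a rough sketch of what using a stock PostgreSQL client might look like with psycopg2: the connection settings mirror the Docker example above, while the helper names and the choice to inline the search text as an escaped SQL literal are assumptions made for illustration. Whether GalaxDB accepts bound parameters inside SEMANTIC_MATCH is not stated in this README, so the sketch builds the statement text itself.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def semantic_query(table: str, column: str, text: str, threshold: float) -> str:
    """Build a SEMANTIC_MATCH query shaped like the ones in this README."""
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    return (
        f"SELECT id, {column} FROM {table} "
        f"WHERE SEMANTIC_MATCH({column}, {sql_literal(text)}, {float(threshold)})"
    )


def search(text: str, threshold: float = 0.8):
    """Run the query on a local server (needs psycopg2 and a running galaxdb-server)."""
    import psycopg2  # imported lazily so the helpers above work without it

    conn = psycopg2.connect(
        host="localhost", port=5433, dbname="galaxdb", sslmode="disable"
    )
    try:
        with conn.cursor() as cur:
            cur.execute(semantic_query("docs", "text", text, threshold))
            return cur.fetchall()
    finally:
        conn.close()
```

Calling `search("AI")` would send the same statement as the `galaxdb.connect` example above, assuming the `docs` table from the embedded quick start exists on the server.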
AuroraSQL — SQL Extensions for AI
GalaxDB extends standard SQL with AI-native primitives:
-- Semantic search with similarity threshold
SELECT id, title FROM articles
WHERE SEMANTIC_MATCH(title, 'climate change policy', 0.75)
  AND published_at > '2024-01-01';

-- Time-travel query: reproduce exactly what data existed at a point in time
SELECT * FROM docs AT VERSION 'training-v1';

-- Near-duplicate deduplication: cut training set size by 15–30%
SELECT * FROM docs WHERE NOT DUPLICATE;

-- Create a versioned training snapshot
CREATE VERSION TAG 'train-v2' FOR TRAINING
  WITH TRAINING PRECISION 'sq8' TRAINING SEED 42;

-- Bulk insert
BULK INSERT INTO docs (id, text) VALUES
  (1, 'first document'),
  (2, 'second document');

-- Backup and restore
BACKUP TO '/path/to/backup';
RESTORE FROM '/path/to/backup';
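This README does not spell out how the third argument of SEMANTIC_MATCH is interpreted. One common reading, assumed here, is a minimum cosine similarity between the stored embedding and the embedding of the query text. The toy vectors and function names below are invented for illustration and say nothing about GalaxDB's internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_match(rows, query_vec, threshold):
    """Return ids of rows whose embedding is at least `threshold` similar to the query."""
    return [
        row_id
        for row_id, vec in rows
        if cosine_similarity(vec, query_vec) >= threshold
    ]


# Toy 3-dimensional "embeddings"; the quick-start model is declared with DIM 384.
rows = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.8, 0.6, 0.0]),
]
matches = semantic_match(rows, [1.0, 0.0, 0.0], 0.7)
```

Under this reading, raising the threshold from 0.7 to 0.9 would drop row 3 (similarity 0.8) and keep only the exact match.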
Performance
Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GB NVMe), release build.
HNSW Vector Search — SIFT-1M
| ef_search | recall@10 | mean latency | p99 latency |
|---|---|---|---|
| 50 | 0.959 | 158 µs | 228 µs |
| 100 | 0.983 | 267 µs | 364 µs |
| 200 | 0.990 | 459 µs | 616 µs |
For methodology and the full SIFT-1M run, see the GalaxDB paper on Zenodo.
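For readers unfamiliar with the recall@10 column, the sketch below uses the usual definition: the fraction of the exact 10 nearest neighbours that the approximate index also returns, averaged over queries. This is the standard metric rather than code from the GalaxDB benchmark suite, and the ids are made up.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbours present in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(set(retrieved[:k]) & truth) / len(truth)


def mean_recall(all_retrieved, all_truth, k=10):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


# Two toy queries: the first misses one true neighbour, the second finds all ten.
truth = list(range(10))
retrieved_a = list(range(9)) + [42]
retrieved_b = list(range(10))
batch_recall = mean_recall([retrieved_a, retrieved_b], [truth, truth])
```

A higher ef_search makes the index explore more candidates per query, which is why recall and latency rise together in the table.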
Storage Engine
| Metric | GalaxDB | PostgreSQL 16 | RocksDB |
|---|---|---|---|
| Write TPS | 258,555 | ~3,200 | ~80,000 |
| Read p50 | 3 µs | ~95 µs | ~180 µs |
| Read p99 | 47 µs | ~300 µs | ~500 µs |
| Scan throughput | 4.49 GB/s | ~0.9 GB/s | n/a |
740 Rust tests passing. 7 chaos scenarios in 10.9 s. See BENCHMARKS.md.
How It Compares
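| Feature | GalaxDB | PostgreSQL + pgvector | Pinecone | Qdrant | Weaviate | LanceDB | ChromaDB | Milvus | DuckDB |
|---|---|---|---|---|---|---|---|---|---|
| SQL queries | ✅ Full | ✅ Full | ❌ | ❌ | Partial | Partial¹ | ❌ | ❌ | ✅ Full |
| Vector search | ✅ recall=0.990 | ⚠️ ~0.95 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Local embeddings | ✅ no API cost | ❌ | ❌ | ⚠️ FastEmbed | ✅ modules | ✅ | ✅ | ❌ | ❌ |
| Time-travel | ✅ AT VERSION | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Training export | ✅ Lance format | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Near-dedup | ✅ MinHash LSH | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Embedded mode | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |
| PostgreSQL wire | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Self-hosted | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encryption at rest | ✅ AES-256-GCM | ✅ OS-level | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| MVCC / snapshots | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Single binary | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |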
¹ LanceDB OSS uses a Python/Arrow API; SQL is available via DuckDB bridge or Enterprise tier only.
→ Full comparison with benchmarks, pricing, and use-case guidance
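The near-dedup row names MinHash LSH. As a minimal sketch of the MinHash half of that idea (the LSH banding step is left out, and none of this is taken from GalaxDB's source), two texts are reduced to short signatures whose agreement rate estimates the Jaccard similarity of their word shingles. The shingle size, hash count, and function names are illustrative assumptions.

```python
import hashlib


def shingles(text, size=3):
    """Set of overlapping word n-grams (shingles) from a text."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over the set."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Share of positions where two signatures agree; estimates Jaccard similarity."""
    agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return agree / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
doc_c = "completely unrelated sentence about database storage engines and indexes"

sig_a = minhash_signature(shingles(doc_a))
sig_b = minhash_signature(shingles(doc_b))
sig_c = minhash_signature(shingles(doc_c))
```

Here `estimated_jaccard(sig_a, sig_b)` should come out high and `estimated_jaccard(sig_a, sig_c)` near zero; a dedup pass would treat pairs above some chosen cutoff as near-duplicates.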
Architecture
Your application
       │
       │  PostgreSQL wire protocol (port 5433)
       │  or Python embedded API
       ▼
┌─────────────────────────────────────────────────────┐
│                   galaxdb-server                    │
│                                                     │
│  SQL Parser → Query Planner → Executor              │
│       │                                             │
│  ART index      HNSW graph       LSM storage engine │
│  (point reads)  (vector search)  (WAL + PAX blocks) │
│                                                     │
│  ┌──────────────────┐   HTTP :9090                  │
│  │ galaxdb-sidecar  │   /health  /metrics           │
│  │ (child process)  │                               │
│  │ ONNX/Candle model│                               │
│  └──────────────────┘                               │
└─────────────────────────────────────────────────────┘
The sidecar is spawned automatically — you don't manage it separately.
Use Cases
RAG applications — store documents, compute embeddings locally, query with SEMANTIC_MATCH filtered by metadata. No Pinecone, no OpenAI embeddings API.
ML training pipelines — CREATE VERSION TAG ... FOR TRAINING snapshots your data and exports it as a Lance dataset. Load directly into PyTorch with zero-copy memory mapping.
Hybrid search — combine SQL filters with vector similarity in a single query. No application-side join between two systems.
Audit-safe AI — AT VERSION queries let you reproduce exactly what data a model was trained on. EU AI Act compliance built in.
Time-series + semantic — store sensor readings with text descriptions, query by time range AND semantic similarity in one SQL statement.
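For contrast with the hybrid-search bullet, the application-side join it refers to tends to look something like the sketch below when relational rows and vector hits come from two separate systems. All data, field names, and the helper are invented for illustration.

```python
from datetime import date


def app_side_join(sql_rows, vector_hits, min_score):
    """Intersect SQL-filtered rows with vector hits by id, best score first."""
    scores = {doc_id: score for doc_id, score in vector_hits if score >= min_score}
    joined = [
        (row["id"], row["title"], scores[row["id"]])
        for row in sql_rows
        if row["id"] in scores
    ]
    return sorted(joined, key=lambda item: item[2], reverse=True)


# Result of a relational query such as: WHERE published_at > '2024-01-01'
sql_rows = [
    {"id": 1, "title": "Carbon pricing update", "published_at": date(2024, 3, 1)},
    {"id": 2, "title": "Emissions targets", "published_at": date(2024, 5, 9)},
]
# Result of a separate vector-store lookup: (id, similarity score)
vector_hits = [(2, 0.91), (3, 0.88), (1, 0.79), (4, 0.40)]

results = app_side_join(sql_rows, vector_hits, min_score=0.75)
```

The single-statement form shown earlier (SEMANTIC_MATCH combined with an ordinary WHERE predicate) is meant to replace this kind of glue code.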
Installation
Python (embedded + remote)
pip install galaxdb-client
Requires Python 3.9+. Pre-built wheels for Linux x86-64, macOS Intel, macOS Apple Silicon, and Windows x86-64.
macOS (Homebrew)
brew tap zentrix-innovative-labs/tap
brew install galaxdb
Linux / macOS (direct install)
curl -fsSL https://raw.githubusercontent.com/zentrix-innovative-labs/galaxdb/main/install.sh | bash
Docker
docker run -p 5433:5433 -p 9090:9090 -v /data:/data \
  harbi256/galaxdb:latest --data-dir /data
GitHub Releases
Download pre-built binaries for Linux x86-64 and macOS x86-64 from the Releases page.
Rust (embed in your application)
[dependencies]
galaxdb-embedded = "1.0.0-beta"
Observability
Every server instance exposes:
Health check — reflects real subsystem state
curl http://localhost:9090/health
{"status":"ok","version":"1.0.0-beta.1","subsystems":{"disk_full":false,"sidecar_healthy":true,"connections_active":3}}
Prometheus metrics
curl http://localhost:9090/metrics
galaxdb_connections_active 3
galaxdb_wal_write_latency_us 42
galaxdb_hnsw_recall_estimate_bp 9902
galaxdb_embedding_queue_depth 0
...
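A small monitoring script could consume both endpoints along these lines. The sample payloads are copied from the output above; the health rule, the reading of the `_bp` suffix as basis points (so 9902 would mean 0.9902), and the function names are assumptions rather than documented behaviour. The metrics parser only handles plain `name value` samples like the ones shown.

```python
import json
import urllib.request


def parse_health(body):
    """Return (is_healthy, subsystems) from a /health JSON body."""
    payload = json.loads(body)
    subsystems = payload.get("subsystems", {})
    healthy = bool(
        payload.get("status") == "ok"
        and not subsystems.get("disk_full", False)
        and subsystems.get("sidecar_healthy", False)
    )
    return healthy, subsystems


def parse_metrics(text):
    """Parse unlabeled `name value` lines of Prometheus text output into a dict."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blanks, comments, and anything without a value
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue  # skip lines this minimal parser does not understand
    return metrics


def fetch(url, timeout=2.0):
    """GET a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


sample_health = (
    '{"status":"ok","version":"1.0.0-beta.1","subsystems":'
    '{"disk_full":false,"sidecar_healthy":true,"connections_active":3}}'
)
sample_metrics = """galaxdb_connections_active 3
galaxdb_wal_write_latency_us 42
galaxdb_hnsw_recall_estimate_bp 9902
galaxdb_embedding_queue_depth 0
"""

healthy, subsystems = parse_health(sample_health)
metrics = parse_metrics(sample_metrics)
recall_estimate = metrics["galaxdb_hnsw_recall_estimate_bp"] / 10_000
```

Against a live server, `parse_health(fetch("http://localhost:9090/health"))` would replace the sample string.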
Key Management
GalaxDB supports pluggable encryption key management with no vendor lock-in:
Local key file
GALAXDB_KEY_PROVIDER=local:/path/to/key.bin galaxdb-server ...
Environment variable
GALAXDB_KEY_PROVIDER=env:GALAXDB_MASTER_KEY galaxdb-server ...
Any KMS via shell command (AWS CLI, gcloud, az, vault, custom HSM)
GALAXDB_KEY_PROVIDER=command:aws kms decrypt ... galaxdb-server ...
HashiCorp Vault Transit
GALAXDB_KEY_PROVIDER=vault:transit/galaxdb-prod galaxdb-server ...
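The four examples share a `scheme:argument` shape. The sketch below splits a provider value on the first colon only, so arguments that themselves contain colons stay intact. How galaxdb-server actually parses this variable is not documented here; the function and the scheme list are inferred from the examples above.

```python
KNOWN_SCHEMES = ("local", "env", "command", "vault")


def parse_key_provider(spec):
    """Split a GALAXDB_KEY_PROVIDER value into (scheme, argument)."""
    scheme, sep, argument = spec.partition(":")
    if not sep or not argument:
        raise ValueError(f"expected '<scheme>:<argument>', got {spec!r}")
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown key provider scheme {scheme!r}")
    return scheme, argument


local_provider = parse_key_provider("local:/path/to/key.bin")
vault_provider = parse_key_provider("vault:transit/galaxdb-prod")
```

One practical note for the `command:` form: in a POSIX shell, an unquoted assignment ends at the first space, so a value containing spaces needs quoting (for example `GALAXDB_KEY_PROVIDER='command:...'`).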
Security status
GalaxDB encrypts data at rest today (AES-256-GCM on every PAX block and WAL record, pluggable key management above). Network security is in active development:
| Capability | Status |
|---|---|
| Encryption at rest (AES-256-GCM, pluggable KMS) | ✅ Available now |
| Wire authentication (SCRAM-SHA-256) | 🚧 In progress |
| TLS transport encryption | 🚧 In progress |
| Roles, privileges, GRANT/REVOKE | 🚧 In progress |
| SSO / fine-grained RBAC / audit logging | Enterprise edition |
Until wire authentication and TLS land, run galaxdb-server only on a trusted network or loopback interface (the connection examples above use sslmode=disable accordingly). See ROADMAP.md for what is shipping next.
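One way a client could enforce the loopback advice before opening an unencrypted connection is a small guard like the one below. It is a hypothetical client-side check, not a GalaxDB feature, and it deliberately avoids DNS lookups: any hostname other than `localhost` is treated as non-loopback.

```python
import ipaddress


def is_loopback_host(host):
    """True for 'localhost' or a literal loopback IP such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not a literal IP; treated as non-loopback without resolving it


def check_plaintext_target(host):
    """Raise if a sslmode=disable connection would leave the local machine."""
    if not is_loopback_host(host):
        raise ValueError(f"refusing plaintext connection to non-loopback host {host!r}")
    return host
```

A deployment on a trusted private network would need a different rule, such as an explicit allow-list of hosts.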
Documentation
Getting Started — installation, all features, Docker Compose, troubleshooting
Roadmap — shipped capabilities, in-progress hardening, and planned features (OSS vs Enterprise)
SQL Reference — full AuroraSQL syntax
Storage Engine — LSM tree, WAL, PAX blocks, HNSW
Benchmarks — SIFT-1M recall, write throughput, latency
Database Comparison — GalaxDB vs PostgreSQL, Pinecone, Qdrant, LanceDB, ChromaDB, Milvus, DuckDB, Weaviate
Research Paper — GalaxDB: A Unified AI-Native Storage Engine for Transactional, Analytical, and Vector Workloads
Contributing
See CONTRIBUTING.md. Open an issue first for large changes. All PRs must pass the full test suite and three CI gates (no mocks, no vendor SDKs, task tracker).
License
Apache 2.0 — see LICENSE.
Built by Zentrix Innovative Labs
About
GalaxDB is designed for AI and ML workloads that need more than a traditional database. Instead of stitching together a relational database, a vector store, an embedding API, and a training pipeline, GalaxDB provides all of these capabilities in one binary with a single SQL interface.
galaxdb.com
Releases 2
GalaxDB v0.2.0
Latest
Jun 17, 2026
+ 1 release
Languages
Rust 95.0%
Python 2.6%
Shell 2.2%
Other 0.2%