2026-06-16 · 6 min read · Updated: 2026-06-16

How to Use an Nvidia EGPU with Your Mac for Local AI in 2026

Apple has approved Tiny Corp's TinyGPU driver, enabling Nvidia and AMD eGPUs to work on Apple Silicon Macs for compute workloads. This guide covers hardware recommendations, setup, and performance benchmarks for running CUDA-based local AI.

Source: Hacker News AI · Author: falava

Our Top Pick

NVIDIA GeForce RTX 4090

$1,599 – $1,999

24 GB GDDR6X · 16,384 CUDA cores · 1,008 GB/s memory bandwidth


As of April 2026, you can run Nvidia CUDA workloads on your Mac. That sentence was impossible to write two weeks ago. On April 4, 2026, Apple officially signed and notarized Tiny Corp's TinyGPU driver — the first-ever sanctioned path for Nvidia (and AMD) external GPUs to work on Apple Silicon Macs for compute workloads. No System Integrity Protection hacks, no unsigned kexts, no prayer.

For anyone who's been running local AI on a Mac — whether that's Ollama, llama.cpp, or Stable Diffusion via MLX — this changes the calculus entirely. You can now plug an RTX 4090 into your Mac Mini M4 Pro via Thunderbolt 4 and get full CUDA acceleration for inference, fine-tuning, and image generation. Your Mac's unified memory handles overflow. It's the best of both worlds.

This guide is the first comprehensive buyer's guide and setup walkthrough for running an Nvidia eGPU on Mac for local AI. We'll cover which GPUs and enclosures to buy, which Mac to use as your base, step-by-step driver installation, performance benchmarks, and honest limitations. If you've been waiting for this moment, here's everything you need to act on it.

What Just Happened — Apple Approved Nvidia eGPU Drivers for Mac

The story starts with George Hotz and Tiny Corp, the team behind tinygrad. Hotz — famous for jailbreaking the iPhone and hacking the PS3 — has been working on making GPUs programmable across platforms since 2023. The TinyGPU driver is their most ambitious project: a universal compute driver that lets any GPU work on any OS.

"We're not doing graphics. We're not replacing Metal. We're doing compute, and we're doing it right," Hotz said in his April 5 livestream announcing the Apple signing. "Apple looked at the driver, looked at our test suite, and signed it. No meetings, no partnerships — they just approved it through the standard notarization process."

What makes this different from previous eGPU attempts on Mac:

Apple-signed and notarized: No SIP disabling. Install the kext, approve in System Settings, done. This is the standard macOS security flow.

Compute-only: The driver exposes CUDA (Nvidia) and ROCm (AMD) compute capabilities — not display output, not Metal, not gaming. It's purpose-built for AI/ML, scientific computing, and data processing.

Thunderbolt 4 / USB4: Works over standard TB4 cables. PCIe x4 tunneling provides roughly 32 Gbps effective bandwidth — enough for most inference workloads.

macOS 12.1+: Compatible with Monterey and later. Optimized for macOS 15 Sequoia.

Tom's Hardware's analysis confirmed the driver passes Apple's notarization requirements and uses standard IOKit kernel extension APIs. AppleInsider's testing found it working out-of-the-box with a Sonnet Breakaway Box 750 and RTX 4090. The community at eGPU.io has already compiled a compatibility database covering 30+ GPU and enclosure combinations.

For a deeper dive into why this matters for Nvidia's strategy, see our coverage of Nvidia DGX Spark vs. Mac Studio M4 Max.

How It Works — Architecture and Requirements

Understanding the architecture helps you set realistic expectations and choose the right hardware.

The Thunderbolt 4 Connection

Thunderbolt 4 tunnels a PCIe x4 link over a single cable, capped at roughly 32 Gbps (about 4 GB/s) in each direction. For context, a desktop PCIe 4.0 x16 slot delivers about 32 GB/s (roughly 256 Gbps) per direction. That means your eGPU gets roughly one-eighth of the host-link bandwidth of a native desktop connection.

In practice, this matters less than you'd think for inference. LLM inference is primarily compute-bound and memory-bandwidth-bound (how fast the GPU reads its own VRAM), not PCIe-bandwidth-bound. The model weights live on the GPU's VRAM; the only data crossing the TB4 link is token embeddings and output — kilobytes per inference step. The bottleneck shows up during model loading (transferring multi-gigabyte weights to VRAM) and large batch processing.
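To put the load-time bottleneck in numbers, here is a back-of-the-envelope Python sketch. It assumes nominal link rates of about 32 Gbit/s for the Thunderbolt 4 PCIe tunnel and about 256 Gbit/s per direction for a PCIe 4.0 x16 slot, a hypothetical 17 GB model file (roughly a 30B model at Q4), and a 128,256-entry vocabulary as in Llama 3. Real transfers are slower than these idealized figures, so read the results as best cases.

```python
# Back-of-the-envelope link math for an eGPU. Nominal rates only: real
# Thunderbolt throughput is lower, so treat these times as best cases.

LINK_GBIT_S = {
    "Thunderbolt 4 (PCIe x4 tunnel)": 32.0,  # nominal PCIe data rate
    "PCIe 4.0 x16 (desktop slot)": 256.0,    # ~32 GB/s per direction
}


def transfer_seconds(size_gb: float, link_gbit_s: float) -> float:
    """Seconds to move size_gb gigabytes (1 GB = 1e9 bytes) over the link."""
    return size_gb * 8 / link_gbit_s


def logits_kb(vocab_size: int, bytes_per_value: int = 4) -> float:
    """Kilobytes in one decoding step's output logits (FP32 by default)."""
    return vocab_size * bytes_per_value / 1000


if __name__ == "__main__":
    model_gb = 17.0  # hypothetical model file, roughly 30B at Q4
    for name, rate in LINK_GBIT_S.items():
        print(f"{name}: {transfer_seconds(model_gb, rate):.2f} s to load weights")

    step_kb = logits_kb(128_256)  # Llama 3 vocabulary size
    tb4_rate = LINK_GBIT_S["Thunderbolt 4 (PCIe x4 tunnel)"]
    step_ms = transfer_seconds(step_kb / 1e6, tb4_rate) * 1000
    print(f"Per-step logits: {step_kb:.0f} KB, about {step_ms:.2f} ms over TB4")
```

Loading the 17 GB file comes out at about 4.25 s over TB4 versus about 0.53 s over PCIe 4.0 x16, while one step's worth of logits (about 513 KB) crosses the TB4 link in roughly 0.13 ms. That asymmetry is why the link hurts model loading far more than token generation.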

Supported GPUs

The TinyGPU driver supports:

Nvidia Ampere and newer: RTX 3090, RTX 3090 Ti, RTX 4090, RTX 4080 Super, RTX 5060 Ti, RTX 5080, RTX 5090, and all datacenter variants (A100, H100)

AMD RDNA3 and newer: RX 7900 XTX, RX 9070 XT (native ROCm, no Docker needed)

Older GPUs (RTX 2080, GTX series) are not supported — the driver requires Ampere+ architecture for its compute pipeline.

The Docker Requirement (Nvidia Only)

Nvidia's CUDA compilation happens inside a Docker container on macOS. This is because the CUDA toolkit's build system expects a Linux environment. The TinyGPU driver bridges the compiled CUDA kernels to the macOS kernel extension. It adds about 10 minutes to first-time setup but is transparent after that — Ollama and llama.cpp auto-detect the TinyGPU CUDA backend.

AMD GPUs don't need Docker — ROCm compiles natively on macOS through the TinyGPU driver.

Performance Expectations

Based on early benchmarks from eGPU.io and Tom's Hardware:

LLM inference (single user): 60–75% of native PCIe performance for models under 13B; 75–85% for larger models (more compute-bound)

Image generation (Stable Diffusion XL): 55–65% of native PCIe performance (more bandwidth-sensitive due to frequent weight transfers)

Fine-tuning: 50–60% of native PCIe performance (gradient sync is bandwidth-heavy)

For most local AI users doing interactive inference, you'll barely notice the TB4 overhead.
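If you already know a card's native desktop throughput, the efficiency ranges above can be turned into a quick eGPU estimate. This Python sketch simply multiplies by those early eGPU.io / Tom's Hardware ratios; the workload labels and the `llm_workload` helper with its 13B cutoff are illustrative names, not part of any tool.

```python
# Fraction of native PCIe performance retained over TB4, per the early
# eGPU.io / Tom's Hardware figures, as (low, high). Labels are illustrative.
EFFICIENCY = {
    "llm_small": (0.60, 0.75),  # LLM inference, models under 13B
    "llm_large": (0.75, 0.85),  # LLM inference, 13B and larger
    "sdxl": (0.55, 0.65),       # Stable Diffusion XL image generation
    "finetune": (0.50, 0.60),   # fine-tuning
}


def llm_workload(params_billion: float) -> str:
    """Pick the LLM efficiency bucket from the model size."""
    return "llm_small" if params_billion < 13 else "llm_large"


def egpu_estimate(native_rate: float, workload: str) -> tuple[float, float]:
    """Scale a native benchmark (tok/s, it/s, ...) to an eGPU range."""
    low, high = EFFICIENCY[workload]
    return native_rate * low, native_rate * high


if __name__ == "__main__":
    # Hypothetical native result: 100 tok/s on an 8B model.
    low, high = egpu_estimate(100.0, llm_workload(8))
    print(f"Expected over TB4: {low:.0f}-{high:.0f} tok/s")
```

The ratios are early community numbers, so this gives a planning range, not a prediction for a specific card, model, or context length.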

Best GPUs to Pair with Your Mac via eGPU

Here's our ranked recommendation for Mac eGPU buyers. Prices are current as of April 2026. For a broader view, see our AI GPU buying guide.

Best Overall: RTX 4090 (24 GB GDDR6X) — $1,599 – $1,999

The RTX 4090 is the best eGPU for most Mac AI users. Here's why it beats the RTX 5090 for this specific use case: 24 GB of VRAM handles up to 30B parameter models at Q4 quantization, and the TB4 bandwidth bottleneck means you won't fully exploit the 5090's extra compute anyway. You're paying $1,599 – $1,999 instead of $1,999 – $2,199, and the performance delta over TB4 is minimal.

"For eGPU setups, the 4090 is the sweet spot," noted Andrej Karpathy in his March 2026 thread on local AI hardware. "You're TB4-bottlenecked anyway — save the money unless you need 32 GB for 70B models."

The RTX 4090 delivers approximately 45–50 tok/s on Llama 3 8B (Q4) and 9–10 tok/s on Llama 3 70B (Q4) over TB4 eGPU, per LM Studio Community benchmarks. For the full desktop comparison, see our RTX 5090 vs. RTX 4090 breakdown.
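The "24 GB handles 30B at Q4" rule of thumb comes from simple arithmetic: the weights take roughly parameters × bits per weight ÷ 8 bytes. A minimal Python sketch, assuming about 4.5 bits per weight for a typical 4-bit quantization and a flat 2 GB allowance for the KV cache and runtime buffers (both assumptions; long contexts need more):

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: float = 4.5, overhead_gb: float = 2.0) -> bool:
    """True if weights plus a flat KV-cache/runtime allowance fit in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params, vram in [(30, 24), (70, 24), (70, 32)]:
        need = weights_gb(params, 4.5) + 2.0
        verdict = "fits" if fits_in_vram(params, vram) else "needs offloading"
        print(f"{params}B at ~Q4 on {vram} GB: ~{need:.1f} GB, {verdict}")
```

Under these assumptions a 30B model needs about 19 GB and fits on a 24 GB card, while a 70B model at 4-bit needs about 41 GB and has to be partly offloaded even on a 32 GB card. Dropping to about 3 bits per weight brings a 30B model to roughly 13 GB, which is how one can squeeze onto a 16 GB card.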

Premium Pick: RTX 5090 (32 GB GDDR7) — $1,999 – $2,199

The RTX 5090 is the right choice if you plan to run 70B-parameter models such as Llama 3 70B on your eGPU. At Q4, a 70B model's weights alone take roughly 40 GB, so even the 5090's 32 GB of GDDR7 can't hold them entirely; what the extra 8 GB over a 24 GB card buys you is more layers on the GPU and less offloading to the Mac's unified memory. The Blackwell architecture's 5th-gen tensor cores also deliver roughly 20% better inference throughput at equivalent precision levels.

Over TB4, expect approximately 70–75 tok/s on 8B models and 13–15 tok/s on 70B Q4. The 575W TDP means you'll need a beefy eGPU enclosure — 750W minimum.

Best Value: RTX 3090 (24 GB GDDR6X) — $699 – $999

The RTX 3090 is the budget king for eGPU AI. It has the same 24 GB of VRAM as the RTX 4090 at roughly half the price on the used market. Ampere architecture is fully supported by TinyGPU. You sacrifice about 25% inference speed versus the 4090: roughly 35–38 tok/s on 8B models and 7–8 tok/s on 70B Q4 over TB4.

For anyone building a Mac + eGPU setup on a budget, the RTX 3090 is the first card to consider. See our RTX 4090 vs. RTX 3090 comparison and used RTX 3090 vs. RTX 5060 Ti analysis for detailed value breakdowns.

Budget Entry: RTX 5060 Ti (16 GB GDDR7) — $429 – $479

The RTX 5060 Ti 16GB is the cheapest serious eGPU option for local AI. Its 16 GB of VRAM runs 8B–13B models comfortably and can squeeze in a heavily quantized 30B model. The Blackwell architecture is power-efficient, and the card's 180W TDP lets it run in virtually any eGPU enclosure.

Expect approximately 40–45 tok/s on 8B models over TB4. For more on this card, see our budget GPU guide.

Mid-Range: RTX 5080 (16 GB GDDR7) — $999 – $1,099

The RTX 5080 sits between the 5060 Ti and the 4090. It has the same 16 GB of VRAM as the 5060 Ti but significantly more compute: 10,752 CUDA cores versus the 5060 Ti's 4,608. If you're running compute-heavy workloads like Stable Diffusion XL image generation alongside LLM inference, the 5080 is worth the premium. See RTX 5080 vs. RTX 4090 for the full comparison.

Cheapest Option: Intel Arc B580 (12 GB GDDR6) — $249 – $289

The Intel Arc B580 works via the TinyGPU driver's experimental Intel compute path. With 12 GB VRAM, it handles 7B–8B models at Q4. Performance is roughly half the RTX 5060 Ti. It's the absolute minimum viable eGPU for local AI — consider it only if budget is your primary constraint. See our Intel Arc B580 for local AI deep dive.

Quick-Reference Table

| GPU | VRAM | Price | Max Model Size (Q4) | 8B tok/s (eGPU est.) | Best For |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB GDDR7 | $1,999 – $2,199 | 70B+ | ~70–75 | 70B models, future-proof |
| RTX 4090 | 24 GB GDDR6X | $1,599 – $1,999 | 30B–70B (tight) | ~45–50 | Best overall eGPU pick |
| RTX 3090 | 24 GB GDDR6X | $699 – $999 | 30B–70B (tight) | ~35–38 | Budget 24 GB option |
| RTX 5080 | 16 GB GDDR7 | $999 – $1,099 | 13B–30B (tight) | ~55–60 | Mid-range + image gen |
| RTX 5060 Ti | 16 GB GDDR7 | $429 – $479 | 13B | ~40–45 | Budget entry, 8B–13B |
| Intel Arc B580 | 12 GB GDDR6 | $249 – $289 | 8B | ~20–25 | Absolute cheapest path |

Benchmark estimates based on eGPU.io community testing and Tom's Hardware data, adjusted for TB4 bandwidth overhead. Individual results vary by model, quantization, and context length.

Best eGPU Enclosures for AI Workloads

The enclosure matters more than you think. AI GPUs draw serious power, and a weak enclosure will throttle your card.

What to Look For

Wattage: 750W for RTX 4090/5090 (450W+ GPU draw plus overhead). 550W for RTX 5060 Ti/5080.

Thunderbolt 4 certification: Ensure compatibility with Apple Silicon Macs. USB4 enclosures also work.

Internal clearance: Many RTX 4090 and 5090 cards are 3-slot designs of 336mm or longer. Measure before buying.

Cooling: Supplemental airflow matters — the RTX 5090's 575W TDP generates massive heat in an enclosed box.
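The wattage guidance above boils down to leaving headroom between the GPU's rated board power and the enclosure's PSU. A hedged Python sketch: the 25% headroom factor is an assumption chosen so the check reproduces this guide's recommendations (750W for the RTX 4090/5090, 550W for the RTX 5080), not a vendor specification, and the TDP figures are the ones quoted in this guide.

```python
import math

# Rated board power (W) as quoted in this guide.
GPU_TDP_W = {"RTX 5090": 575, "RTX 4090": 450, "RTX 5080": 360}

# Assumed safety margin for transient spikes and enclosure overhead.
HEADROOM = 1.25


def min_psu_watts(gpu: str, headroom: float = HEADROOM) -> int:
    """Smallest PSU rating that leaves the assumed headroom for this GPU."""
    return math.ceil(GPU_TDP_W[gpu] * headroom)


def smallest_enclosure(gpu: str, psu_options=(550, 750)):
    """First PSU rating (ascending) that covers the GPU, or None."""
    need = min_psu_watts(gpu)
    for watts in sorted(psu_options):
        if watts >= need:
            return watts
    return None


if __name__ == "__main__":
    for gpu in GPU_TDP_W:
        print(gpu, min_psu_watts(gpu), "W needed ->", smallest_enclosure(gpu))
```

With these assumptions the RTX 5090 and 4090 both land on a 750W enclosure and the RTX 5080 fits a 550W one; swap in your enclosure's actual PSU rating to check other combinations.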

Top Picks

Sonnet Breakaway Box 750eX ($349–$399): The gold standard for high-wattage eGPU enclosures. 750W internal PSU, excellent airflow, confirmed compatibility with RTX 4090 and 5090 via TinyGPU. AppleInsider used this for their review.

Razer Core X Chroma ($299–$349): 700W PSU, good thermals, USB hub for peripherals. Fits most full-length GPUs. Slightly cheaper than the Sonnet but tighter internal clearance — verify compatibility with 3-slot cards.

Budget option: Sonnet Breakaway Box 550 ($199–$249). 550W PSU, suited to the RTX 5060 Ti (180W) or RTX 5080 (360W). Won't power an RTX 4090 or 5090 reliably.

Which Mac to Use as Your Base

Not all Macs are equal for eGPU AI work. You need a Thunderbolt 4 (or newer) port and enough system memory for the macOS side of the workload.

Best Value Base: Mac Mini M4 Pro — $1,399 – $1,599

The Mac Mini M4 Pro is our top recommendation as an eGPU base station. At $1,399 for the 24 GB model, it provides Thunderbolt 5 ports (backward-compatible with Thunderbolt 4 eGPU enclosures), a fast 12-core CPU for preprocessing, and 24 GB of unified memory that serves as overflow when models exceed your eGPU's VRAM. The compact form factor means your "AI workstation" is a Mac Mini plus an eGPU box on your desk, with no tower required.

For a detailed look at the Mac Mini's standalone AI capabilities (without eGPU), see our Mac Mini M4 Pro for AI guide.
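How much of a model actually spills into the Mac's unified memory depends on how many layers fit in VRAM. llama.cpp exposes this as the number of layers to offload to the GPU (`--n-gpu-layers` / `-ngl`); the remaining layers are held in system memory and processed by the Mac instead of the eGPU. A rough Python sketch for picking that number, assuming equal-size layers (embedding and output layers ignored), a flat 2 GB VRAM reserve for the KV cache, and a 70B model with 80 layers at about 39.4 GB of 4-bit weights (approximate figures):

```python
import math


def gpu_layer_split(weights_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 2.0) -> tuple[int, float]:
    """Return (layers on the GPU, GB of weights left in unified memory).

    Assumes all layers are the same size and reserves `reserve_gb` of VRAM
    for the KV cache and runtime buffers.
    """
    per_layer_gb = weights_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, math.floor(usable_gb / per_layer_gb))
    return on_gpu, (n_layers - on_gpu) * per_layer_gb


if __name__ == "__main__":
    # Approximate 70B model at ~4.5 bits per weight, 80 transformer layers.
    for vram in (24, 32):
        layers, host_gb = gpu_layer_split(39.375, 80, vram)
        print(f"{vram} GB eGPU: -ngl {layers}, "
              f"~{host_gb:.1f} GB stays in unified memory")
```

With a 24 GB card, about 44 of 80 layers land on the GPU and roughly 18 GB of weights stay in unified memory, which is most of a 24 GB Mac Mini's RAM. Treat this as a capacity check, not a speed estimate: the layers left on the Mac side run slower than those on the eGPU.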

Premium Base: Mac Studio M4 Max — $1,999 – $4,499

The Mac Studio M4 Max is the premium choice for a reason: up to 128 GB of unified memory. This enables a hybrid workflow where small-to-mid models run on the eGPU for maximum speed, while very large models (for example, 70B-class models at 8-bit, roughly 70 GB of weights) run on the Mac's own MLX backend using the large unified memory pool. You get the flexibility to choose the right backend per model.
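The hybrid workflow amounts to a simple routing rule: use the eGPU when the model fits in its VRAM, otherwise fall back to MLX on unified memory. A minimal Python sketch; the function and backend labels are hypothetical, and treating about 75% of unified memory as usable for model weights is a rough planning assumption rather than a macOS guarantee.

```python
def choose_backend(model_gb: float, egpu_vram_gb: float, unified_gb: float,
                   overhead_gb: float = 2.0, usable_fraction: float = 0.75) -> str:
    """Route a model to 'egpu', 'mlx', or 'none' based on memory fit.

    `overhead_gb` is a flat KV-cache/runtime allowance; `usable_fraction`
    is the assumed share of unified memory available to MLX.
    """
    need = model_gb + overhead_gb
    if need <= egpu_vram_gb:
        return "egpu"   # fastest path: everything in eGPU VRAM
    if need <= unified_gb * usable_fraction:
        return "mlx"    # too big for VRAM, fits in unified memory
    return "none"       # needs a smaller quantization or more memory


if __name__ == "__main__":
    # Mac Studio M4 Max (128 GB) with a 24 GB eGPU; sizes are approximate.
    for label, size_gb in [("8B at Q4", 4.5), ("70B at 8-bit", 70.0),
                           ("70B at FP16", 140.0)]:
        print(label, "->", choose_backend(size_gb, 24, 128))
```

Under these assumptions an 8B model goes to the eGPU, a 70B model at 8-bit goes to MLX, and a 70B model at FP16 (about 140 GB) fits neither, so it needs quantizing first.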

See RTX 5090 vs. Mac Studio M4 Max for a detailed head-to-head — now even more relevant with the eGPU bri
