The hardware
Pick your tier.
Run it locally.
Pick a tier, select a config, and you'll see exactly what runs on it and what it costs. Everything here is open-weight: no API keys, no vendor agreement, no data leaving your machine.
Ground Floor Lab · what these experiments run on
MacBook Pro M5 Max · 128 GB · 8 TB · × 2 nodes
My actual setup. Connected over TB5 RDMA at ~3 µs. When an experiment is documented as "viable at entry level," I ran it throttled to that tier, not on this hardware.
~$21K to build today · 3-year AppleCare+ on both nodes · Thunderbolt 5 cables, all open-source, fully air-gapped
256 GB
Combined memory
16 TB
Combined storage
~800 GB/s
Peak bandwidth
~3 µs
TB5 RDMA latency
Entry · Mac mini
Mac mini M4
From
$799
Mid-range · Mac mini
Mac mini M4 Pro
From
$1,599
Base · MacBook Pro
MacBook Pro M5
From
$1,999
Entry portable · MacBook Air
MacBook Air M5
From
$1,299
Mid-range portable · MacBook Pro
MacBook Pro M5 Pro
From
$2,499
Workstation · MacBook Pro · Ground Floor Lab
MacBook Pro M5 Max
From
$4,099
14-inch from $4,099 · 16-inch from $4,399 · All prices are Apple US list price before tax.
Why Apple Silicon
The old constraint: fit the entire model in VRAM. A 16 GB consumer GPU can't load a 13B model at 16-bit precision; the weights alone run about 26 GB. It just doesn't fit. For a long time that felt like the ceiling. Your options were a $10K+ workstation GPU or a cloud API, and most people quietly chose the API.
Apple Silicon changed this in a way I don't think has been fully absorbed. GPU and CPU share one memory pool. A 16 GB Mac mini loads a Q4 8B model (~5 GB) with room left for the OS and other apps. There's no separate VRAM ceiling; the only limit is the machine's total memory.
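A rough sketch of the sizing arithmetic, for anyone who wants to check a model against a memory tier. The 4.5 bits per weight for Q4 is an assumption (4-bit formats also store scaling data, and the exact figure varies by format), the 6 GB reserved for the OS is a hypothetical placeholder, and the function names are mine. It counts weights only and ignores the KV cache and runtime overhead.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits_in_memory(params_billions: float, bits_per_weight: float,
                   memory_gb: float, reserved_gb: float = 0.0) -> bool:
    """True if the weights fit in memory_gb after setting aside reserved_gb."""
    return weight_footprint_gb(params_billions, bits_per_weight) <= memory_gb - reserved_gb


# Assumed: Q4 averages about 4.5 bits per weight; 6 GB held back for the OS and apps.
print(weight_footprint_gb(8, 4.5))                # 8B model at Q4
print(fits_in_memory(8, 4.5, 16, reserved_gb=6))  # on a 16 GB unified-memory machine
print(weight_footprint_gb(13, 16))                # 13B model at 16-bit precision
```

Treat the result as a floor: long contexts add memory on top of the weights.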
Every model listed here is open-weight: publicly released weights, no API keys, no telemetry, no vendor agreement. The weights you download today run the same way in five years. Air-gapped by default.
The real cost question
I get this question a lot: does local pay for itself in API savings? Honestly, not on token costs alone. At $0.01 per interaction, you're looking at years before hardware pencils out. That's not the argument and I won't pretend it is.
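To put a number on "years," here is a minimal sketch of the token-cost break-even. The $0.01 per interaction comes from the paragraph above and $799 is the entry Mac mini price; the interactions-per-day figures are hypothetical usage levels, not measurements.

```python
def years_to_break_even(hardware_usd: float, usd_per_interaction: float,
                        interactions_per_day: float) -> float:
    """Years of API usage whose cost equals the hardware price."""
    interactions_needed = hardware_usd / usd_per_interaction
    return interactions_needed / interactions_per_day / 365


# Hypothetical usage levels for a single practitioner.
for per_day in (20, 50):
    years = years_to_break_even(799, 0.01, per_day)
    print(f"{per_day} interactions/day: {years:.1f} years")
```

Even at the heavier usage level the payback is measured in years, which is why token cost alone isn't the argument.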
The real cost is what I'd call compliance overhead: the vendor due diligence, the BAA negotiation, the conversation with your malpractice carrier about someone else's breach notification process. A local machine removes the third party entirely. It's not cheaper per token, but it's simpler in a way that actually matters to a solo practice.
There's also a cost nobody thinks about until they see the invoice: runaway API calls. Cloud models are metered at both ends: GPT-5.5 runs $5 per million input tokens, $30 per million output. GPT-5.5 Pro is $30 in, $180 out. Most of the time that's manageable. But agentic workflows that loop, integration bugs, or a script left running overnight don't stop billing because something went wrong. The meter runs regardless. A $20/month habit becomes a $1,200 charge before anyone notices. With local inference, there is no meter. The model runs or it doesn't. Nothing accumulates.
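A back-of-the-envelope sketch of how a looping script gets to a four-figure bill, using the $5 and $30 per-million-token rates quoted above. The token counts per call and the call rate are hypothetical; a real agent loop could be lighter or much heavier.

```python
# Rates quoted above, in USD per million tokens.
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 30.0


def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_rate: float = INPUT_USD_PER_M,
                  output_rate: float = OUTPUT_USD_PER_M) -> float:
    """Cost of one metered API call at the given per-million-token rates."""
    return input_tokens / 1_000_000 * input_rate + output_tokens / 1_000_000 * output_rate


def hours_to_spend(budget_usd: float, cost_per_call_usd: float,
                   calls_per_minute: float) -> float:
    """Hours of continuous calling needed to spend budget_usd."""
    calls_needed = budget_usd / cost_per_call_usd
    return calls_needed / calls_per_minute / 60


# Hypothetical runaway loop: 20k tokens in, 2k tokens out, 12 calls a minute.
per_call = call_cost_usd(20_000, 2_000)      # about $0.16 per call
hours = hours_to_spend(1_200, per_call, 12)  # roughly 10 hours
print(f"${per_call:.2f} per call, ${1_200} reached in about {hours:.1f} hours")
```

Under those assumptions, one unattended night is enough.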
And then there's the subscription stack. Most people I talk to aren't paying for one AI service. They're paying for two or three. ChatGPT Plus at $20/month, Claude Pro at $20/month, maybe Copilot on top of that. If you upgrade to ChatGPT Pro it's $200/month. Claude Max starts at $100. Nobody sat down and decided to spend $480 a year or more on AI. It accumulated one reasonable-seeming signup at a time. One piece of local hardware eliminates all of it, permanently, for every person in the practice.
The hardware also has a 4–6 year useful life and does everything else a laptop or desktop does. It's not a pure AI line item. Really it's infrastructure.
Break-even analysis
Local vs. subscription
One-time hardware cost vs. monthly AI subscription, for a single practitioner. Break-even is when total subscription cost passes the hardware cost.
| Hardware | One-time cost | vs. ChatGPT Pro ($200/mo) | vs. Claude Max ($100/mo) |
|---|---|---|---|
| Mac mini M4 16 GB | $799 | 4 months | 8 months |
| Mac mini M4 24 GB | $1,199 | 6 months | 12 months |
| MacBook Air M5 24 GB | $1,499 | 8 months | 15 months |
| MacBook Pro M5 Pro 48 GB | $3,599 | 18 months | 36 months |
Break-even months = hardware price ÷ monthly subscription cost, rounded up to the next whole month. After break-even the local machine costs nothing per month, and your data has never left the building. The compliance overhead savings (vendor due diligence, BAA negotiations, malpractice conversations) aren't reflected here. They're harder to quantify but often exceed the hardware cost for regulated practices.
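The same calculation as a short sketch, if you want to plug in your own hardware price or subscription. It rounds up to whole months, on the reading that break-even arrives only once cumulative subscription spend has reached the hardware price. The function name is mine; the prices are the ones in the table.

```python
import math


def break_even_months(hardware_usd: float, monthly_usd: float) -> int:
    """First whole month in which cumulative subscription spend reaches the hardware price."""
    return math.ceil(hardware_usd / monthly_usd)


hardware = {
    "Mac mini M4 16 GB": 799,
    "Mac mini M4 24 GB": 1199,
    "MacBook Air M5 24 GB": 1499,
    "MacBook Pro M5 Pro 48 GB": 3599,
}

for name, price in hardware.items():
    pro = break_even_months(price, 200)   # vs. ChatGPT Pro at $200/mo
    mx = break_even_months(price, 100)    # vs. Claude Max at $100/mo
    print(f"{name}: {pro} months vs. $200/mo, {mx} months vs. $100/mo")
```

Stacked subscriptions shorten the payback: pass the combined monthly total as `monthly_usd`.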
Lab roadmap
Scaling to 3 TB unified memory
The current two-node setup is a step on a longer path. I'm building toward 3 TB of unified memory, enough to run full-precision inference on the largest open-weight models that exist. The goal isn't scale for its own sake. It's running the best possible models without asking a vendor for permission.
256 GB cluster
2× M5 Max 128 GB / 8 TB · TB5 RDMA
~$21,000
3 TB cluster
4× M5 Ultra Mac Studio 768 GB
~$130–190K
3 TB unified memory
4× M5 Ultra 768 GB · fully air-gapped
~$130–190K
* M5 Ultra Mac Studio is unannounced. A June 2026 Bloomberg report says Apple has tested support for up to 768 GB of unified memory, with a possible launch around October 2026. Pricing is speculative and runs high because of the 2026 DRAM shortage. Apple's shipping M3 Ultra Mac Studio starts at 96 GB for $3,999, and as of 2026 it has pulled the 256 GB and 512 GB memory tiers for supply reasons. Scaling that blended cost ($3,999 / 96 GB, about $42 per GB) to 768 GB lands near $32K a machine before any halo-tier premium, so a built four-machine cluster realistically runs $130–190K, and could approach $300K if Apple prices the top SKU punitively. Actual prices and availability will differ. The H100 comparison is for equivalent unified memory capacity at datacenter GPU list pricing, not a performance comparison. Experiments at each tier will be documented here as they run.
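The extrapolation in that footnote, written out as a sketch. It assumes price scales linearly with memory at the M3 Ultra's blended rate (the per-GB figure folds in the whole machine, not just the DRAM), which is a simplification; the function names are mine, and any premium on top is left out.

```python
def blended_usd_per_gb(machine_usd: float, memory_gb: float) -> float:
    """Whole-machine price divided by its memory: a blended per-GB rate."""
    return machine_usd / memory_gb


def extrapolated_cluster_usd(machine_usd: float, memory_gb: float,
                             target_gb: float, machines: int) -> float:
    """Linear extrapolation of the blended rate to target_gb per machine, times machine count."""
    return blended_usd_per_gb(machine_usd, memory_gb) * target_gb * machines


rate = blended_usd_per_gb(3_999, 96)                        # about $42 per GB
per_machine = extrapolated_cluster_usd(3_999, 96, 768, 1)   # near $32K
cluster = extrapolated_cluster_usd(3_999, 96, 768, 4)       # about $128K before any premium
print(f"${rate:.0f}/GB, ${per_machine:,.0f} per machine, ${cluster:,.0f} for four")
```

The linear figure lands at the bottom of the quoted range; the upper end reflects a premium I can't derive from any published price.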