The hardware

Pick your tier.
Run it locally.

Pick a tier, select a config, and you'll see exactly what runs on it and what it costs. Everything here is open-weight: no API keys, no vendor agreement, no data leaving your machine.

Ground Floor Lab · what these experiments run on

MacBook Pro M5 Max · 128 GB · 8 TB · × 2 nodes

My actual setup. Connected over TB5 RDMA at ~3 µs. When an experiment is documented as "viable at entry level," I ran it throttled to that tier, not on this hardware.

~$21K to build today · 3-year AppleCare+ on both nodes · Thunderbolt 5 cables · all open-source · fully air-gapped

Combined memory: 256 GB
Combined storage: 16 TB
Peak bandwidth: ~800 GB/s
TB5 RDMA latency: ~3 µs

Entry · Mac mini: Mac mini M4, from $799
Mid-range · Mac mini: Mac mini M4 Pro, from $1,599
Base · MacBook Pro: MacBook Pro M5, from $1,999
Entry portable · MacBook Air: MacBook Air M5, from $1,299
Mid-range portable · MacBook Pro: MacBook Pro M5 Pro, from $2,499
Workstation · MacBook Pro · Ground Floor Lab: MacBook Pro M5 Max, from $4,099

14-inch from $4,099 · 16-inch from $4,399 · All prices are Apple US list price before tax.

Why Apple Silicon

The old constraint: fit the entire model in VRAM. A 16 GB consumer GPU can't load a 13B model at FP16 (about 26 GB of weights), and even at Q4 a 70B model (about 40 GB) is far out of reach. It just doesn't fit. For a long time that felt like the ceiling. Your options were a $10K+ workstation GPU or a cloud API, and most people quietly chose the API.

Apple Silicon changed this in a way I don't think has been fully absorbed. GPU and CPU share one memory pool. A 16 GB Mac mini loads a Q4 8B model (~5 GB) with room left for the OS and other apps. There's no separate VRAM ceiling; the only limit is the machine's total unified memory.
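For a rough sense of what fits where, weight size can be estimated as parameters times bits per weight. This is a minimal sketch, not a measurement: the 4.5 bits per weight used for a Q4-style quantization and the 6 GB reserved for the OS, apps, and KV cache are assumptions, and real model files land somewhat higher, which is how an 8B model ends up near 5 GB.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, reserved_gb: float = 6.0) -> bool:
    """True if the weights fit after reserving memory for everything else.

    reserved_gb is an assumed allowance for the OS, other apps, and KV cache.
    """
    return weights_gb(params_billion, bits_per_weight) <= memory_gb - reserved_gb


# Assumed average of ~4.5 bits per weight for a Q4-style quantization.
print(weights_gb(8, 4.5))    # 8B model at Q4: about 4.5 GB of weights
print(fits(8, 4.5, 16))      # 16 GB Mac mini
print(fits(70, 4.5, 16))     # a 70B model on the same machine
```

The same function works for any tier: pass the tier's memory and adjust the reserve to match what else is running on the machine.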

Every model listed here is open-weight: publicly released weights, no API keys, no telemetry, no vendor agreement. The weights you download today run the same way in five years. Air-gapped by default.

The real cost question

I get this question a lot: does local pay for itself in API savings? Honestly, not on token costs alone. At $0.01 per interaction, you're looking at years before hardware pencils out. That's not the argument and I won't pretend it is.
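To put a number on "years," here is a back-of-the-envelope sketch. The $0.01 per interaction comes from the text above; the 50 interactions per day and 250 working days per year are hypothetical usage figures, so swap in your own.

```python
def payback_years(hardware_usd: float,
                  cost_per_interaction_usd: float,
                  interactions_per_day: float,
                  working_days_per_year: int = 250) -> float:
    """Years of avoided API spend needed to equal the hardware price."""
    annual_api_spend = (cost_per_interaction_usd
                        * interactions_per_day
                        * working_days_per_year)
    return hardware_usd / annual_api_spend


# Hypothetical usage: 50 interactions a day on the $799 entry tier.
print(round(payback_years(799, 0.01, 50), 1))
```

At that assumed volume the entry machine takes more than six years to pay back on token costs alone, which is why the argument has to rest on something else.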

The real cost is what I'd call compliance overhead: the vendor due diligence, the BAA negotiation, the conversation with your malpractice carrier about someone else's breach notification process. A local machine removes the third party entirely. It's not cheaper per token, but it's simpler in a way that actually matters to a solo practice.

There's also a cost nobody thinks about until they see the invoice: runaway API calls. Cloud models are metered at both ends: GPT-5.5 runs $5 per million input tokens, $30 per million output. GPT-5.5 Pro is $30 in, $180 out. Most of the time that's manageable. But agentic workflows that loop, integration bugs, or a script left running overnight don't stop billing because something went wrong. The meter runs regardless. A $20/month habit becomes a $1,200 charge before anyone notices. With local inference, there is no meter. The model runs or it doesn't. Nothing accumulates.
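A quick sketch of how a stuck loop gets to four figures, using the $5 in and $30 out per million tokens quoted above. The per-call token counts and the loop rate (one call every 3 seconds for 8 hours) are hypothetical, chosen only to illustrate the shape of the problem.

```python
INPUT_USD_PER_MILLION = 5.0    # rate quoted in the text
OUTPUT_USD_PER_MILLION = 30.0  # rate quoted in the text


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API call, metered on both input and output."""
    return (input_tokens / 1e6 * INPUT_USD_PER_MILLION
            + output_tokens / 1e6 * OUTPUT_USD_PER_MILLION)


def loop_cost(calls: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of repeating the same call `calls` times."""
    return calls * call_cost(input_tokens, output_tokens)


# Hypothetical agent stuck overnight: one call every 3 s for 8 hours,
# each resending 20,000 tokens of context and producing 1,000 tokens.
calls = 8 * 3600 // 3
print(calls, round(loop_cost(calls, 20_000, 1_000), 2))
```

Under those assumptions each call costs about 13 cents, and 9,600 of them overnight comes to roughly $1,250.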

And then there's the subscription stack. Most people I talk to aren't paying for one AI service. They're paying for two or three. ChatGPT Plus at $20/month, Claude Pro at $20/month, maybe Copilot on top of that. If you upgrade to ChatGPT Pro it's $200/month. Claude Max starts at $100. Nobody sat down and decided to spend $480 or more a year on AI. It accumulated one reasonable-seeming signup at a time. One piece of local hardware eliminates all of it, permanently, for every person in the practice.

The hardware also has a 4–6 year useful life and does everything else a laptop or desktop does. It's not a pure AI line item. Really it's infrastructure.

Break-even analysis

Local vs. subscription

One-time hardware cost vs. monthly AI subscription, for a single practitioner. Break-even is when total subscription cost passes the hardware cost.

Hardware                    One-time cost   vs. ChatGPT Pro ($200/mo)   vs. Claude Max ($100/mo)
Mac mini M4 16 GB           $799            4 months                    8 months
Mac mini M4 24 GB           $1,199          6 months                    12 months
MacBook Air M5 24 GB        $1,499          8 months                    15 months
MacBook Pro M5 Pro 48 GB    $3,599          18 months                   36 months

Break-even months = hardware price ÷ monthly subscription cost. After break-even the local machine costs nothing per month, and your data has never left the building. The compliance overhead savings (vendor due diligence, BAA negotiations, malpractice conversations) aren't reflected here. They're harder to quantify but often exceed the hardware cost for regulated practices.
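The break-even arithmetic can be reproduced in a few lines. This sketch rounds up to the first whole month in which cumulative subscription payments reach or pass the hardware price; that rounding rule is my reading of the definition above, not something the table states.

```python
import math


def break_even_months(hardware_usd: float, monthly_usd: float) -> int:
    """First whole month in which total subscription spend covers the hardware."""
    return math.ceil(hardware_usd / monthly_usd)


HARDWARE_USD = {
    "Mac mini M4 16 GB": 799,
    "Mac mini M4 24 GB": 1199,
    "MacBook Air M5 24 GB": 1499,
    "MacBook Pro M5 Pro 48 GB": 3599,
}

for name, price in HARDWARE_USD.items():
    print(f"{name}: {break_even_months(price, 200)} months vs. $200/mo, "
          f"{break_even_months(price, 100)} months vs. $100/mo")
```

Rounding up matters for the $1,499 MacBook Air at $200/month: seven payments total $1,400, so the subscription only passes the hardware price in month eight.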

Lab roadmap

Scaling to 3 TB unified memory

The current two-node setup is a step on a longer path. I'm building toward 3 TB of unified memory, enough to run unquantized FP16 inference on the largest open-weight models that exist. The goal isn't scale for its own sake. It's running the best possible models without asking a vendor for permission.

1 Current · Active · Known cost

256 GB cluster
2× M5 Max 128 GB / 8 TB · TB5 RDMA
Total: ~$21,000

2× MBP M5 Max 128 GB / 8 TB: $20,000
2× AppleCare+ (3-yr): $900
TB5 cables: ~$100

· 244 GB usable (12 GB Metal overhead)
· DeepSeek-R1 671B at Q2_K
· Qwen3.5 235B A22B at Q8

2 Next · awaiting M5 Ultra release · Est. on release

3 TB cluster
4× M5 Ultra Mac Studio 768 GB
Total: ~$130–190K

4× M5 Ultra 768 GB Mac Studio: ~$32–48K ea.*
4× AppleCare+: ~$1.2K
TB5 cables + hubs: ~$1K

· 2 machines × 768 GB = 1.5 TB per node
· 2 nodes · TB5 RDMA · ~2.95 TB usable
· FP16 inference on 671B+ models, with headroom

3 Goal · 3 TB cluster live · Est. on release

3 TB unified memory
4× M5 Ultra 768 GB · fully air-gapped
Total: ~$130–190K

4× M5 Ultra 768 GB · the rig: ~$130–190K
vs. H100 80 GB × ~38 (equiv. memory): >$1.1M

· FP16 inference on 671B+ parameter models
· Frontier PhD-tier · fully air-gapped
· No cloud · no API · no vendor lock-in

* M5 Ultra Mac Studio is unannounced. A June 2026 Bloomberg report says Apple has tested support for up to 768 GB of unified memory, with a possible launch around October 2026. Pricing is speculative and runs high because of the 2026 DRAM shortage. Apple's shipping M3 Ultra Mac Studio starts at 96 GB for $3,999, and as of 2026 it has pulled the 256 GB and 512 GB memory tiers for supply reasons. Scaling that blended cost ($3,999 / 96 GB, about $42 per GB) to 768 GB lands near $32K a machine before any halo-tier premium, so a built four-machine cluster realistically runs $130–190K, and could approach $300K if Apple prices the top SKU punitively. Actual prices and availability will differ. The H100 comparison is for equivalent unified memory capacity at datacenter GPU list pricing, not a performance comparison. Experiments at each tier will be documented here as they run.
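The extrapolation in the footnote is simple enough to check by hand, but here it is as a short sketch. All inputs come from the footnote itself; the linear per-GB scaling is the footnote's own simplification and almost certainly not how Apple will price the machine.

```python
BASE_PRICE_USD = 3999     # shipping M3 Ultra Mac Studio, per the footnote
BASE_MEMORY_GB = 96
TARGET_MEMORY_GB = 768    # rumored, unannounced configuration
MACHINES = 4
H100_MEMORY_GB = 80

usd_per_gb = BASE_PRICE_USD / BASE_MEMORY_GB
per_machine_usd = usd_per_gb * TARGET_MEMORY_GB
cluster_floor_usd = per_machine_usd * MACHINES
h100_equivalents = MACHINES * TARGET_MEMORY_GB / H100_MEMORY_GB

print(f"${usd_per_gb:.2f} per GB")
print(f"${per_machine_usd:,.0f} per machine, ${cluster_floor_usd:,.0f} for four")
print(f"{h100_equivalents:.1f} H100s for the same memory capacity")
```

The four-machine figure of about $128K is the bottom of the $130–190K range; everything above it is the assumed premium for the top memory tier.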