Ground Floor · Models

Air-gapped AI
for regulated practices.

Open-weight models running on hardware you own. No API calls. Zero vendor agreements. The data that goes in stays in.

RAM color guide

≤ 36 GB · any Mac
≤ 128 GB · M5 Max
≤ 244 GB · cluster
> 244 GB · bigger cluster
Open-weight leaderboard · 24 models
Model               Intelligence  Min RAM  Tokens/s
GLM-5.2 (RDMA)      51            150 GB   132
MiniMax M3 (RDMA)   44            160 GB   92
Kimi K2.6           43            360 GB   83
DeepSeek V4 Flash   40            135 GB   104
Qwen3.6 27B         37            17 GB    57
Qwen3.5 9B          25            6 GB     56
Gemma 4 E4B         12            5 GB     290
Phi-4 14B           5             10 GB    36
Showing 8 of 24 · sorted by intelligence
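
The RAM color guide can be read as a simple threshold lookup. Here is a minimal sketch in Python that maps each model's minimum RAM from the eight rows shown to a tier from the guide. The function and variable names (`ram_tier`, `TIERS`, `LEADERBOARD_MIN_RAM_GB`) are illustrative assumptions, not part of any Ground Floor tool; only the thresholds and the RAM figures come from this page.

```python
# Tier ceilings in GB, taken from the RAM color guide on this page.
# Anything above the last ceiling falls into "bigger cluster".
TIERS = [(36, "any Mac"), (128, "M5 Max"), (244, "cluster")]

# Minimum RAM (GB) for the eight leaderboard rows shown on this page.
LEADERBOARD_MIN_RAM_GB = {
    "GLM-5.2": 150,
    "MiniMax M3": 160,
    "Kimi K2.6": 360,
    "DeepSeek V4 Flash": 135,
    "Qwen3.6 27B": 17,
    "Qwen3.5 9B": 6,
    "Gemma 4 E4B": 5,
    "Phi-4 14B": 10,
}


def ram_tier(min_ram_gb: float) -> str:
    """Return the color-guide tier for a model's minimum RAM (hypothetical helper)."""
    for ceiling_gb, label in TIERS:
        if min_ram_gb <= ceiling_gb:
            return label
    return "bigger cluster"


if __name__ == "__main__":
    for model, min_ram in LEADERBOARD_MIN_RAM_GB.items():
        print(f"{model}: {min_ram} GB -> {ram_tier(min_ram)}")
```

Each threshold is inclusive, so a model that needs exactly 128 GB still lands in the M5 Max tier.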
M5 Max × 2 nodes
256 GB unified memory
16 TB storage
TB5 RDMA · 3 µs latency
100% on-premise

Why local

The cloud API is the weakest link.

Something I kept noticing: the practitioners who most need good AI tools are often the ones most exposed by cloud architecture. Every API call sends data to someone else's servers. For patient records, client files, and financial accounts, that's not a quirk. That's the whole problem. Running air-gapped removes it from the equation entirely.

01

No API calls

The model runs on the machine in front of you. No network call leaves, no vendor handles the data, no API logs it. Inference happens locally. That's the whole point.

02

No breach surface

Patient transcripts, client documents, financial records: none of it appears in a vendor's incident report. It never left your building.

03

No lock-in

Open weights on open hardware. Swap the model, change the runtime, or upgrade the machine without renegotiating anything.

The hardware

The Lab

I have two M5 Max MacBook Pros connected over Thunderbolt 5 RDMA: 256 GB of combined memory, 16 TB of storage, and ~3 µs of latency between nodes. It's an unreasonably powerful home office setup, but it means every experiment documented here has a specific, real hardware answer.

Node A

Chip: M5 Max
Memory: 128 GB
Storage: 8 TB NVMe
OS: Tahoe 26.2

Thunderbolt 5 · RDMA · ~3 µs

Node B

Chip: M5 Max
Memory: 128 GB
Storage: 8 TB NVMe
OS: Tahoe 26.2

Combined cluster

Unified memory: 256 GB
Total storage: 16 TB
TB5 RDMA latency: ~3 µs
Peak bandwidth: ~800 GB/s

Weekly

Latest experiments

Week 3 Viable

Can a local model turn client meeting voice memos into CRM-ready notes for an RIA?

A base Mac mini running a local 8B model converted 12 de-identified client meeting transcripts into structured CRM notes. The model captured action items and follow-ups accurately and produced clean structured output, making it a strong fit for the RIA's most time-consuming administrative task.

financial m4‑mini document‑drafting May 16, 2026
Week 2 Partial

Can a local 13B model flag risky clauses in a vendor contract for a solo attorney?

A local Llama 3.1 13B model reviewed five commercial vendor contracts and flagged clauses for attorney attention. It caught most obvious red flags but missed nuanced jurisdiction-specific risks, making it useful as a first-pass triage tool, not a replacement for legal review.

legal m4‑pro document‑review May 9, 2026
Week 1 Viable

Can an $800 Mac mini draft SOAP notes for a solo medical practice?

For most visit types, a base M4 Mac mini running a quantized Llama 3.1 8B model can draft structured SOAP notes from rough voice transcripts at a speed and quality that make editing faster than writing from scratch.

medical m4‑mini document‑drafting May 4, 2026

Regulated verticals

By industry


FAQ

Common questions

What is Ground Floor?
Ground Floor is an independent project documenting air-gapped, open-weight AI for regulated practices: legal, medical, financial, and accounting. The models run on hardware you own. There are no API calls and no vendor agreements, so the data that goes in stays in. Every week I run a real experiment and publish the honest verdict, including when the verdict is "not ready yet."
Can you run a large language model locally without sending data to the cloud?
Yes. The Ground Floor experiments run open-weight models entirely on a local machine, so there is no API call and no copy of your data on a vendor's server. For a clinic or a law firm, that is the point: patient records and privileged files never leave the building. The trade-off is that you run your own hardware instead of renting someone else's.
What hardware do you need to run a capable local model?
Less than most people expect. Small open-weight models run on any modern Mac, and mid-size ones want roughly 32 to 128 GB of memory. The Ground Floor lab uses two M5 Max MacBook Pros with 256 GB of combined memory, but that is deliberately overbuilt so I can test the ceiling. Use the Will-It-Run tool to get the honest answer for your specific Mac.
Is a local model actually fast enough to be useful?
In my Ground Floor testing, yes, for the work regulated practices actually do: drafting notes, summarizing documents, checking a contract clause. An M5 Max runs a 70B-parameter model faster than most cloud API round-trips, on hardware you own. It will not beat a frontier cloud model on the hardest reasoning, and I say so on each experiment. For everyday drafting, the gap that matters has closed.
Open-weight local model or a cloud API for regulated data?
For data that cannot leave the building, such as PHI, privileged legal files, or financial records, a local open-weight model of the kind Ground Floor tests avoids the thing a cloud API cannot: your data leaving your control and sitting with a third party. A cloud API is often more capable on raw reasoning, so the real question is whether the data is allowed to leave at all. If it is not, local is the honest answer. Running locally does not make you HIPAA-compliant by itself, though; it removes one large exposure, not all of them.
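
The hardware question above comes down to arithmetic: a model's weights take roughly (parameter count × bits per weight ÷ 8) bytes, plus working memory on top. Here is a rough sketch of that estimate in Python. The function names and the 20% headroom figure are assumptions for illustration, not measurements from the lab and not how the Will-It-Run tool works.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, headroom: float = 0.2) -> bool:
    """Check whether weights plus headroom fit in the given RAM.

    `headroom` is an ASSUMED fraction reserved for context cache, the
    runtime, and the OS. Real needs vary with context length and runtime.
    """
    needed_gb = weight_memory_gb(params_billions, bits_per_weight) * (1 + headroom)
    return needed_gb <= ram_gb


if __name__ == "__main__":
    # An 8B model at 4-bit quantization: about 4 GB of weights.
    print(weight_memory_gb(8, 4))
    # A 70B model at 4-bit on a 128 GB machine.
    print(fits_in_ram(70, 4, 128))
    # The same model at 16-bit precision on the same machine.
    print(fits_in_ram(70, 16, 128))
```

The estimate is deliberately coarse. It shows why quantization matters more than raw parameter count when you are sizing a machine, but a measured run on the actual hardware remains the real answer.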

June 2026

The question is answered.

An M5 Max Mac runs a 70B-parameter model faster than most cloud API round-trips. On hardware you own. With data that never leaves. Two years ago, I wasn't certain this was practical. It is now.