Real hardware · Real tasks · Honest verdicts

Your data doesn't
leave this machine.

Weekly experiments on Apple Silicon. Each one targets a specific task for a specific regulated industry and returns a verdict you can actually act on, not "promising" or "has potential," but viable, partial, or not yet.

Industry: Verdict:

3 experiments

Week 3 Viable

Can a local model turn client meeting voice memos into CRM-ready notes for an RIA?

A base Mac mini running a local 8B model converted 12 de-identified client meeting transcripts into structured CRM notes. The model captured action items and follow-ups accurately and produced clean structured output, making it a strong fit for the RIA's most time-consuming administrative task.

financial m4‑mini document‑drafting May 16, 2026
Week 2 Partial

Can a local 13B model flag risky clauses in a vendor contract for a solo attorney?

A local Llama 3.1 13B model reviewed five commercial vendor contracts and flagged clauses for attorney attention. It caught most obvious red flags but missed nuanced jurisdiction-specific risks, making it useful as a first-pass triage tool, not a replacement for legal review.

legal m4‑pro document‑review May 9, 2026
Week 1 Viable

Can an $800 Mac Mini draft SOAP notes for a solo medical practice?

A base M4 Mac mini running a quantized Llama 3.1 8B model can draft structured SOAP notes from rough voice transcripts at a speed and quality that makes editing faster than writing from scratch, for most visit types.

medical m4‑mini document‑drafting May 4, 2026

FAQ

About the experiments

What does Ground Floor test each week?
Each Ground Floor experiment runs one real task for a regulated practice on a local, open-weight model: a SOAP note, a contract clause, a set of meeting minutes. It runs on Apple Silicon with the air-gap held. I publish what it did, the model and hardware used, and a plain verdict of viable, partial, or not yet.
Are the failures published too?
Yes. An experiment gets a "not yet" verdict when the local model could not do the task well enough to trust, and that verdict stays up. The point is an honest record of what local AI can and cannot do for regulated work right now, not a highlight reel.
What hardware and models do the experiments use?
The experiments run open-weight models on Apple Silicon Macs, from an M4 mini up to the two-M5-Max lab. Each write-up lists the exact model, the Mac, and the task, so you can judge whether it maps to your own setup.