Case study · autonomous agent · 2026

Atlas4.

An autonomous dev agent that runs on owned M3 Ultra hardware. Local-first inference at ~70 tokens/sec. Four parallel lane workers, in-process event bus, Linear-webhook intake. No cloud-API dependency for primary work.

Renamed from atlas-direct · cloud-triggered worker model retired · current shape is a local manager + on-demand worker split with hot-swappable inference backends.

Inference throughput · ~70 tok/s
Active lanes · 4 warm
Dispatch latency · ~50ms
Marginal compute cost · $0

§ 01 · The problem

Cloud AI is renting someone else's lever.

Every cloud-API-dependent agent has the same shape: each unit of work costs money, latency rises with token count, throughput throttles when the provider says so, and the data you process leaves your network. For a personal autonomous agent doing dozens of small jobs per day, those four constraints compound:

  • Cost scales linearly with work. A useful agent runs hot — many small requests, many tool calls, many retries. At cloud token prices, a productive day costs more than the work it offsets.
  • Latency is opaque. Provider-side queues, cold-start spikes, rate-limit backoffs — the agent can't reason about its own latency budget when the floor moves underneath it.
  • Data residency is renegotiated daily. Every request sends customer context, internal ticket bodies, and proprietary code to a third party. Defensible for one-off use; sketchy for a daemon that runs every five minutes.
  • You don't own the lever. The provider deprecates a model, changes pricing, rotates an API surface — your agent's behavior changes without you authoring the change.

The alternative — own the lever — was theoretical until very recently. The M3 Ultra running 4-bit quantized MoE models changed the economics. Atlas4 is what owning that lever looks like in practice.

§ 02 · The stack

Local-first. Fallback-aware. One machine.

Primary

Rapid-MLX · Gemma 4 26B-A4B MoE 4-bit

127.0.0.1:8081 · ~13 GB resident · ~70 tok/s on M3 Ultra · Anthropic-API-compatible endpoint

Fallback

Ollama · llama3.3:70b

127.0.0.1:11434 · auto-fall-through when Rapid-MLX is unavailable · same API contract

Embeddings

nomic-embed-text-v2-moe

via Ollama · 768-dim multilingual MoE · runs locally for retrieval + clustering

Cloud

Anthropic-direct path · opt-in, OFF by default

atlas_bot.legacy_anthropic_sdk: false · path exists for exceptional reasoning loads · not used in standard operation

Manager

Atlas4 mini-IDE · 127.0.0.1:8766

Chat surface, triage, dispatch · owns the conversation; delegates execution to lane workers via in-process event bus

Workers

Four lane workers · KeepAlive launchd

pcm · sfdc · atlas-d · research · per-lane heartbeat + inbox watcher · ~50ms dispatch latency

Intake

Linear webhook → state/inbox/<lane>/*.txt

Canonical event source · 4-hour cron poller is the safety net · Slack listening retired

Every component in the stack runs on the same Mac Studio. No SaaS dependency on the hot path. The fallback chain (Rapid-MLX → Ollama → opt-in cloud) means a single component failure degrades gracefully rather than taking the whole agent down.
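
A minimal sketch of that fall-through, assuming each backend can be health-checked with a plain TCP connect. The function names, the probe, and the "cloud" label are illustrative and not taken from the Atlas4 source; only the two local endpoints come from the stack listing above.

```python
import socket
from urllib.parse import urlparse

# Priority order from the stack listing: primary first, fallback second.
BACKENDS = [
    ("rapid-mlx", "http://127.0.0.1:8081"),
    ("ollama", "http://127.0.0.1:11434"),
]


def tcp_probe(url, timeout=0.25):
    """Hypothetical health check: True if something accepts a TCP connection."""
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(is_up=tcp_probe, cloud_opt_in=False, backends=BACKENDS):
    """Return the first healthy backend name in priority order, or None."""
    for name, url in backends:
        if is_up(url):
            return name
    # The cloud path is only considered when explicitly enabled (OFF by default).
    return "cloud" if cloud_opt_in else None
```

Passing the probe in as a parameter keeps the selection logic testable without a live server; with both local backends down and the opt-in left off, the sketch returns None rather than silently reaching for the cloud.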

§ 03 · The intake-to-execution flow

From ticket to diff.

Step 01

Ticket created in Linear

Webhook fires to com.jfstudio.atlas4hook-receiver on localhost:19801. Payload normalized, routed to the right lane based on project (REV-* → sfdc, github-pcm → pcm, etc.).
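
The routing step can be sketched as a small pattern table. Only the two rules named above are included; the remaining routes are not listed in this write-up, so the sketch returns None for anything else. The function name and table shape are assumptions.

```python
from fnmatch import fnmatchcase

# Only the two routes named in the text; the rest of the table is not shown here.
ROUTES = [
    ("REV-*", "sfdc"),
    ("github-pcm", "pcm"),
]


def route_lane(project_key):
    """Return the lane for a project key, or None when no rule matches."""
    for pattern, lane in ROUTES:
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatchcase(project_key, pattern):
            return lane
    return None
```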

Step 02

Lane inbox file written

One file per ticket at state/inbox/<lane>/<ticket-id>.txt. A watchdog-backed file watcher in the lane worker picks it up in ~50ms, roughly 100× faster than the average wait (about 5 seconds) under the prior 10-second polling loop.
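
A sketch of the write side of this step, assuming the receiver wants the watcher to see only complete files. It writes to a temporary name in the same directory and renames into place, which is atomic on the same filesystem. The helper name and the temp-file convention are assumptions, not the Atlas4 implementation.

```python
import os
import tempfile
from pathlib import Path


def write_inbox_file(root, lane, ticket_id, body):
    """Write state/inbox/<lane>/<ticket-id>.txt so readers never see a partial file."""
    inbox = Path(root) / "state" / "inbox" / lane
    inbox.mkdir(parents=True, exist_ok=True)
    final_path = inbox / f"{ticket_id}.txt"

    # Write under a hidden temporary name first, in the same directory.
    fd, tmp_name = tempfile.mkstemp(dir=inbox, prefix=".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)

    # os.replace is an atomic rename when source and target share a filesystem.
    os.replace(tmp_name, final_path)
    return final_path
```

With this pattern a watcher would need to react to rename events as well as create events, and ignore names that do not end in .txt.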

Step 03

Worker grounds in real reads

Worker uses python_exec_chat + anti-lie wrappers to read the actual state (SOQL queries, git diff, file inspection). No work proceeds on assumed state.

Step 04

Methodical dispatch from manager

Atlas4 (manager) composes a complete single-shot dispatch — full context, clear acceptance criteria, exact deliverable shape. Worker doesn't bounce back asking for clarification.
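
One way to make "complete single-shot dispatch" checkable is to model the dispatch as a record and refuse to send it while any of the three parts is empty. The class and field names below are hypothetical; the real dispatch format is not published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical dispatch record: everything the worker needs, sent once."""
    ticket_id: str
    lane: str
    context: str
    acceptance_criteria: tuple
    deliverable: str

    def missing_fields(self):
        """List the parts that would force the worker to ask for clarification."""
        missing = []
        if not self.context.strip():
            missing.append("context")
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        if not self.deliverable.strip():
            missing.append("deliverable")
        return missing
```

The manager would send only when missing_fields() returns an empty list.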

Step 05

Execution · autonomous on internal · approval on external

Internal-only changes (new tool code, tests, docs, config-gated-off features) ship autonomously. External writes (SF DML, HubSpot mutations, Gmail send) require Linear-comment approval before merge.
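
The gate can be sketched as an allowlist of internal change kinds, with everything else treated as external. Failing closed on unknown kinds is a design choice of this sketch, and the kind labels are hypothetical names for the categories listed above.

```python
# Hypothetical labels for the internal-only categories named in the text.
INTERNAL_KINDS = frozenset({
    "tool_code",
    "tests",
    "docs",
    "config_gated_off_feature",
})


def requires_approval(change_kind):
    """True unless the change is on the internal allowlist.

    Unknown kinds fall through to approval, so a new external integration
    is gated even before anyone remembers to classify it.
    """
    return change_kind not in INTERNAL_KINDS
```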

Step 06

Approval via Linear comment

Reviewer drops approve / yes / go on the ticket. Worker reads the approval signal, applies the gated change, posts the result back as a Linear comment.
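
A sketch of reading the approval signal, assuming the whole comment must be one of the three words. Matching the entire comment rather than searching for a substring is an assumption of this sketch; it keeps "no go" or "yes, but wait" from counting as approval.

```python
APPROVAL_WORDS = frozenset({"approve", "yes", "go"})


def is_approval(comment):
    """True only when the entire comment is an approval word."""
    # Ignore surrounding whitespace, case, and trailing punctuation.
    normalized = comment.strip().lower().rstrip(".!")
    return normalized in APPROVAL_WORDS
```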

Step 07

Events stream to mission-control UI

Every worker.* and job.* event publishes through the in-process event bus and SSE endpoint. The Atlas4 mini-IDE lights up in real time. Operator sees state, doesn't reconstruct it.
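
The publish-and-stream path can be sketched as a tiny topic bus plus a Server-Sent Events formatter. The class, the wildcard matching, and the example topic "worker.started" are assumptions; only the worker.* and job.* prefixes come from the text.

```python
import json
from fnmatch import fnmatchcase


class EventBus:
    """Minimal in-process pub/sub with wildcard topic patterns."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, pattern, handler):
        self._subscribers.append((pattern, handler))

    def publish(self, topic, payload):
        """Call every handler whose pattern matches; return the delivery count."""
        delivered = 0
        for pattern, handler in self._subscribers:
            if fnmatchcase(topic, pattern):
                handler(topic, payload)
                delivered += 1
        return delivered


def to_sse(topic, payload):
    """Format one event as an SSE frame: event line, data line, blank line."""
    return f"event: {topic}\ndata: {json.dumps(payload)}\n\n"
```

An SSE endpoint would subscribe to "worker.*" and "job.*" and write each frame returned by to_sse to the open response stream.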

§ 04 · The four lanes

Parallel by design. Crash-isolated.

Phase 47 replaced a single Atlas3 catch-all worker plus 10-second inbox polling with four KeepAlive lane workers communicating with the manager over an in-process event bus. Each lane is a separate process; a crash in one doesn't take down the others.

Lane · pcm

PurpleChipMonk delivery · feature work + agent-system changes scoped to that repo

GitHub purplechipmonk · KeepAlive · per-lane logs

Lane · sfdc

Salesforce + RevOps delivery · Linear-issue intake for REV-* tickets · primary mission

Linear REV-* · KeepAlive · per-lane logs

Lane · atlas-d

Atlas-Direct self-build · feature work on the agent runtime itself

justinfowler925/atlas4 issues · KeepAlive

Lane · research

Exploration · reading new models, prototyping new tools, ad-hoc investigation

Operator-driven · KeepAlive

Each lane writes a heartbeat to state/workers/<lane>.heartbeat.json every 5 seconds. The manager subscribes; if a heartbeat goes stale, the manager surfaces the gap immediately rather than letting silence look like quiet productivity.
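
A sketch of the staleness check, assuming each heartbeat file holds a JSON object with a "ts" Unix timestamp and that three missed beats (15 seconds) counts as stale. The JSON schema, the threshold, and both function names are assumptions; only the file path and the 5-second interval come from the text.

```python
import json
import time
from pathlib import Path


def heartbeat_path(root, lane):
    return Path(root) / "state" / "workers" / f"{lane}.heartbeat.json"


def write_heartbeat(root, lane, now=None):
    """Worker side: record the current time for this lane."""
    path = heartbeat_path(root, lane)
    path.parent.mkdir(parents=True, exist_ok=True)
    timestamp = time.time() if now is None else now
    path.write_text(json.dumps({"lane": lane, "ts": timestamp}), encoding="utf-8")
    return path


def stale_lanes(root, lanes, now, max_age=15.0):
    """Manager side: lanes whose heartbeat is missing, unreadable, or too old."""
    stale = []
    for lane in lanes:
        try:
            data = json.loads(heartbeat_path(root, lane).read_text(encoding="utf-8"))
            age = now - data["ts"]
        except (OSError, ValueError, KeyError):
            # A missing or corrupt file is treated the same as silence.
            stale.append(lane)
            continue
        if age > max_age:
            stale.append(lane)
    return stale
```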

§ 05 · Operating principles

Locked in 2026-05-16.

Principle 01

Honest

Atlas4 always grounds in real reads. Tool calls (python_exec_chat, helpers, anti-lie wrappers) precede any reasoning about state. No claims about a system without first reading the system.

Principle 02

Fast

Optimize within the Gemma 4 + Rapid-MLX + no-prefix-cache constraint. The constraint is fixed; the work happens inside it. Dispatch latency ~50ms · single-shot dispatches · in-process events.

Principle 03

Methodical dispatch

Atlas4 (manager) composes complete, single-shot dispatches. Atlas3 (worker) executes without bouncing back. The manager front-loads the context cost so the worker can run cold.

Principle 04

Maximum autonomy

Autonomous-on-timeout default for internal work. Planning-first only on destructive operations (Salesforce DML, HubSpot mutations, mail send, money movement). The agent owns the boring, escalates the irreversible.

§ 06 · The economics

Why owning the lever matters.

Cloud-API-dependent agent

  • Variable cost per work item · rises with token count · rises with retries
  • Latency floor set by the provider · cold starts, queue depth, throttling
  • Data residency renegotiated every request
  • Behavior changes when the provider deprecates a model
  • Productive daemon costs $300–1,500/month at scale

Atlas4 on owned hardware

  • Fixed capital cost · marginal compute per work item ≈ $0
  • Latency floor is the model + the machine · operator-tunable, observable
  • Customer data stays on the local network · no third-party DPA on the hot path
  • Behavior changes only when I author the change
  • Capital cost amortizes in roughly two months vs. the cloud-API equivalent

The M3 Ultra was a one-time purchase. Electricity is a few dollars per month. The marginal cost of asking Atlas4 to do another job is effectively zero. For a daemon that runs every five minutes — and increasingly, every five seconds — that economic shape is the whole point.
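
The payback arithmetic behind that comparison is a single division. The sketch below is a generic calculator; the hardware price is not stated in this write-up, so the figures in the usage lines are placeholders, not Atlas4's actual numbers.

```python
def payback_months(capital_cost, cloud_monthly, local_monthly=0.0):
    """Months until owned hardware costs less than the cloud-API equivalent.

    capital_cost:  one-time hardware spend
    cloud_monthly: what the same workload would cost on a cloud API
    local_monthly: running cost of the owned machine (electricity)
    """
    saving = cloud_monthly - local_monthly
    if saving <= 0:
        raise ValueError("no monthly saving, so the hardware never pays back")
    return capital_cost / saving


# Placeholder inputs for illustration only.
top_of_range = payback_months(capital_cost=3000.0, cloud_monthly=1500.0)
with_power = payback_months(capital_cost=3000.0, cloud_monthly=305.0, local_monthly=5.0)
```

The payback period is sensitive to where the workload sits in the $300–1,500/month range: the same machine pays back five times faster at the top of that range than at the bottom.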

"Cloud AI made it easy to start. Owning the lever made it sustainable. Local-first wasn't a privacy preference — it was a unit-economics decision that happened to also be a privacy preference."

— Operating principle · local-first inference

§ 07 · Where it's going

From warm factory to always-on operator.

Atlas4 is at Phase 47 — parallel lane workers shipped, dispatch latency cut 100×, mission-control UI lighting up in real time. The forward roadmap is shaped by three trajectories:

  • More lanes. Today's four lanes cover the primary surfaces. Customer-facing automation (a fifth lane for prospective audit work) is on the horizon, with the same architecture, the same KeepAlive, and the same approval gates.
  • Better grounding. The anti-lie wrapper pattern is improving every release. Each new tool gets a "real read" companion so the agent never reasons from cached state when fresh state is cheap.
  • Wider intake. Linear is the canonical source today. GitHub issues, Slack mentions, and email-as-ticket are next — same lane-router pattern, different webhook receivers.

The North Star: an always-on operator that handles the boring parts of running a multi-system RevOps stack, escalates the interesting parts, and stays out of the way the rest of the time. The boring parts are 80% of the work; automating them frees the operator for the interesting 20%.

§ 08 · Methodology & disclosure

Private repo. Open architecture.

Atlas4 is a private repository — runtime configs, customer-specific ticket flows, and proprietary integration paths wouldn't survive open publication. What's portable is the architectural pattern: the local-first inference posture, the manager/worker split with in-process events, the lane-isolation model, the autonomy + approval-gate dichotomy.

Private

  • Repository contents, including atlas/ package, config.yaml, runtime knobs
  • Specific Linear projects, ticket flows, and webhook receivers
  • Tool implementations that touch customer Salesforce / HubSpot
  • The exact prompt strings and persona configurations

Open

  • The local-first + Anthropic-API-compatible fallback chain pattern
  • The manager (Atlas4) / worker (Atlas3) split using in-process events
  • The lane-isolation model with per-lane KeepAlive workers
  • The four operating principles (honest · fast · methodical · autonomous)
  • The intake-to-execution flow (webhook → inbox file → watchdog → dispatch → approval)

If you're building a personal autonomous agent on owned hardware: the bottleneck isn't the model anymore. It's the architecture around the model — the intake, the dispatch, the grounding, the approval gates. Atlas4's contribution is in that architecture, not in the inference.

§ 09 · Read the source

Architecture on request.

The Atlas4 source repository is private. The architecture, the operating principles, and the lessons that justified each design decision are described here and are free for any operator to adapt to their own setup.

If you're building something similar — reach out. I'm happy to walk through the design decisions and the failure modes that shaped each one.