The “Hybrid” AI Stack: Enterprise Power at Work, Localhost Freedom at Home 🌗
Originally posted on LinkedIn on November 26, 2025
Why I stopped using free cloud tiers and moved my personal AI stack to the M4 Pro
As a DevOps engineer, I live in two worlds.
During the workday, I rely on my company-provided subscription to Cursor. It’s incredible—context-aware, deeply integrated into our enterprise codebase, and fast. When I’m collaborating with a team on a massive repo, I need that Cloud Context. It indexes thousands of files and understands our specific enterprise architecture. It is worth every penny my company pays.

But for my personal projects, experiments, and technical blogging, I wanted something different.
I was tired of the freemium trap of online AI tools:
- “You have 2 credits remaining today.”
- “Upgrade to Pro to upload large PDFs.”
- “Your data may be used to train our models.”
So, I built a secondary Personal AI Stack on my MacBook Pro M4 (24GB). It costs $0, has zero privacy leaks, and runs entirely on my own metal.
Here is exactly how I separate the two worlds—and why the local stack beats the “free tier” cloud every time.
1. The Engine: Ollama 🦙
The Problem: Running raw models locally used to mean Python dependency hell.
The Fix: I use Ollama to turn my Mac into a private inference server.
My “weekend” command:

```bash
ollama run qwen2.5-coder:14b
```
Benefit: I can hammer the API with 500 requests an hour while debugging a script, and it costs me nothing. No rate limits. No “Plus” subscription required.
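Ollama also exposes a local REST API on port 11434, which is what makes the “hammer it with 500 requests” workflow practical. A minimal sketch, assuming the model above is already pulled (the prompt is just an illustration):

```bash
# Query the local Ollama server directly; no API key, no rate limit,
# and nothing leaves the machine.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "Write a bash one-liner that finds files larger than 1GB",
  "stream": false
}'
```

Ollama also ships an OpenAI-compatible endpoint under /v1, so most SDKs and tools can be pointed at localhost with a one-line base-URL change.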
2. The Interface: Open WebUI 🖥️
The Problem: Most local AI tools are just a command line. I missed the “ChatGPT” experience (history, formatting, file uploads).
The Fix: I run Open WebUI (via Docker; see the setup sketch after the list below). It is effectively a clone of ChatGPT Enterprise, but I own the server.
It unlocks three “Pro” features that usually cost $20/month in the cloud:
- Local RAG (Retrieval): I can drop a 50-page PDF of my bank statement or health records into the chat. The model analyzes it locally without the file ever leaving my LAN.
- Web Search: By toggling “Web Search” on, my local models can browse the internet (via DuckDuckGo or Google) to get real-time data, fixing the “knowledge cutoff” problem of offline models.
- Data Privacy: My chat history is stored in a Docker volume on my Mac, not on OpenAI’s servers.
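For reference, the whole “interface” layer is one container. This is roughly the setup I run; the flags mirror the Open WebUI README (check it for current options), and --add-host is what lets the container reach the Ollama server running on the Mac itself:

```bash
# Self-hosted ChatGPT-style UI; chat history lives in the local
# "open-webui" Docker volume, not on anyone else's servers.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

After that, the UI comes up at http://localhost:3000 and talks to the Ollama instance on the host.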
3. The Studio: Draw Things 🎨 (The Real Game Changer)
This is where the difference between “Cloud Free” and “Local Power” is most obvious.
For my blog headers and UI mockups, I use Draw Things, an optimized iOS/macOS app that runs Flux.1 models locally.
Why “Local” crushes the “Free Online” generators
| Feature | Typical Free Online Generator (DeepAI, etc.) | Draw Things on M4 Pro |
|---|---|---|
| Daily Limit | ~5–10 images/day (then you pay) | Infinite. Run it 24/7 if you want. |
| Resolution | Low (512px). HD requires subscription. | As high as your RAM allows. 4K renders are fine on 24 GB. |
| Privacy | Public. Your prompts are often visible to the community. | Private. Offline. Zero data leaves your Mac. |
| Speed | Queue times. “Waiting for server…” | ~15 s per image (Flux Schnell). |
| Censorship | Strict. “Prompt blocked due to safety filters.” | None. You control the model. |
The “Hardware Reality” Check (24GB RAM) 📊
Since my personal machine has 24GB of RAM, I have to be strategic. I can’t run the 405B models I use at work. Here is my “Personal Lab” tier list to maximize performance without crashing the system:
| Model | Use Case | RAM Usage (Est.) | Speed on M4 Pro | Verdict |
|---|---|---|---|---|
| Qwen 2.5 Coder (14B) | Coding / Scripting | ~9 GB | Blazing Fast ⚡ | The perfect balance. Smart enough for React/Java side projects, small enough to run alongside Docker. |
| Llama 3.1 (8B) | Drafting / Email | ~5 GB | Instant 🚀 | My go-to for drafting LinkedIn posts or brainstorming ideas quickly. |
| Qwen 3 Coder (30B) | Complex Architecture | ~19 GB | Heavy 🐢 | The “Danger Zone.” It fits, but I have to close Chrome tabs. I only use this for deeply complex problems. |
| Flux.1 Schnell (8-bit) | Mockups / Images | ~12 GB | ~15 sec/img | Fast and efficient. I use this for 90% of my blog assets. |
| Flux.1 Dev (4-bit) | Photorealism | ~16 GB | ~45 sec/img | Slower, but produces “Midjourney-level” skin texture and lighting. |
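Staying out of the “Danger Zone” is mostly about knowing what is resident in memory before loading something heavy. A few housekeeping commands I rely on (ollama stop and the OLLAMA_KEEP_ALIVE variable exist in recent Ollama releases, and the qwen3-coder:30b tag follows the Ollama library naming; verify both against the docs for your version):

```bash
# Show which models are currently loaded and how much memory they hold.
ollama ps

# Unload the 30B model immediately instead of waiting for the idle timeout.
ollama stop qwen3-coder:30b

# Start the server with a shorter keep-alive so idle models free RAM sooner
# (the default is around five minutes).
OLLAMA_KEEP_ALIVE=2m ollama serve
```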
The Verdict
I don’t see local AI as a replacement for my professional enterprise tools; I see it as the ultimate Developer Sandbox.
At work, I want the power of the cloud. At home, I want the privacy, freedom, and unlimited tokens of localhost.
Does anyone else run a “Hybrid Stack”? What models are you running on your Mac?