The Platform Fix | Issue #008 Hello Reader— At 3am last Tuesday, alerts started screaming. “CRITICAL: CPU usage at 95%! Memory at 87%! Disk I/O spiking!” The platform team scrambled. Emergency scaling. Incident calls. War room activated. Six hours later, they discovered the truth: Every single alert was meaningless. The platform was handling traffic perfectly. Users were happy. Revenue was flowing. They’d spent £2M building dashboards that tracked everything except what actually mattered....
21 days ago • 4 min read
The Platform Fix | Issue #007 Hello Reader— One Monday, I got the call every CTO dreads. “Steve, our 10X engineer just quit. The platform is completely down. We can’t deploy anything. The board is asking if we should shut down the entire engineering division.” Three years. £10M invested. One person held it all together. When he left, everything collapsed in 72 hours. Here’s the uncomfortable truth: Your platform heroes aren’t saving you. They’re slowly killing your business. THE £500K HERO...
28 days ago • 3 min read
The Platform Fix | Issue #006 At 4am, James was a hero. Again. He’d fixed production. Saved the company. Everyone would thank him on Monday. Six months later, James burned out and quit. The platform collapsed within a week. Your heroes aren’t saving your platform. They’re hiding its failures. THE HERO PARADOX™ Every failing platform has the same story: One brilliant engineer holds it all together. They know every system. Fix every issue. Answer every question. Everyone says: “Thank god for...
about 1 month ago • 6 min read
The Platform Fix | Issue #005 “We’re firing Kubernetes.” The room went silent. This was heresy at a “modern” tech company. But their CEO had done the math: £500K/year for 15 microservices. 8 developers. 2 million users. They switched to boring technology. Costs dropped 80%. Velocity doubled. Here’s why they were right. THE KUBERNETES INDUSTRIAL COMPLEX Let’s talk about the elephant in every platform engineering room: Most of you don’t need Kubernetes. There. I said it. After analysing...
about 1 month ago • 4 min read
The Platform Fix | Issue #004 Last Tuesday, a CTO called me in tears. Good tears. “Steve, we just deleted 50% of our platform. Everything runs better. Our AWS bill dropped £33K/month. Developers are actually… happy?” The secret? We used pink post-it notes. Here’s exactly what we did. THE GREAT PLATFORM PURGE OF 2024 Six months ago, this same CTO was drowning: 47 microservices (for 12 developers), 8 different ways to deploy, 3 competing CI/CD systems, 2 service meshes (yes, two), 0 documentation...
about 2 months ago • 5 min read
The Platform Fix | Issue #003 Three weeks ago, I sat in a retrospective where a senior developer finally snapped. “Your platform is like a prison. Every simple task requires five approvals, three tickets, and a blood sacrifice to the YAML gods.” The platform team was stunned. They’d spent two years building “developer-friendly” abstractions. Plot twist: Not one developer had been involved in the design. THE PLATFORM PERCEPTION GAP™ After interviewing 500+ developers about their platform...
about 2 months ago • 5 min read
The Platform Fix | Issue #002 At 2am on a Tuesday, my phone rang. “Steve, our service mesh is down. Everything’s broken. The entire platform team is panicking.” I asked one question: “What was it actually doing for you?” Silence. After 10 minutes of investigation, we discovered their £2M service mesh was handling… basic load balancing. Something their existing ingress controller already did. They’d spent 18 months implementing a solution to a problem they never had. HOW I LEARNED THIS THE...
2 months ago • 5 min read
The Platform Fix | Issue #001 Hello Reader— One Thursday, I walked into a boardroom full of panicked executives. "Steve, we're 18 months into our Kubernetes migration. £3M over budget. Zero apps in production. The board wants to kill it." Sound familiar? Here's the uncomfortable truth: Their migration was dead 12 months ago. They just didn't know it yet. After rescuing 50+ enterprise migrations, I've discovered that most fail in the first 90 days. Not because of technology - but because teams...
2 months ago • 2 min read