The Platform Fix Hello Reader— Three weeks ago, I sat with a VP Engineering who'd spent 14 months implementing "modern observability." Datadog for metrics. New Relic for APM. Splunk for logs. Sentry for errors. Grafana for visualisation. Honeycomb for traces. PagerDuty for incidents. Seven tools. £200,000+ annual spend. And when their checkout service went down last Tuesday at 2 AM, they still couldn't find the problem for 6 hours. Why? Because they'd implemented observability backwards. They...
about 2 months ago • 2 min read
The Platform Fix Hello Reader— Your platform dashboard is green across the board. 99.99% uptime. Sub-100ms response times. Zero critical alerts in production. Your executive dashboard looks beautiful. Your SRE team is celebrating. Meanwhile, your developers can't deploy a two-line code change without three days of platform team help. Recognise this pattern? I've seen it at 50+ companies. Perfect metrics, broken developer experience. Book a 30-minute discovery call and I'll show you the one...
about 2 months ago • 2 min read
The Platform Fix Hello Reader— Your lead platform engineer books a two-week holiday. Your first thought: "Can we survive without them?" If the answer is "I don't know" or "probably not," you have what I call a Bus Factor of 1. One person gets hit by a bus (or finds a better job), and your platform collapses. I've tested this pattern across 47 enterprise platforms. 41 had a Bus Factor of 1 or 2. Most didn't realise it until that critical person resigned. The Bus Factor Reality Check: Here are...
2 months ago • 3 min read
The Platform Fix | Issue #013 Hello Reader— Remember that fintech where 3 engineers quit? We found something else. One K8s feature burning £67k/month. They’d enabled it because “everyone does.” Sound familiar? The Cluster Autoscaler Conspiracy: Here’s what vendors don’t tell you about autoscaling: 73% of clusters scale for peaks that happen <1% of the time. Average utilisation: 23%. You’re paying for 77% air. The math is brutal: 100 nodes provisioned. 23 actually needed. £50k+/month up in smoke....
2 months ago • 1 min read
The Platform Fix | Issue #013 Hello Reader— Last week, three platform engineers handed in their notice at a UK fintech. All on the same day. The CEO called me in a panic. "Steve, they're our best people. What the hell happened?" I'll tell you what happened. And it's happening at your company too. The Brutal Truth: Your platform engineers aren't engineers anymore. They're YAML therapists. The numbers don't lie: 74% of their time on "keeping lights on." Average 3am wake-up calls per week: 4...
3 months ago • 1 min read
The Platform Fix | Issue #012 Hello Reader— Just got back from Hamburg. 30 platform conversations. 3 interventions. But one haunts me… A CTO pulled me aside: “Steve, we’re already falling. How do we land without dying?” Pinterest called their K8s failure “one-in-a-million.” I’ve seen that exact pattern 47 times. This year alone. Every Doomed Migration Follows Three Stages: Stage 1: Honeymoon (Months 1-3). “Ahead of schedule!” Team excited. Everything simple. Stage 2: Struggle (Months 4-9)...
3 months ago • 1 min read
The Platform Fix | Issue #011 Hello Reader— I'm writing this from my hotel in Hamburg. In 3 hours, I'll be on stage at Bit Summit talking about platform simplification. Deutsche Bank and ING will be in the front row. But here's what I won't mention in my talk: "Steve, we need Istio. Everyone's using service mesh." Those words cost a UK retail bank £2M and 18 months. Last Tuesday, we ripped it all out. Service Mesh Is Like Insurance: It sounds responsible until you read the fine print. What...
3 months ago • 2 min read
The Platform Fix Hello Reader— Last week, I reviewed a platform with 47 monitoring tools. 47. That’s not architecture. That’s hoarding with a YAML addiction. Today, I’m done being polite about platform complexity. Main Teaching: I’ve just published something that might get me uninvited from a few conferences: The Pragmatic CNCF Manifesto. After 50+ migrations and £100M+ in complexity eliminated, I’ve written the guide I wish existed when I started. The one vendors don’t want you to read. The...
3 months ago • 1 min read
The Platform Fix | Issue #009 Hello Reader— “We’re shutting down the platform team.” The Slack channel went silent. 15 engineers. £4.2M annual budget. Gone. But here’s the twist: It was their idea. Six months later, deployment frequency increased 400%. Developer satisfaction hit 9.2/10. Platform costs dropped £3M annually. The platform team didn’t get fired. They got promoted to “Product Engineering” and became the most valuable team in the company. Here’s how they did it - and why your...
3 months ago • 4 min read