Last month I shipped a product update to my phone and got a white screen. Not a crash. Not an error message. A white screen. The app compiled clean, the deployment succeeded, the CI passed. Everything looked perfect in the logs. But when I opened it on my actual device, there was nothing there. The model that wrote the code was GPT-5.4. It had generated a full SwiftUI view with correct syntax, valid state management, and no compiler warnings. The code was technically flawless. It just didn't render anything visible. And because my pipeline at the time had no step between "code compiles" and "ship to device," it went straight to production. I found out the same way a user would - by opening the app and staring at nothing. That was the moment I stopped thinking about whether AI-generated code is reliable and started thinking about whether my system for catching bad code is reliable.
The vibe coding backlash hit peak volume this week. The numbers are real: AI-assisted code has a 1.7x higher bug rate than human-written code. Over 1.5 million API keys have been found exposed in AI-generated code pushed to production. AWS says 40% of code on their platform is now AI-generated. The generation is scaling. The review infrastructure is not. But the conversation has collapsed into two camps. One says vibe coding is the future and anyone who questions it is a dinosaur. The other says AI-generated code is inherently dangerous and we should all slow down. Both are wrong because both are arguing about the code itself. The code is not the problem. The system around the code - or the absence of one - is the problem.
My white screen wasn't a failure of GPT-5.4. It was a failure of my pipeline. The model did exactly what I asked it to do. I just didn't have anything in place to verify that what it produced actually worked on a real screen. That's not a model problem. That's an engineering problem. And it's entirely mine.
What I built after the white screen
The fix wasn't to stop using AI. The fix was to build the infrastructure that AI-generated code requires. I spent the next few weeks rebuilding my entire build pipeline around one principle: nothing ships without being verified at every stage by something other than the model that wrote it. First, I broke every product build into phases. An MVP phase ships core functionality and nothing else. Then enhancement packs add features one layer at a time. Each phase gets human testing before the next one starts. Not automated testing alone - a real person opening the app on a real device and using it. The white screen would have been caught in the first 30 seconds of a human test. It took me weeks of shipping broken things to learn that automated tests and real-device testing solve different problems, and you need both.
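Here's the shape of that gate, as a minimal Python sketch. The phase names, the human_signoff flag, and the helper functions are illustrative placeholders, not my actual tooling - the point is that a phase needs two independent passes, and the second one is a human:

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str                     # e.g. "mvp", "enhancement-pack-1"
    automated_pass: bool = False  # CI and automated tests are green
    human_signoff: bool = False   # a real person used it on a real device

    def ready_to_ship(self) -> bool:
        # Both checks are required because they catch different failures:
        # tests verify logic; the human catches a view that compiles clean
        # but renders nothing.
        return self.automated_pass and self.human_signoff

def next_phase_allowed(phases: list[Phase], index: int) -> bool:
    """A phase may start only after every earlier phase shipped cleanly."""
    return all(p.ready_to_ship() for p in phases[:index])

pipeline = [Phase("mvp"), Phase("enhancement-pack-1")]
pipeline[0].automated_pass = True        # CI passed - the white screen scenario
print(next_phase_allowed(pipeline, 1))   # False: no human has opened the app yet
pipeline[0].human_signoff = True         # someone stared at a real screen
print(next_phase_allowed(pipeline, 1))   # True
```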
Second, I built content scoring into the pipeline. Every piece of content my system produces - blog posts, research reports, product copy - gets scored across six dimensions before it can be published. The threshold is 7 out of 10. Below that, it gets rewritten or killed. Before the scoring gate existed, I was publishing AI-generated content that was technically correct but genuinely boring. Accurate and forgettable. Third, I added a product-level audit that scores distribution readiness on a 100-point scale. A product needs to hit 60 across onboarding flow, time-to-value, freemium mechanics, and viral loops before it enters the distribution system. Before this gate existed, I was trying to market products that weren't ready to be marketed. The audit catches that before I waste effort pushing something nobody would stick with.
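Both gates reduce to the same pattern: score, compare against a threshold, block or pass. A sketch, with a caveat: the content dimension names below are hypothetical stand-ins (the real six aren't listed here), and I'm assuming the 7/10 threshold applies to the average rather than to each dimension. The two thresholds and the four audit categories are the real ones from above:

```python
CONTENT_THRESHOLD = 7.0  # out of 10
AUDIT_THRESHOLD = 60     # out of 100

def content_gate(scores: dict[str, float]) -> bool:
    """Publish only if the average across all dimensions clears the bar."""
    return sum(scores.values()) / len(scores) >= CONTENT_THRESHOLD

def audit_gate(scores: dict[str, int]) -> bool:
    """Enter distribution only if the combined score hits 60 of 100."""
    return sum(scores.values()) >= AUDIT_THRESHOLD

# Hypothetical content dimensions - not the real six
draft = {"accuracy": 9, "depth": 8, "originality": 5,
         "clarity": 8, "voice": 6, "usefulness": 7}
print(content_gate(draft))   # True: averages about 7.2

# Audit categories from the post; the point weights are illustrative
product = {"onboarding_flow": 18, "time_to_value": 15,
           "freemium_mechanics": 12, "viral_loops": 10}
print(audit_gate(product))   # False: 55 total - not ready to market
```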
None of this is complicated. All of it requires admitting that the model output - whether it's code, content, or product decisions - is a first draft, not a finished product. The infrastructure that reviews, tests, and gates that output is what determines whether it ships or breaks.
Infrastructure over vibes
Here's a specific example of what infrastructure looks like in practice. I built a model router that handles 94 different task types across 8 models on 4 platforms. Every task in my system - from a simple config file to a complex multi-file architecture - gets routed to the right model based on what the task actually demands. A background status check goes to a local model running on my machine. Free. No tokens burned. A research synthesis that requires genuine reasoning goes to Opus. A mechanical code generation task goes to Codex. The routing is automatic. I don't think about which model to use for what. The system handles it based on cognitive demand, burn rate, and task classification.
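Stripped to its core, the router is a classifier plus a lookup. The sketch below is a toy version - the real thing covers 94 task types, and the demand scores, thresholds, and model labels here are simplified stand-ins for the actual classification:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str    # task classification, e.g. "status_check", "research_synthesis"
    demand: int  # cognitive demand, 0 (mechanical) to 10 (deep reasoning)

def route(task: Task) -> str:
    """Pick a model by what the task needs, not by whichever tab is open."""
    if task.demand <= 2:
        return "local"  # background checks: runs on my machine, no tokens burned
    if task.kind == "code_generation":
        return "codex"  # mechanical generation
    if task.demand >= 7:
        return "opus"   # genuine reasoning, worth the burn rate
    return "mid-tier"   # everything in between

print(route(Task("status_check", demand=1)))        # local
print(route(Task("research_synthesis", demand=9)))  # opus
print(route(Task("code_generation", demand=4)))     # codex
```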
Most people using AI open whatever model is in their browser tab and throw everything at it. Complex architecture? ChatGPT. Simple formatting? ChatGPT. Security review? ChatGPT. Same model, same approach, regardless of what the work actually needs. That's vibe coding in its purest form - not because they're using AI, but because there's no system deciding how to use it. Building infrastructure around AI usage sounds less exciting than "I built an entire app in 20 minutes." But the app built in 20 minutes is the one that ships a white screen to your phone. The infrastructure is what makes the 50th app work as well as the first one did.
That AWS figure - 40% of the code on their platform now AI-generated - is going up, not down, and Google internally reports similar numbers. The volume of AI-generated code being written is no longer a question. The question is who reviews it. A model can produce 2,000 lines of working code in minutes. A human reviewer needs hours to verify that those 2,000 lines do what they're supposed to do, handle edge cases, don't leak credentials, and actually render something on screen. The generation scales instantly. The review doesn't. The skill that matters in 2026 is not prompting an AI to write code. Almost anyone can do that now. The skill is building the system that reviews, tests, and gates what the AI produces at a pace that keeps up with how fast it writes.
The part both sides miss
The vibe coders aren't wrong. AI can genuinely build software. I have 16 products in my portfolio and I've never written a line of code by hand. The technology works. The capability is real. If you dismiss AI-assisted development entirely, you're going to watch other people build things ten times faster than you and wonder what happened. The critics aren't wrong either. Unreviewed AI code fails. It leaks keys. It produces white screens. It writes functions that compile clean and do absolutely nothing useful. If you ship everything a model generates without verification, you're going to break things in ways that are embarrassing at best and dangerous at worst.
Both sides are arguing about the tool. Neither side is talking about the system the tool operates within. A circular saw is dangerous without a workshop. Nobody argues about whether circular saws work. They argue about whether you know how to use one safely. AI code generation is the same question wearing different clothes. The tool is neutral. The system around it determines whether what you build stands up or falls over. I vibe coded for two months. What I learned is that the code was never the hard part. Building the machine that makes the code trustworthy - the testing, the gates, the routing, the human verification, the scoring - that's the actual work. And it's the work that almost nobody is doing.
Get the system right. The code takes care of itself.