AI tools may create a false perception of improved productivity

AI tools for coding are fast. That’s not in question. They give you code instantly. It looks solid. It compiles. Everyone feels like they’re moving quickly. But feelings and facts are two very different things.

Actual controlled testing tells a different story. In 2025, METR (Model Evaluation and Threat Research) ran a randomized controlled trial with experienced open-source developers. Tasks were randomly assigned: on some, developers could use AI tools; on others, they couldn’t. The developers estimated that AI had made them about 20% faster. The truth? They were actually 19% slower.

That’s a 39-point swing between perception and reality. And it happens not because these developers lack experience, but because AI-generated code looks right, until it isn’t. It might use outdated libraries, or call functions with parameters that don’t exist. Worse, it might introduce race conditions or silent security flaws that don’t get caught until later, when the damage is already done.
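To make one of those failure modes concrete, here is a minimal Python sketch (illustrative, not taken from the study) of a race condition: the unsynchronized version reads cleanly in review, yet silently loses updates once threads interleave; the locked version does not.

```python
import threading

# Shared state updated by several threads. The read-modify-write in
# increment_unsafe() is not atomic, so concurrent threads can interleave
# between the read and the write and overwrite each other's updates.
counter = 0
lock = threading.Lock()

def increment_unsafe(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        current = counter       # read
        counter = current + 1   # write; another thread may have updated in between

def increment_safe(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        with lock:              # the lock makes the read-modify-write atomic
            counter += 1

def run(worker, threads: int = 8, iterations: int = 100_000) -> int:
    global counter
    counter = 0
    pool = [threading.Thread(target=worker, args=(iterations,)) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter

if __name__ == "__main__":
    expected = 8 * 100_000
    print("unsafe:", run(increment_unsafe), "expected:", expected)  # typically comes up short
    print("safe:  ", run(increment_safe), "expected:", expected)    # always matches
```

Nothing in the unsafe version fails a compile, a type check, or a quick skim. It only fails under load, which is exactly why this class of defect surfaces late.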

This is where the cost enters. AI gives you code at near-zero marginal cost. But to rely on that code, you still need humans to analyze, verify, and secure it. That’s not a “nice to have”; it’s the critical path. Veracode’s 2024 GenAI Code Security Report makes this obvious: 45% of AI-generated code samples contained security flaws in OWASP Top 10 categories. These aren’t typos. We’re talking about SQL injection vulnerabilities and broken authorization controls, the kind of flaws that can take entire systems down.
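To make the SQL injection case concrete, here is a minimal, hypothetical Python sketch (the table and function names are illustrative, not drawn from the Veracode report): the first query builds SQL from user input via string formatting and is exploitable; the second passes the input as a bound parameter and is not.

```python
import sqlite3

def find_user_vulnerable(conn: sqlite3.Connection, username: str):
    # Classic SQL injection: user input is concatenated into the query.
    # Input like  ' OR '1'='1  returns every row; nastier payloads can
    # modify or exfiltrate data.
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver treats the input strictly as data,
    # never as SQL, which is what the OWASP guidance calls for.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
    conn.execute("INSERT INTO users VALUES (2, 'bob', 'bob@example.com')")

    payload = "' OR '1'='1"
    print(find_user_vulnerable(conn, payload))  # returns every row
    print(find_user_safe(conn, payload))        # returns nothing
```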

So, while the feeling of speed is attractive, it becomes a problem when decision-makers mistake it for actual productivity. This isn’t about being slow. It’s about not being reckless. The problem most teams have isn’t lack of access to AI. It’s thinking they can skip the verification step and still “go fast.” They can’t.

If you want speed, you still need control. You don’t scale a development process by blindly accepting AI output. You scale by hiring humans who can audit its suggestions and enforce quality standards as fast as the code is delivered. Anything less is just playing with risk you can’t afford.

The core competence in the AI era is verification engineering

Using AI to write code isn’t the real challenge. Anyone can ask a chatbot to generate a function. The real value is knowing whether that function is correct, secure, and maintainable. That’s verification engineering. And in a world where AI is everywhere, this becomes the central skill that defines engineering teams.

Most developers still think the measure of their work is how much code they produce. That made sense before generative AI. Now, volume is irrelevant. AI can generate more lines of code in five minutes than a human does in five days. What matters is whether that code does what it’s supposed to do, and whether it does it safely. If it fails silently, generates vulnerabilities, or creates long-term complexity, it’s a liability, not an asset.

This shift is already visible among high-performing developers. The best ones don’t just prompt the AI. They build environments where output is tested, reviewed, and audited by default. They use “golden paths”: secure, pre-reviewed templates that define what AI tools are allowed to generate. They wrap AI in continuous testing pipelines, threat modeling, static analysis, and runtime checks to make sure nothing dangerous slips through. That’s not overhead. It’s accountability.
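One lightweight way to make “tested by default” real is to pin the requirements down as tests before any AI-generated implementation is accepted. The sketch below assumes a hypothetical parse_price function in a hypothetical pricing module: the test file is the gate, and it keeps failing until an implementation that honors the contract exists.

```python
# test_parse_price.py -- the contract for an AI-generated function, written first.
# The pricing module and parse_price() are hypothetical: the AI assistant is asked
# to produce them, and nothing merges until every one of these tests passes.
import pytest

from pricing import parse_price  # module the AI assistant is asked to implement

def test_parses_plain_decimal():
    assert parse_price("19.99") == 1999          # amounts returned in integer cents

def test_strips_currency_symbol_and_whitespace():
    assert parse_price(" $7.50 ") == 750

def test_rejects_negative_amounts():
    with pytest.raises(ValueError):
        parse_price("-3.00")

def test_rejects_non_numeric_input():
    with pytest.raises(ValueError):
        parse_price("not a price")
```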

Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, recently said he could be “10X more powerful” using today’s tools, provided he strings them together “properly.” The key word is “properly.” It means doing the work most people skip. He can do it because he understands, deeply, when AI is likely to be wrong. He has pattern recognition built from years of writing and fixing code. Most people don’t. Without proper validation, AI tools flood systems with inaccurate assumptions that break things later, at scale.

Marlene Mhangami, Developer Advocate at Microsoft, summed it up well: “The bottleneck is still shipping code that you can maintain and feel confident about.” That’s where the actual productivity curve bends upward. Not when the coding gets faster, but when confidence in the code’s long-term stability goes up with it.

C-suite leaders need to understand that scaling development today doesn’t mean hiring more developers or adding more AI tools. It means hiring engineers who can verify output at the same speed it’s generated. Verification is the control layer we need to trust AI. Without it, scale becomes chaos. With it, scale becomes sustainable.

Hasty AI integration without rigorous verification leads to technical debt

There’s no upside in moving fast if the code you’re deploying requires significant rework down the line. That’s what’s happening right now with many development teams adopting AI tools too quickly, assuming these systems are reliable enough to be trusted without question. They’re not. Without strict review and validation processes in place, AI-generated code adds long-term complexity that becomes expensive to fix.

The term for this is technical debt, and AI accelerates its accumulation. It does that by pushing developers into a faster loop of output where each release introduces new unknowns. These aren’t just bugs; they’re design flaws, security gaps, and poorly understood integrations that pile up and make future development harder and slower, not faster. Teams running without tight controls may hit early velocity metrics, but they’ll eventually stall under the weight of maintenance they didn’t plan for.

This isn’t speculation. Veracode’s 2024 GenAI Code Security Report found that nearly half (45%) of AI-generated code samples contained serious vulnerabilities, including OWASP Top 10 issues such as SQL injection and broken access control. These are not minor errors. They’re exploitable faults that can give attackers access to sensitive systems or customer data.
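Broken access control is just as easy to generate and just as easy to miss in review. The sketch below is hypothetical: the first function hands any document to any authenticated user who guesses its ID; the second enforces ownership on every access.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: int
    owner_id: int
    body: str

# Toy in-memory store standing in for a real database.
DOCUMENTS = {
    1: Document(1, owner_id=10, body="Alice's contract"),
    2: Document(2, owner_id=20, body="Bob's payroll data"),
}

def get_document_broken(requesting_user_id: int, doc_id: int) -> Document:
    # Broken access control: any authenticated user can read any document
    # simply by supplying its ID (an insecure direct object reference).
    return DOCUMENTS[doc_id]

def get_document_safe(requesting_user_id: int, doc_id: int) -> Document:
    # Authorization enforced: the ownership check runs on every access.
    doc = DOCUMENTS[doc_id]
    if doc.owner_id != requesting_user_id:
        raise PermissionError("not authorized to read this document")
    return doc

if __name__ == "__main__":
    print(get_document_broken(10, 2).body)   # user 10 reads user 20's payroll data
    try:
        get_document_safe(10, 2)
    except PermissionError as exc:
        print("blocked:", exc)
```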

SonarSource has warned about this trend. They describe it as the emergence of “write-only” codebases, produced so quickly and with such inconsistency that human teams can barely audit, understand, or extend them. These systems become resistant to change, brittle during updates, and risky to operate. The speed of generation outpaces the bandwidth of teams to clean up, which means product integrity declines even as output increases.

For executive teams, the signal is clear. AI integration must be implemented with a forward-looking governance model. Teams need quality gates that are automated, scalable, and enforced. If you’re saving time now but compromising system reliability, you’re not gaining anything; you’re deferring risk. Real productivity comes when fast development is paired with a structure that keeps code quality high every step of the way.

Effective AI adoption requires a systems-thinking approach

Deploying AI tools in software development isn’t just about connecting an IDE to a chatbot. That’s too shallow an implementation. If your goal is actual performance at scale, the approach has to be systematic. This means thinking beyond the tool itself and building the right structure around it: security, testing, auditability, and control mechanisms that ensure output can be trusted.

The companies seeing meaningful results from AI-assisted development are doing more than just prompting. They’re designing environments with hard expectations. Standardized inputs. Predefined architectures. Automatic verification. Golden paths: rigid templates that dictate exactly how AI is allowed to deliver code within secure boundaries. These guardrails limit randomness and enforce quality, making every AI contribution predictable and safe.
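As a sketch of what such a golden path can look like in practice (the structure and names here are hypothetical, not a prescribed standard): validation, authorization, and audit logging live in a pre-reviewed template, and the AI assistant is only allowed to supply the business-logic function that gets injected into it.

```python
# golden_path_handler.py -- a hypothetical pre-reviewed request-handler template.
# The surrounding structure (validation, authorization, audit logging) is fixed
# and reviewed once; AI assistance is only permitted inside the injected
# apply_business_logic() function.
import logging
from typing import Any, Callable

logger = logging.getLogger("audit")

def handle_request(
    user_id: int,
    payload: dict[str, Any],
    is_authorized: Callable[[int], bool],
    apply_business_logic: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    # 1. Input validation: reject anything that is not the expected shape.
    if not isinstance(payload, dict) or "action" not in payload:
        raise ValueError("malformed payload")

    # 2. Authorization: enforced by the template, not left to generated code.
    if not is_authorized(user_id):
        raise PermissionError("user is not authorized for this action")

    # 3. Business logic: the only region an AI assistant may generate,
    #    and the only region that changes between handlers.
    result = apply_business_logic(payload)

    # 4. Audit logging: every request leaves a trace, by construction.
    logger.info("user=%s action=%s", user_id, payload["action"])
    return result
```

Generated handlers then differ only in that one injected function, which keeps reviews focused and the blast radius of a bad suggestion small.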

The rest is infrastructure. Linting, SAST, DAST, and regression testing, all built into pipelines that continuously audit the work AI produces. It’s an end-to-end approach that narrows failure windows and catches instability before it enters production. These systems don’t just operate faster; they operate with clarity. Every change made by the AI is either validated or rejected against defined rules. No ambiguity. No blind trust.
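A minimal version of that pipeline can be a single gate script run on every change. The sketch below assumes a Python codebase and three common open-source tools, ruff for linting, bandit for static security analysis, and pytest for regression tests; the tool choices are illustrative, not mandated.

```python
#!/usr/bin/env python3
# quality_gate.py -- run lint, static security analysis, and regression tests;
# exit non-zero (failing the pipeline) if any check rejects the change.
# Tool selection is illustrative: ruff (lint), bandit (SAST), pytest (tests).
import subprocess
import sys

CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("security scan", ["bandit", "-r", "src", "-q"]),
    ("regression tests", ["pytest", "-q"]),
]

def main() -> int:
    failed = []
    for name, command in CHECKS:
        print(f"== running {name}: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(name)

    if failed:
        print(f"quality gate FAILED: {', '.join(failed)}")
        return 1
    print("quality gate passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```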

Simon Willison, a respected open-source developer, coined the term “vibes-based evaluation” to describe people accepting code mainly because it “looks right.” That’s exactly the risk when teams don’t build automated verification into their workflows. It creates a sense of confidence not backed by substance. That approach doesn’t scale. It creates fragility.

If you’re making leadership decisions around AI adoption, this is what matters: system-level thinking. Don’t chase short-term output. Build infrastructure that channels AI through quality safeguards. Create secure coding policies. Invest in automated testing gates. Train teams to treat AI not as an authority, but as a fast collaborator whose work always needs to be reviewed.

The developers who lead in this space won’t be the ones asking AI to do more. They’ll be the ones setting conditions so that what AI produces aligns with engineering standards. That’s where the leverage comes from: structured control at the system level.

Key takeaways for leaders

  • AI productivity illusion: AI-generated code feels faster but often adds friction. Leaders should measure engineering impact by post-verification output, not perceived speed.
  • Verification over volume: The critical skill in AI-era development is validation, not code generation. Prioritize hiring and training engineers who can audit and control AI output at scale.
  • Technical debt acceleration: Rapid AI adoption without oversight compounds long-term risk. Leaders must invest early in automated quality gates and secure development frameworks to prevent mounting rework and exposure.
  • System-level implementation: Effective AI use depends on structured processes, not loose experimentation. Build enforcement into the stack (linting, testing, and secure templates) to harness AI safely and consistently.

Alexander Procter

February 11, 2026

8 Min