Measuring generative AI’s performance is complex but increasingly important

Measuring generative AI performance isn’t obvious. When you install a new CPU, you can benchmark speed, thermal load, and efficiency right away. With AI, especially generative models like large language models (LLMs), performance is about how well the model understands context, generates usable output, and solves business problems at scale. That makes it harder to measure, but not impossible. And for anyone leading a business, especially at the C-level, this is something you’re going to need to figure out fast.

Here’s the thing: AI is no longer theoretical. It’s operational. CFOs want ROI. CTOs want deployment numbers. CMOs want cost-per-action improvements. But AI returns aren’t linear, and performance reporting still feels imprecise. A lot of vendors pitch “improvements” that boil down to gut feeling. That’s not good enough. If you’re running a company, you can’t depend on intuition and marketing slides. You need hard numbers, or at least repeatable measurements, to justify cost and guide strategy.

That’s why what the Vector Institute for Artificial Intelligence is doing matters. They’re working on independent benchmarks to help companies evaluate generative AI against real, consistent metrics. This is the kind of work that turns AI performance from a guessing game into a comparative framework. Eventually, we’ll have agreed-upon standards that let you ask: What does success look like here? Did this model drive more revenue or reduce time-to-market by something measurable?
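
To make that concrete, here’s a minimal sketch of what a repeatable evaluation harness can look like. It’s purely illustrative, not the Vector Institute’s methodology: the eval set, the stub model, and the two metrics are assumptions standing in for whatever your business actually needs to compare.

```python
# A minimal sketch of a repeatable evaluation harness -- illustrative only.
# "generate" stands in for whatever model or API call you are benchmarking.
import time
from statistics import mean

EVAL_SET = [  # fixed, version-controlled prompts with expected answers
    {"prompt": "Classify the sentiment: 'Great service, fast delivery.'", "expected": "positive"},
    {"prompt": "Classify the sentiment: 'The invoice was wrong twice.'", "expected": "negative"},
]

def run_benchmark(generate):
    """Run the same eval set against a model and report comparable numbers."""
    latencies, correct = [], 0
    for case in EVAL_SET:
        start = time.perf_counter()
        output = generate(case["prompt"])
        latencies.append(time.perf_counter() - start)
        correct += int(case["expected"].lower() in output.lower())
    return {
        "accuracy": correct / len(EVAL_SET),   # did it answer correctly?
        "avg_latency_s": mean(latencies),      # how fast, on average?
    }

# Usage with a stub model; swap in a real model call to compare vendors.
if __name__ == "__main__":
    stub_model = lambda prompt: "positive"
    print(run_benchmark(stub_model))
```

The value isn’t in these particular metrics; it’s that the same fixed eval set and the same scoring rule get applied to every model you consider, so the numbers are comparable across vendors and over time.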

There’s no denying that generative AI will be core to every part of business in the next ten years. But executives will need to hold AI performance to the same standards expected of any other operational function. If you can’t measure it, you can’t manage it. And if you can’t manage it, it will either cost you without your knowing it or, worse, steer you in the wrong direction.

Generative AI is fueling the demand for more developers

There’s a common misconception that generative AI will eliminate the need for developers. That’s just not happening. The opposite is true. AI tools aren’t replacing software engineers; they’re making them more efficient. That efficiency is pushing companies to take on more software initiatives. And with more projects, you need more people to deliver results.

What we’re seeing now is a productivity multiplier. Generative AI tools, like GitHub Copilot and other coding assistants, are speeding up routine programming tasks, helping developers focus more on solving bigger problems. That means engineering leaders are launching more products and features in less time. But scaling up output means scaling up teams. That’s where the demand shift is happening.

The data backs this up. Early research shows that AI assistants reduce coding time and simplify problem-solving. Time-to-release is shrinking. Companies are no longer limited by developer bandwidth in the same way, so they raise their ambitions: larger backlogs, faster cycles, new platforms. And once AI kicks productivity up, human capacity needs to follow.

For anyone leading tech or operations, this changes how team structure should evolve. Developers using AI aren’t automating themselves out of relevance. They’re becoming more valuable, especially those who understand how to orchestrate AI workflows and verify AI-generated code. The role is shifting from individual contributor to high-impact problem solver. That’s good news if you’re thinking about the next stage of growth.
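
What does verifying AI-generated code look like in practice? Here’s a minimal sketch, with a hypothetical assistant-generated helper called parse_price standing in for assistant output; the tests are the human contribution that decides whether it ships.

```python
# A minimal sketch of verifying AI-generated code: the helper below stands in
# for assistant output, and the tests are what the reviewing developer adds.
import unittest

def parse_price(text: str) -> float:
    """Hypothetical assistant-generated helper: extract a price like '$1,299.99'."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned)

class TestParsePrice(unittest.TestCase):
    def test_simple_price(self):
        self.assertEqual(parse_price("$19.99"), 19.99)

    def test_thousands_separator(self):
        self.assertEqual(parse_price("$1,299.99"), 1299.99)

    def test_rejects_garbage(self):
        # The reviewer, not the assistant, decides that bad input must fail loudly.
        with self.assertRaises(ValueError):
            parse_price("call for pricing")

if __name__ == "__main__":
    unittest.main()
```

The assistant wrote the easy part in seconds; the judgment about edge cases, failure behavior, and whether the code is safe to merge stays with the developer. That’s the high-impact work the role is shifting toward.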

If you’re the one making hiring decisions, keep this in mind: the companies winning with AI are investing in talent that can scale system output. In real terms, this means expanding your developer workforce, not shrinking it. Ignore the noise about job losses; it’s not happening in software engineering. Not now, and probably not later.

Strong data foundations are key for generative AI to perform reliably

If your data operations, governance systems, and security protocols aren’t solid, you’re not going to get reliable, trustworthy results from generative AI. Many leaders focus on the model layer and forget the upstream systems that directly impact its performance. That’s a mistake that costs time, money, and credibility.

For a generative AI system to produce accurate, bias-limited, or regulation-compliant outputs, your data pipeline needs to be clean, controlled, and secure. That means checking how your data flows through systems, who has access, and whether your inputs are actually useful for the questions you want your models to answer. Most companies still have fragmented data operations, manual processes, missing lineage, or weak access controls. That slows development and introduces risk.
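
As a rough illustration of what that upstream checking can look like, here’s a minimal validation gate. The field names and the source allowlist are assumptions, not a prescribed schema; the point is that unusable or unapproved records get stopped before they ever reach a model.

```python
# A minimal sketch of an upstream data check -- hypothetical field names,
# illustrating the kind of gate that keeps unusable records out of a pipeline.
REQUIRED_FIELDS = {"customer_id", "created_at", "text"}
APPROVED_SOURCES = {"crm_export", "support_tickets"}  # assumption: a vetted allowlist

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record can proceed."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if record.get("source") not in APPROVED_SOURCES:
        problems.append(f"unapproved source: {record.get('source')!r}")
    return problems

# Usage: reject or quarantine anything that fails before it reaches a model.
print(validate_record({"customer_id": "c-42", "text": "hi", "source": "web_scrape"}))
```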

Governance comes into play early. If you’re not managing your metadata, versioning, and access logs, you won’t be able to prove how your AI reached a specific outcome. That becomes a problem when regulators ask questions or when results cause negative outcomes in business processes. Uncontrolled data introduces randomness into the model output, which creates downstream consequences you can’t easily explain to stakeholders, users, or boards.
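
A minimal sketch of what that traceability can look like, assuming a hypothetical log_inference helper: every output gets tied to a model, a prompt, and a governed data version you can point to when someone asks how a result was produced.

```python
# A minimal sketch of provenance logging -- hypothetical structure, showing the
# kind of record that lets you explain later how a model reached an output.
import hashlib, json, datetime

def log_inference(prompt: str, output: str, dataset_version: str, model_name: str) -> dict:
    """Build an audit record tying an output to its inputs and data version."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model_name,
        "dataset_version": dataset_version,  # which governed data snapshot was in play
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }
    # In practice this would go to an append-only store; printing keeps the sketch simple.
    print(json.dumps(record))
    return record

log_inference("Summarize Q3 churn drivers", "Churn rose due to ...", "warehouse-2025-04-01", "internal-llm-v2")
```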

Security is another key layer. AI models that ingest sensitive, unprotected, or unclassified data inherit that risk. If your AI system pulls from unvetted or poorly secured sources, you can’t guarantee confidentiality, and worse, you expose the business to breaches or compliance failures. Weak data security undercuts everything: strategy, trust, and scalability.

If you’re a decision-maker, avoid viewing AI deployment as a top-layer tool. The infrastructure underneath matters more than most leadership teams realize. Invest in robust dataops. Enforce data governance standards. Strengthen your security posture before you plug AI into critical systems.

No AI strategy will scale without disciplined data execution. You get out what you put in.

Legal and strategic clarity is required before deploying large language models (LLMs)

Before integrating large language models into your organization’s workflow, there are core legal and strategic questions you need to answer. These systems learn from vast corpora of public and private data, and that creates exposure on multiple levels. If you’re moving forward without a clear risk assessment, you’re setting yourself up for complications you could otherwise avoid.

At the legal level, intellectual property is the first flag. If an LLM generates content that resembles third-party material, especially text, code, or media, your company could become liable if that content violates copyright laws. And it’s not always obvious where those boundaries are. Courts are still parsing what “original” means when AI is involved. Until that gets clearer, you need to be cautious.

Regulatory compliance is another layer you have to consider. For industries handling sensitive information (finance, healthcare, legal), you have to factor in how AI interacts with protected data. If a model processes or generates outputs based on customer records or internal data, you need measures in place to ensure compliance with frameworks like GDPR, HIPAA, or sector-specific standards.
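
One concrete control that often sits at this layer is redaction before model input. The sketch below is deliberately naive, regex patterns for emails and US Social Security numbers only, and is nowhere near a GDPR or HIPAA solution on its own; it just shows where such a gate lives in the workflow.

```python
# A naive sketch of redacting obvious identifiers before text reaches an LLM.
# Real compliance needs far more than regex -- this only shows where the
# control sits: between raw records and the model call.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tags before model input."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Patient John, SSN 123-45-6789, email j.doe@example.com, reports ..."))
# -> "Patient John, SSN [SSN], email [EMAIL], reports ..."
```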

On the strategic side, think about internal alignment. Are technical, legal, and executive teams working off the same playbook when it comes to deploying LLMs? If not, you risk creating inefficiencies or entering legal gray areas unintentionally. Before rollout, establish decision rights, usage policies, and risk boundaries. Your leadership team needs to be unified on what AI is allowed to do, where value is expected, and how exposure is being managed.

The smartest organizations are doing their AI homework up front. That means pulling in legal counsel early, building internal governance frameworks around AI use, and knowing where the limits are. Moving fast doesn’t mean skipping key questions. It means asking the right ones at the start, before deployment, before customer exposure, before risk crystallizes into liability.

Deploying AI at scale requires more than just engineering green lights. It demands clear operational and legal readiness. Get that wrong, and you won’t scale confidently. Get it right, and you lay the groundwork for sustained, defensible innovation.

Key highlights

  •  Measuring GenAI impact requires clear performance benchmarks: Traditional metrics don’t capture generative AI’s value. Leaders should support development of internal benchmarks or adopt emerging standards like those from the Vector Institute to measure ROI and justify investment.
  •  AI is increasing developer demand, not reducing it: Generative tools accelerate workflows, enabling more projects, not fewer. Executives should scale dev teams alongside AI deployment to fully capture productivity gains.
  •  Strong data infrastructure underpins reliable GenAI: Poor dataops, governance, or security will limit AI effectiveness and increase risk. Leaders must reinforce data systems to ensure scalable, trustworthy AI adoption.
  •  Legal alignment is a prerequisite for GenAI deployment: LLMs raise complex IP, compliance, and liability issues. Companies should engage legal, compliance, and tech stakeholders early to define clear policies and avoid regulatory exposure.

Alexander Procter

April 30, 2025

7 Min