Google Cloud’s new storage offerings boost AI performance
Google is moving fast. Its latest additions to Google Cloud, Rapid Storage and Managed Lustre, are designed for one thing: performance. And performance is non-negotiable when you’re training machine learning models or running real-time AI apps that depend on moving massive volumes of data quickly and reliably.
Rapid Storage targets submillisecond read/write operations by doing something deceptively simple: locating your storage right next to the GPUs and TPUs in the same data center zone. That’s how you cut latency: by removing the distance. It also integrates with Cloud Storage FUSE, an open-source tool that lets object storage behave like a file system. That’s useful when your workflows depend on file-level access but need object-storage scale.
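To make that concrete, here’s a minimal sketch of what file-style access over object storage looks like once a bucket has been mounted with the open-source gcsfuse tool. The bucket name, mount path, and file layout below are placeholders, not anything Google prescribes:

```python
# Minimal sketch: reading training shards through a Cloud Storage FUSE mount.
# Assumes the bucket and mount point are hypothetical and already mounted with:
#   gcsfuse my-training-data /mnt/gcs
from pathlib import Path

mount_point = Path("/mnt/gcs")

# Objects in the bucket now appear as ordinary files, so file-based
# data loaders can consume them without code changes.
for shard in sorted(mount_point.glob("shards/*.tfrecord")):
    with shard.open("rb") as f:
        header = f.read(64)  # e.g., peek at a record header
        print(shard.name, len(header))
```

The point of the pattern is that existing file-oriented pipelines keep working while the data itself lives in object storage.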
Managed Lustre, still in preview, is the parallel file system Google built on DDN’s ExaScaler tech. It scales hard. Think high-performance computing and large-scale training runs that need to push data across nodes without slowing down. It connects to other Google Cloud services, which lets it slot right into your existing architecture.
The big deal here is that Google isn’t just catching up to AWS’s S3 Express One Zone; it’s beating it on core metrics. AWS offers 2 to 10 milliseconds of latency. Google is claiming submillisecond. That’s fast. What makes this possible is how it’s built: Google dropped the REST API in favor of gRPC, which is stateful. It keeps track of what’s happening across operations instead of treating each data call as a blind request. That means smarter performance, especially for AI, where latency directly hits training time and costs.
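If you want to know where your own buckets sit relative to those numbers, a rough latency probe with the standard Python client is enough to establish a baseline. The bucket and object names below are hypothetical, and whether the client actually talks gRPC or the JSON/REST API depends on your client version and configuration:

```python
# Minimal sketch: measuring small-object read latency against a bucket.
import time
import statistics
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-zonal-bucket").blob("checkpoints/shard-0000.bin")

samples_ms = []
for _ in range(100):
    start = time.perf_counter()
    blob.download_as_bytes(start=0, end=4095)  # 4 KiB ranged read
    samples_ms.append((time.perf_counter() - start) * 1000)

print(f"p50 {statistics.median(samples_ms):.2f} ms, "
      f"p99 {statistics.quantiles(samples_ms, n=100)[98]:.2f} ms")
```

Run it from a VM in the same zone as the bucket; cross-zone or cross-region calls will swamp any storage-level gains.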
Ray Lucchesi, Founder and President of Silverton Consulting, made it clear: “I thought they were coming up to speed on S3 Express One Zone, [but] this is beyond that.” Translation: this isn’t Google trying to match AWS; it’s Google aiming to leapfrog it.
If you care about AI speed, and you should, start watching where the hyperscalers are optimizing storage. Because it’s not just about processing power anymore. It’s about feeding that power fast enough to matter.
The drive for AI dominance risks vendor lock-in
Cloud vendors want to become your AI infrastructure. Not rent you hardware. Not offer tools. They want to own the whole stack, from data storage to model training to deployment. That’s what these new services are about, and that’s what makes this a high-stakes game.
Brent Ellis from Forrester made the point plainly: as storage becomes smarter, more automated, more integrated, more “data aware”, the upside is obvious. You do more with less. Fewer moving pieces, better performance, lower overhead. But it gets harder to leave. Each new capability is convenient now, but it might cost you leverage later.
This is where smart leadership matters. If your long-term plan includes serious AI adoption, and it should, you have decisions to make today that will shape your flexibility tomorrow. A single-cloud strategy might look simpler on paper, but it locks you into that vendor’s roadmap, their pricing, their performance timelines.
Best case, it works for the next few years. Worst case, you’re trapped just when a better option becomes available, but the cost of moving is operationally and financially unjustifiable.
This isn’t fearmongering. It’s strategy. Even if your workloads run smoother today with Google’s AI stack or Amazon’s Bedrock, you need to architect your systems with optionality in mind. Make services portable. Keep your data formats open. Design for change, even if you never end up needing it.
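One way to “design for change” without giving up today’s performance is to code against a thin storage interface of your own rather than a provider SDK. A minimal sketch, with illustrative class and method names rather than any vendor’s API:

```python
# Minimal sketch of a provider-agnostic storage boundary.
from typing import Protocol


class ObjectStore(Protocol):
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class GcsStore:
    """Google Cloud Storage backend (one of several interchangeable ones)."""
    def __init__(self, bucket_name: str):
        from google.cloud import storage  # imported lazily so other backends need no GCP deps
        self._bucket = storage.Client().bucket(bucket_name)

    def get(self, key: str) -> bytes:
        return self._bucket.blob(key).download_as_bytes()

    def put(self, key: str, data: bytes) -> None:
        self._bucket.blob(key).upload_from_string(data)


class LocalStore:
    """Filesystem backend, handy for on-prem or test environments."""
    def __init__(self, root: str):
        from pathlib import Path
        self._root = Path(root)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


# Application code depends only on ObjectStore, never on a specific backend.
def load_dataset_manifest(store: ObjectStore) -> bytes:
    return store.get("datasets/manifest.json")
```

Your training and serving code only ever sees ObjectStore, so moving between providers becomes a change at composition time, not a rewrite.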
Enterprise IT has been through this before. VMware’s massive install base became a burden after Broadcom’s acquisition; many didn’t see the lock-in until it was too late. Roy Illsley, Analyst at Omdia, summed it up best: “The killer app for AI hasn’t been developed yet… you’re not locked in and can move between a few [vendors] with some effort.”
If you’re serious about agility, treat vendor lock-in like technical debt. You might be fine now. But when the platform shifts, and it will, it can cost you. So invest in flexibility while you still have a choice.
Commercialization of Google’s proven Colossus technology
Google doesn’t build from scratch unless it has to. When it does build something internally that works at the scale of billions of users, it eventually finds a way to offer that capability to enterprises. That’s exactly what’s happening with Rapid Storage: it’s a productized version of Colossus, the distributed file system that underpins nearly every core Google service.
Colossus has been tested under massive global demand. Now it’s commercialized, and enterprises get access to one of the most resilient, ultra-performant storage systems in existence. Rapid Storage delivers submillisecond performance by applying the same principles Google uses internally: co-location of compute and storage, near-zero-latency data paths, and an engineering stack built for throughput.
What matters here is that Google isn’t experimenting with Rapid Storage. It’s exposing proven infrastructure that has supported Search, Maps, YouTube, and more. This kind of maturity reduces implementation risk for customers. You’re not adopting version one of a product; you’re getting a hardened derivative of systems that have already supported production workloads at exabyte scale.
Most storage solutions available today weren’t originally built with AI in mind. They were adapted. But Colossus, evolved as it is, has always been driven by internal demands for speed, global availability, and consistent read/write performance. That naturally aligns it with the needs of modern AI systems. And now it’s available through an external gRPC interface, bypassing the more limited REST protocol, and fully integrated with other Google Cloud services.
The technical advantage here is direct. Enterprises looking to build production AI systems are being handed access to the infrastructure that made Alphabet what it is. That holds practical value: scaling is easier, latency is no longer a bottleneck, and multi-petabyte datasets become manageable without external system dependencies.
Managed Lustre enhances HPC and AI training environments
Another piece of this is Google Cloud’s Managed Lustre file system. It’s not for everyone. It’s targeted: high-performance computing (HPC), AI training, simulation workloads, model tuning at scale. These are use cases that hit the limits of standard cloud file storage. That’s why Google partnered with DDN, one of the performance leaders in this vertical, to deliver something purpose-built.
The tech foundation matters. DDN’s ExaScaler platform is fully integrated here. It’s been optimized for parallel file system performance and engineered specifically for high-throughput, low-latency file access. Google’s version goes a step further by handling the management layer, eliminating maintenance and orchestration overhead. You get raw performance, controlled through a simplified cloud-native interface.
This service also connects easily with Google’s broader AI offerings, including Vertex AI training environments and GPUs provisioned in Google Cloud. That flexibility makes it practical for enterprises scaling AI experimentation while maintaining a unified storage pipeline. You don’t need to redesign your architecture to test large models or distribute training across clusters.
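Because Managed Lustre presents a POSIX file system to your instances, a standard training data pipeline can point at the mount directly. A minimal sketch, assuming a hypothetical mount point and file layout, and using PyTorch purely as an example framework:

```python
# Minimal sketch: reading pre-serialized samples straight off a parallel
# file system mount. "/mnt/lustre" and the samples/*.pt layout are
# illustrative, not a Google-defined convention.
from pathlib import Path

import torch
from torch.utils.data import Dataset, DataLoader


class LustreTensorDataset(Dataset):
    """Loads serialized tensors directly from the mounted file system."""
    def __init__(self, root: str):
        self.files = sorted(Path(root).glob("samples/*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.load(self.files[idx])


loader = DataLoader(
    LustreTensorDataset("/mnt/lustre"),
    batch_size=64,
    num_workers=8,      # parallel reads are where Lustre's throughput pays off
    pin_memory=True,
)
```

Nothing about the model code changes; the storage layer is what moves.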
Compare this with what AWS offers: Amazon FSx for Lustre. It’s a solid product and clearly competitive. But Google is building toward deeper cohesion with its own ecosystem. That may cut down integration complexity, especially for orgs already running AI workloads on Google’s platform or deeply invested in Google’s TPUs and large-scale data processing environments.
If you’re investing in AI that pushes performance thresholds, and your models rely on heavy iterative training, this isn’t an optional upgrade. It’s a baseline infrastructure decision. Waiting too long to modernize your storage architecture just increases risk. The faster your storage, the faster your training cycles. That directly translates to speed to market, system responsiveness, and monetization timelines.
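Before treating faster storage as the fix, it’s worth measuring how much of each training step is actually spent waiting on data. A rough sketch, where loader and train_step are stand-ins for your own pipeline (for example, the DataLoader above):

```python
# Minimal sketch: splitting an epoch's wall-clock time into "waiting on
# data" vs. "computing", which is the number that decides whether faster
# storage will actually shorten your training cycles.
import time


def profile_epoch(loader, train_step):
    data_wait, compute = 0.0, 0.0
    t0 = time.perf_counter()
    for batch in loader:
        t1 = time.perf_counter()
        data_wait += t1 - t0      # time blocked on I/O and preprocessing
        train_step(batch)
        t0 = time.perf_counter()
        compute += t0 - t1        # time spent in the training step itself
    total = data_wait + compute
    if total:
        print(f"data wait: {100 * data_wait / total:.1f}% of epoch time")
    return data_wait, compute
```

If that percentage is already small, your bottleneck is compute, not storage; if it’s large, this is exactly the class of upgrade that pays off.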
Evolution of storage technology drives new enterprise AI architectures
Storage has shifted. What used to be a backend function is now a defining factor in AI performance. Object storage, long preferred for its scalability, has become a foundational layer for enterprises building AI architectures. Not because it’s new, but because it’s the only type of storage that operates efficiently at petabyte and exabyte scale without failure.
Google’s approach with Rapid Storage and its broader portfolio reflects this reality. As Brent Ellis from Forrester pointed out, storage primitives are evolving. They’re becoming intelligent and aware of the data they hold. That’s a critical capability when you’re working with AI systems that don’t just store data; they query it, manipulate it, learn from it, and train on it in real time.
Modern AI workloads demand characteristics traditional storage models weren’t built to support. Performance thresholds are increasing. Latency tolerance is dropping. And systems must handle both structured and unstructured data, often in the same workflow. This calls for storage that blurs the lines between file and object access, that offers the intelligence to optimize itself based on usage, metadata, and concurrency parameters.
Ray Lucchesi, President of Silverton Consulting, emphasized that object storage is already scaling to exabyte levels without issue. That kind of reliability, combined with new performance enhancements from tools like gRPC and native GPU co-location, redefines what storage is expected to do in a modern AI stack.
For enterprises designing infrastructure today, the key takeaway is this: storage is no longer passive. It’s active. It must integrate with compute, be aware of workload intent, and support elasticity and performance without trade-offs. This is not a feature upgrade; it’s a fundamental shift in architecture. And if your tech roadmap includes AI at scale, this layer needs to be foundational, not optional.
Strategic flexibility is crucial for mitigating AI platform lock-in
The AI ecosystem is not settled. No single company owns it. While OpenAI may be ahead on models, no platform has full enterprise dominance, not Amazon Bedrock, not Google’s Vertex AI, not Microsoft’s Azure OpenAI Service. That means when you build today, you still have room to control your architecture. But the window won’t stay open forever.
Roy Illsley from Omdia put it simply: “The killer app for AI hasn’t been developed yet… you’re not locked in and can move between a few [vendors] with some effort.” That’s a fact worth acting on.
Most enterprises won’t rely on foundation models exclusively. They’ll build or fine-tune smaller, domain-specific models. They’ll draw on open-source alternatives. And they’ll train locally, using private cloud, on-prem hardware, or public cloud when needed. That approach will only succeed if your tech stack allows for transition and adaptation without re-architecting every time requirements shift.
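In practice, much of that flexibility comes down to keeping the deployment target a configuration detail rather than a code path. A minimal sketch, with illustrative environment variable names and paths:

```python
# Minimal sketch: one training entrypoint, many environments. The data
# location is resolved from configuration, so the same code runs against
# an on-prem NFS share, a FUSE-mounted bucket, or a managed Lustre mount.
import os
from pathlib import Path


def resolve_data_root() -> Path:
    # e.g. "/mnt/lustre/datasets" in Google Cloud, "/data/nfs/datasets"
    # on-prem, "/mnt/gcs/datasets" behind a FUSE mount, and so on.
    return Path(os.environ.get("DATA_ROOT", "/data/datasets"))


def main() -> None:
    data_root = resolve_data_root()
    print(f"training against {data_root}")
    # ... build the dataset and model from data_root and run training ...


if __name__ == "__main__":
    main()
```

It’s a small discipline, but it’s what keeps a later environment change from turning into a re-architecture.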
Brent Ellis from Forrester noted that hyperscalers assumed they’d control enterprise AI by default. That’s not happening. Enterprise teams are taking control, building their own models, adding flexibility to deployments, and requiring compatibility beyond a single provider’s ecosystem.
The risk here is subtle but real: when every useful tool in your AI pipeline depends on one vendor’s proprietary system, future transitions become expensive and disruptive. Even if you’re satisfied now, the moment their roadmap diverges from yours, or their pricing model shifts, you lose leverage.
For C-suite leaders, especially CTOs and CIOs, the takeaway is simple. Build with control in mind. Use services that offer performance and ease of use today, but keep your architecture open. Staying in a vendor’s ecosystem should be a choice that benefits you, not a requirement you can’t escape. That’s how you stay agile as the next generation of AI capability comes online.
Key highlights
- Google levels up AI infrastructure: Google Cloud’s new Rapid Storage and Managed Lustre offerings deliver faster AI performance, with submillisecond latency and HPC-optimized file systems, critical for organizations scaling AI workloads.
- Beware of hyperscaler lock-in: As cloud providers embed more intelligent, proprietary tools, leaders must balance short-term performance wins with long-term flexibility risks by structuring cloud strategies to minimize vendor lock-in.
- Proven infrastructure now available: Rapid Storage commercializes Google’s internal Colossus file system, reducing enterprise risk and offering hardened, high-scale infrastructure that’s already been stress-tested across global Google services.
- Performance built for HPC and ML: Google Cloud Managed Lustre, built on DDN’s ExaScaler, provides a managed parallel file system for compute-heavy use cases, ideal for organizations running large-scale simulations or model training pipelines.
- Smarter storage redefines architecture: AI demands more from storage; object systems must now support high throughput and workload awareness. Leaders should treat storage as a core design layer, not a passive backend service.
- Design for platform mobility: With the AI landscape still in flux, decision-makers should prioritize cloud-agnostic architectures and prepare for cross-environment deployment to retain control and reduce future switching costs.