Microsoft introduces Windows AI Foundry

Microsoft is taking a clearer position in the AI space with something it’s calling the Windows AI Foundry. It’s not just a rebrand, it’s a real upgrade to how AI gets developed and deployed on Windows machines. If you build software, drive operations, or manage infrastructure, this matters.

The Windows AI Foundry builds on what used to be the Windows Copilot Runtime. Now, it consolidates all core Windows-based AI tooling into one working system. This new environment supports on-device inferencing across CPUs, GPUs, and NPUs (neural processing units). It’s not theoretical. The infrastructure is tuned for Microsoft’s Phi Silica model and other compact models that don’t rely on the cloud. That’s big. It’s about giving software the power to carry out advanced tasks directly on the device, including things like computer vision and speech processing, without round trips to the cloud, without data leaving the machine, and without per-call cloud costs.

And yes, this is part of a broader strategic alignment with Azure’s AI Foundry approach. We’re seeing Microsoft scale its thinking, bringing the same unified model development system you get in the cloud right down to the devices people already use. Think of it as local-first AI. The resources underneath it are stronger, cheaper, and already shipping in hardware. The new Surface line, equipped with Qualcomm NPU chips, is priced competitively, starting at $799.

As Jatinder Mann, Microsoft Partner Director of Product for Windows Platform and AI Runtime, put it, “We’re bringing the full power of AI cloud to client.” That’s a critical shift, for developers and for any business that’s serious about integrating intelligent applications at scale.

Bottom line: If you’re not exploring how this can cut latency, cost, and reliance on external clouds, you’re leaving efficiency on the table. AI isn’t just running in the cloud anymore, it’s running where your users are.

Foundry Local enables AI model deployment and hardware-tailored execution

Let’s simplify this. Foundry Local is the tool that handles model management and deployment directly on your Windows PC. It reads your system’s hardware profile, picks the best version of an AI model, downloads it, and starts running it. No complex setup. No manual tuning needed.

This creates a real operational advantage. Foundry Local ensures an AI app takes full advantage of whatever compute is available, be it a CPU, GPU, or NPU. It also exposes a simple REST interface, which means developers can hook into it with the same syntax used for cloud-based models. Same calls, same behavior, better performance. You’re not building two separate systems for cloud and on-device inference; it’s one unified interface. That’s a serious reduction in engineering cost.
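To make “same calls, same behavior” concrete, here’s a minimal sketch of that unified interface from Python. It assumes Foundry Local is running and serving an OpenAI-compatible endpoint on a local port; the port and model alias shown are illustrative, so check what your local setup reports. Point the same client at a cloud base URL and nothing else changes.

```python
# Minimal sketch: calling a locally served model through an
# OpenAI-compatible REST interface. The port and model alias are
# illustrative assumptions -- check what Foundry Local reports on your machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5273/v1",  # assumed local endpoint; swap for a cloud URL unchanged
    api_key="not-needed-locally",         # local inference doesn't require a real key
)

response = client.chat.completions.create(
    model="phi-4-mini",  # illustrative model alias
    messages=[{"role": "user", "content": "Summarize today's support tickets."}],
)

print(response.choices[0].message.content)
```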

Practical testing confirms the flexibility. On one side, you’ve got an x64 PC with an Nvidia GPU. On the other, an Arm Windows device with a Qualcomm NPU. Drop the same Phi 4 Mini model into both with Foundry Local and it runs just fine. The system knows which runtime to use. On the Arm setup, the model even used the NPU directly, confirmed via Task Manager.

This is how full-stack AI should behave. Decisions like these matter at scale. If your business has thousands of devices with widely different hardware, this kind of adaptive inferencing drastically reduces overhead, both in time and compute.

More importantly, it shifts control back to the business. When you don’t need to call an external API just to generate intelligent outputs, you regain ownership over the experience. That’s cost, control, and speed, delivered locally. It goes beyond AI strategy, it reshapes how enterprise software is built.

Windows ML serves as the foundational inference engine

Windows ML is the backbone supporting AI workloads across the Windows AI Foundry. It handles inference, the part where models execute and deliver real-world output. What matters is that this engine is optimized across a broad range of hardware, from basic CPUs to advanced GPUs and NPUs. It removes the ambiguity around model deployment: code written using Windows AI tools runs reliably across devices and configurations, with no need for hardware-specific rewrites.

This level of abstraction means enterprises can deploy AI features across their entire Windows device fleet without worrying about inconsistency or performance mismatches. Developers pass models through Windows ML, and the platform automatically chooses the best available runtime to execute those models. It supports the ONNX format, a widely adopted standard in the AI world, so teams aren’t locked into a closed ecosystem. That alone makes long-term integration and scaling much simpler.
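To see what ONNX-based, hardware-abstracted inference looks like in code, here’s a minimal sketch using the standalone onnxruntime Python package as an analog for what Windows ML does natively. The model path and input shape are placeholders for your own exported model.

```python
# Sketch: running an ONNX model through onnxruntime, used here as a
# stand-in for the hardware-abstracted inference Windows ML provides.
# The model file and input shape are placeholders.
import numpy as np
import onnxruntime as ort

# CPU provider is always available; accelerated providers are covered later.
session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape

outputs = session.run(None, {input_name: dummy_input})
print("Output shape:", outputs[0].shape)
```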

The integration between Windows ML and Foundry Local enables continuous updates, device-level optimization, and centralized model management. The system can automatically check for the best model variant and even fetch or update it based on your device’s hardware in real time.

For executives overseeing IT strategies, this isn’t just a technical win, it’s operational stability. It assures compatibility as hardware generations shift and opens up AI capabilities across both high-end development machines and cost-effective business PCs. You’re enabling a future-proof pipeline while maintaining full compatibility with your existing toolchain.

This foundation also supports easier compliance and testing. A common runtime environment across devices means faster QA cycles and less risk when rolling out AI updates. It’s a better way to scale reliably while keeping engineering and support workload minimal.

Microsoft integrates the Model Context Protocol (MCP)

Microsoft is embedding the Model Context Protocol (MCP) directly into Windows to allow AI agents to interact with local applications in a clean, standard way. This isn’t another abstract API layer. MCP enables specific app features to be surfaced and used by AI agents without needing custom logic for every integration.

This is significant if you’re building AI-driven workflows or agent-based systems internally. With MCP, applications can act as data sources or task executors for AI apps. They expose only the features you want enabled and do so securely, using a local MCP server model tied to a permissions-aware registry. These MCP endpoints replace fragmented integration efforts with a single, discoverable, and controlled interface that AI agents can call locally.
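As a rough illustration, here’s what a minimal local MCP server can look like using the cross-platform MCP Python SDK. The exposed tool is a hypothetical example of a narrowly scoped app feature; on Windows, registration and permissions run through the registry described above, not anything in this sketch.

```python
# Minimal sketch of a local MCP server exposing one narrowly scoped app
# feature to AI agents. Uses the cross-platform MCP Python SDK; the
# "lookup_invoice" tool is a hypothetical example, not a real app API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("invoice-app")  # server name agents will see

@mcp.tool()
def lookup_invoice(invoice_id: str) -> str:
    """Return a short status summary for a given invoice ID."""
    # A real application would query its own data layer here.
    return f"Invoice {invoice_id}: approved, awaiting payment."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default; agents discover and call the tool
```

An agent with access to this server can discover lookup_invoice and call it without any custom integration code on either side.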

Kevin Scott, Microsoft’s Chief Technology Officer, referred to MCP as “the HTTP of a distributed AI platform.” That level of standardization means applications written to support MCP are far more interoperable, not just across different tools, but across systems and AI agents as well.

Microsoft CEO Satya Nadella doubled down on this direction during his Build keynote, focusing on what he calls “the agentic web.” That’s not just marketing. It’s Microsoft laying the foundation for how modern AI services and user-level applications will connect, natively, securely, and in standard formats that enterprise environments can adopt quickly.

Divya Venkataramu, Director of Product Marketing for Windows Developer, emphasized that MCP offers “a standardized framework for agents to interact with Windows-native apps via their MCP servers.” That statement makes it clear, Microsoft is designing these systems for production deployment, not for experimentation.

Enterprises looking to simplify their AI integrations while scaling intelligently should look at this seriously. MCP is shaping up to be the new common language for agent-driven workflows, one that doesn’t require building and maintaining heavy middleware or dealing with unpredictable behavior. It’s structured. It’s secure. And it’s already being built into the Windows platform now.

App actions empower developers to integrate specific functions into AI workflows

Microsoft is simplifying the way applications can interact with AI agents through App Actions. These are structured, declarative endpoints within your software that can be exposed to agents using JSON-based definitions. Unlike traditional APIs, App Actions are part of a framework designed for integration into intelligent workflows. You define what actions your software can perform and how; the OS then makes them accessible to the AI systems that need them.

For development teams, this reduces the amount of boilerplate needed to make features compatible with intelligent agent systems. Each action is wrapped with semantic metadata and controlled access. This allows AI agents to discover and utilize an action while keeping the execution environment secure. Actions are classified through defined entities, which tell the agent what types of input and output to expect. You get runtime control as well: the ability to toggle the availability of actions dynamically, based on context or user permissions.
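The real schema comes from the Windows SDK preview, but the shape of a declarative action is roughly this: a typed, structured description of what the action accepts and returns, plus when it’s available. The sketch below is purely illustrative; the field names and the SummarizeDocument action are assumptions, not the actual App Actions format.

```python
# Illustrative sketch of what a declarative action definition might contain.
# Field names and values are assumptions for explanation only -- the real
# schema comes from the Windows SDK's App Actions tooling.
import json

action_definition = {
    "id": "SummarizeDocument",              # hypothetical action identifier
    "description": "Summarize an open document for the user.",
    "inputs": [
        {"name": "documentPath", "entity": "File"}   # typed entity the agent must supply
    ],
    "outputs": [
        {"name": "summary", "entity": "Text"}        # typed result the agent receives
    ],
    "availability": "whenDocumentOpen",     # runtime toggle based on context
}

print(json.dumps(action_definition, indent=2))
```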

Microsoft is providing the App Actions Playground for testing and iteration, and early support is integrated into the preview release of the Windows SDK. That brings development in line with enterprise standards, allowing for proactive testing and user-controlled interaction.

For enterprise stakeholders, App Actions are an important part of enabling AI without losing control. They allow precise delegation to AI agents without the risk of overexposure to sensitive application operations. These actions, since they’re structured and transparent, become testable, auditable points of integration between business apps and AI systems.

This framework also opens opportunities for analytics and behavioral optimization. Since each call to an App Action is trackable, teams can monitor where and how AI interfaces are triggering business logic. That visibility becomes highly useful across product, compliance, and security functions.

Visual tools and developer resources improve AI model customization and performance

Development teams no longer need to rely on out-of-the-box AI models or spend heavily on cloud tuning cycles. Microsoft is giving developers local options to customize models with tools like the AI Toolkit inside Visual Studio Code. These resources are designed to help developers fine-tune built-in models, specifically Phi Silica, using efficient methods like LoRA (low-rank adaptation), which trains a small adapter on top of a frozen base model using domain-specific data.

What makes this significant is the shift from generic model outputs to focused, enterprise-aligned responses. With LoRA adapters, you’re refining the model for your specific business or product context. This cuts unnecessary guesswork during model execution and lowers the chance of hallucinations, those fabricated or irrelevant completions that undermine trust in AI systems.
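Inside the AI Toolkit this is handled for you, but as a rough analog, here’s what adapter-based tuning looks like with the open-source peft library on an openly available Phi model. The model ID, target modules, and hyperparameters are illustrative starting points, not Microsoft’s recipe.

```python
# Sketch of LoRA setup with the open-source peft library, as an analog for
# the adapter-based tuning the AI Toolkit performs. Model ID, target modules,
# and hyperparameters are illustrative, not a prescribed configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")  # example open model

lora_config = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor for the adapter
    target_modules=["qkv_proj", "o_proj"],  # attention projections to adapt (model-dependent)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train; the base stays frozen
```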

Microsoft has also made it easier to evaluate these adapters quickly through its AI Dev Gallery, an app available from the Microsoft Store. Developers can pull in working adapters, test them against data, and push updates to inference-ready models, all on the same device. No cloud queues, API throttling, or extra fees.

For business leaders, this capability delivers flexibility and control. You’re no longer waiting on centralized teams to deploy custom models. Smaller, distributed teams, on product, ops, or CX, can deploy local models with high confidence in output reliability. You see the impact on velocity, data alignment, and cost immediately.

And since model tuning happens within trusted dev environments like Visual Studio Code, there’s no break in workflows or need for external platforms. This lowers barriers to adoption across internal teams while aligning with existing compliance and deployment pipelines. For modern product and engineering orgs, that’s a real efficiency driver.

Microsoft is future-proofing the Windows AI ecosystem

Microsoft is removing traditional hardware constraints from AI development on Windows. Windows AI Foundry is designed to operate across multiple silicon targets, x64 and Arm CPUs, GPUs, and NPUs, without forcing developers to rewrite or recompile for different devices. This hardware-agnostic strategy allows a single application to scale across everything from entry-level laptops to AI-accelerated edge devices.

Windows ML and Foundry Local work together to abstract hardware runtime differences. Whether your endpoint is running a CPU-only configuration or a dedicated NPU, the platform selects the proper execution layer, downloads the corresponding ONNX runtime, and optimizes it on the fly. For developers, this ensures consistency and performance tuning without custom engineering for every different chip or device class.
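In code terms, that selection logic resembles the sketch below, which uses onnxruntime’s provider names to prefer an NPU, then a GPU, then the CPU fallback. Which providers actually show up depends on the device and the runtime build; the model path is a placeholder.

```python
# Sketch: letting the runtime report what acceleration is available and
# preferring NPU, then GPU, then CPU. Provider names are the ones
# onnxruntime uses; whether each is installed depends on the device.
import onnxruntime as ort

PREFERENCE = [
    "QNNExecutionProvider",   # Qualcomm NPU builds
    "DmlExecutionProvider",   # DirectML (GPU) on Windows
    "CUDAExecutionProvider",  # NVIDIA GPU builds
    "CPUExecutionProvider",   # universal fallback
]

available = ort.get_available_providers()
providers = [p for p in PREFERENCE if p in available]

session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
print("Running on:", session.get_providers()[0])
```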

There’s also meaningful cost alignment. Devices like the new Surface line feature Qualcomm’s NPU-based chipsets and start at $799. That price point brings advanced AI capabilities into a much broader set of devices, making local inferencing a mainstream capability almost overnight. At the same time, Copilot+ PC hardware is required to deliver at least 40 TOPS (trillion operations per second) from its AI accelerator, ensuring the platform is still positioned for high-performance applications when needed.

This strategic architecture matters to C-suite decision-makers because it stabilizes the investment curve in AI infrastructure. You’re deploying applications once, not recreating them for various configurations or future releases. You get a development stack that evolves with your hardware roadmap, freeing your teams to focus on product iteration, not compatibility maintenance.

And with Microsoft’s optimization of its own model catalog for this environment, businesses don’t need to start from zero. Whether you’re using Microsoft’s Phi models or sourcing from open libraries like Hugging Face or Ollama, you’re operating with tuned baselines that reduce onboarding friction and improve time to value.
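For teams sourcing from open catalogs, pulling a model down for local use can be as simple as the sketch below. The repository ID is illustrative, and whatever you download still needs to match your local runtime, for example an ONNX export for Windows ML.

```python
# Sketch: fetching an open model snapshot from Hugging Face for local use.
# The repository ID is illustrative -- pick an export that matches your
# local runtime (e.g. an ONNX variant for Windows ML / onnxruntime).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-onnx",  # example open model repo
    local_dir="./models/phi-3-mini",
)
print("Model files downloaded to:", local_dir)
```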

Microsoft’s Windows AI strategy reinforces its legacy

Microsoft is returning to its foundational strength, being the software platform businesses can build upon. The introduction of Windows AI Foundry extends the operating system beyond traditional functionality. It now serves as a full-stack environment for AI application development, model deployment, system-level integration, and secure workflow orchestration.

What makes this unique is Microsoft’s scope. They’re not building features in isolation. Foundry Local, Windows ML, App Actions, and the Model Context Protocol are designed to work together. They provide a unified AI experience on the same desktop operating system used by most modern enterprises. That makes Windows not just compatible with AI, but purpose-built for it.

Microsoft’s inclusion of open-source models, local tuning tools, REST-compatible APIs, and secured app-facing architectures shows clear strategic intent. They’re creating a cohesive, controlled AI operating model where innovation happens locally, with performance, privacy, and cost efficiency prioritized.

For the enterprise, this reduces reliance on fragmented ecosystems. You’re not scaling with a patchwork of tools. You’re building on a platform that integrates security, performance, and workflow control while remaining open to external models and standards. It’s visibly aligned with how modern software is built, deployed, and governed.

Investing in the Windows AI platform ensures readiness for the next cycle of intelligent application demands. It gives teams the tools to ship AI products faster while retaining control over data, experience, and compliance. And for organizations making long-term bets on AI, across apps, services, and infrastructure, having that consistency at the platform layer matters.

The bottom line

Microsoft isn’t experimenting with AI on Windows, it’s laying down infrastructure. The shift from cloud-only models to local deployment isn’t just technical progress. It’s a signal that AI is becoming a core expectation at the OS level, not a value-add. With Windows AI Foundry, the company is providing a platform that’s fast, secure, cost-efficient, and enterprise-ready by default.

For business leaders, this means AI integration can happen without navigating a fragmented tool ecosystem. Whether you’re building custom workflows, improving internal tools, or rolling out intelligent features across a fleet of devices, this platform supports it. And it does so in a stable, hardware-agnostic, and forward-compatible way.

The advantage isn’t just technical, it’s operational. Faster deployment cycles, tighter control over data, controllable model behavior, and reduced reliance on the cloud for inference. These aren’t small wins. They reset expectations for scale and governance in AI-powered software.

The window of time to adopt infrastructure like this before your competitors do is narrow. The tools are here. They work. And they’re built into a platform your teams are already using.

Alexander Procter

June 20, 2025

13 Min