
Closing the Gap Between a Local Agent and a Production One

Microsoft's Foundry Hosted Agents aim to take the operational headache out of shipping Agent Framework agents — though it's still preview territory.


Building an agent that works on your laptop is the easy part. Getting it to production — with identity, scaling, session state, observability, and a sane versioning story — is where most projects stall. Microsoft's pitch with Foundry Hosted Agents is straightforward: take the code you already have running locally with the Microsoft Agent Framework (MAF), and give it a managed home that handles the boring-but-critical parts for you.

What Foundry Hosted Agents actually are

Foundry Hosted Agents are containerised applications that run inside Foundry Agent Service. The framing matters: this is your code, packaged as an image, deployed onto Foundry-managed infrastructure that's been tuned specifically for agent workloads rather than generic web apps.

The headline capabilities are what you'd expect from a serious agent runtime. Cold starts are described as predictable — Microsoft's own framing, not "instant." Compute scales to zero when idle, so you're not paying for an agent that no one is talking to. Each session gets its own VM-isolated sandbox with persistent filesystem state ($HOME and /files), which means an agent can resume a working directory exactly where it left off after an idle scale-down. Sessions persist for up to 30 days; idle compute is deprovisioned after 15 minutes and restored on the next request.
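The persistent-filesystem behaviour is easy to picture: anything the agent writes under $HOME or /files survives an idle scale-down and is waiting when the session resumes. Here's a minimal sketch of that checkpoint-and-resume pattern, using a temporary directory to stand in for the sandbox home (the real per-session paths come from the platform):

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(home: Path, state: dict) -> None:
    """Persist working state under the sandbox home so it survives scale-to-zero."""
    (home / "checkpoint.json").write_text(json.dumps(state))

def resume_checkpoint(home: Path) -> dict:
    """On the next request, pick up exactly where the session left off."""
    path = home / "checkpoint.json"
    return json.loads(path.read_text()) if path.exists() else {}

# In the sandbox, $HOME points at per-session persistent storage;
# here a temp directory stands in for it.
home = Path(tempfile.mkdtemp())
save_checkpoint(home, {"step": 3, "cwd": "analysis"})
print(resume_checkpoint(home))
```

The point is that the agent's own code doesn't need a database or blob store for working state within a session; the filesystem is the durability layer for the 30-day session window.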

There's also bring-your-own VNet support for routing outbound traffic, isolation keys for namespacing end-user sessions, and built-in versioning with stable endpoints — useful when you need weighted rollouts or a quick rollback. Sandbox sizes range from 0.25 vCPU / 0.5 GiB up to 2 vCPU / 4 GiB, which is a useful detail when you're sizing workloads.

Two protocols, one decision

Hosted agents speak one or both of two protocols, and the choice shapes how you'll integrate.

The first is Responses — an OpenAI-compatible /responses endpoint where the platform handles conversation history, streaming events (via server-sent events), and background execution for you. Microsoft's recommendation is to start here if you're not sure. It also maps automatically to the Activity Protocol, which gives you a one-click publish path to Microsoft 365.
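Server-sent events are a plain-text wire format, so consuming the streaming side needs no special SDK. A stdlib-only sketch of parsing an SSE body into (event, data) pairs — the event names in the sample are illustrative, not the actual Responses stream taxonomy:

```python
def parse_sse(body: str):
    """Split a server-sent-events payload into (event, data) pairs.

    Per the SSE format, each event is a run of 'event:'/'data:' lines
    terminated by a blank line.
    """
    events = []
    event, data_lines = None, []
    for line in body.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # blank line terminates one event
            if event or data_lines:
                events.append((event, "\n".join(data_lines)))
            event, data_lines = None, []
    return events

# Illustrative event names; the real stream defines its own.
sample = (
    "event: output_text.delta\n"
    "data: {\"delta\": \"Hel\"}\n"
    "\n"
    "event: output_text.delta\n"
    "data: {\"delta\": \"lo\"}\n"
    "\n"
)
print(parse_sse(sample))
```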

The second is Invocations, a more generic endpoint where you define the request and response schema yourself. It's the right choice when your workflow isn't conversational — anything that doesn't fit a chat-shaped contract. A single container can expose both protocols at once if you need it.
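With Invocations, the contract is yours end to end. To make that concrete, here's a sketch of what a self-defined request/response shape might look like for a non-conversational job — the field names and the batch-scoring scenario are made up for illustration, not a platform schema:

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical contract for a batch-scoring agent; with Invocations,
# you define both sides of this schema yourself.
@dataclass
class ScoreRequest:
    document_ids: list
    threshold: float

@dataclass
class ScoreResponse:
    scores: dict
    model_version: str

req = ScoreRequest(document_ids=["doc-1", "doc-2"], threshold=0.8)
payload = json.dumps(asdict(req))  # what a caller would POST to the invocations endpoint
print(payload)
```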

What the deploy actually does

If you've used azd before, the flow will feel familiar. Point it at an MAF agent and it can optionally create the necessary resources, including a Foundry project and a deployed model. It then packages your code, builds an image, pushes it to Azure Container Registry, pulls it back down to provision compute, and assigns the agent its own Entra ID. You end up with a dedicated endpoint at something like https://{project_endpoint}/agents/{agent_name}, with scaling, session state, observability, and lifecycle management handled by the platform.

The Entra ID detail is worth dwelling on. Each agent gets its own identity — a service principal created at deploy time — which means it can call Foundry models, the Foundry Toolbox, and other Azure services without secrets baked into the image. That's a real improvement over the typical "stuff a key in an environment variable and hope for the best" pattern. One caveat from the Learn docs: the user creating the agent needs Azure AI Project Manager at project scope, because that role can assign Azure AI User to the platform-created identity.

Turning an agent into a host

The code change to make a local MAF agent hostable is small. The dev blog shows a minimal .NET form — drop these few lines into a standard ASP.NET Core app and you've got a hostable agent:

using Microsoft.Agents.AI.Foundry.Hosting;

// 'agent' is an AIAgent you have already constructed locally
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddFoundryResponses(agent);

var app = builder.Build();
app.MapFoundryResponses();

app.Run();

The Microsoft Learn documentation shows a fuller, more canonical pattern that uses an AgentHost builder preconfigured for the Foundry hosting environment, plus an explicit protocol registration step:

using Azure.AI.AgentServer.Core;
using Azure.AI.Projects;
using Azure.Identity;
using Microsoft.Agents.AI;
using Microsoft.Agents.AI.Foundry.Hosting;

var projectEndpoint = new Uri(Environment.GetEnvironmentVariable("FOUNDRY_PROJECT_ENDPOINT")
    ?? throw new InvalidOperationException("FOUNDRY_PROJECT_ENDPOINT is not set."));
var deployment = Environment.GetEnvironmentVariable("AZURE_AI_MODEL_DEPLOYMENT_NAME") ?? "gpt-4o";

AIAgent agent = new AIProjectClient(projectEndpoint, new DefaultAzureCredential())
    .AsAIAgent(
        model: deployment,
        instructions: "You are a helpful AI assistant.",
        name: "my-agent");

var builder = AgentHost.CreateBuilder(args);
builder.Services.AddFoundryResponses(agent);
builder.RegisterProtocol("responses", endpoints => endpoints.MapFoundryResponses());

var app = builder.Build();
app.Run();

The two forms aren't really competing. The shorter one is the elevator-pitch version — useful for grasping how little ceremony there is. The longer one is what you'd actually write when you want explicit control over which protocols are exposed and how the host is configured. If you plan to surface both Responses and Invocations from the same container, the RegisterProtocol pattern is the way to do it.

In Python, it's even shorter:

# 'agent' is the same MAF agent object you run locally
server = ResponsesHostServer(agent)
server.run()

Either way, the point Microsoft is making here is that the same agents and workflows you run locally are the ones that run in the sandbox. No rewrite, no separate "production version" of your code drifting away from the prototype.

The production niceties you don't have to build

A few things in this integration quietly solve problems that teams normally end up rolling themselves.

Versioning treats every deployment as an immutable snapshot, so canary and blue/green rollouts work out of the box, and rollback is instant. Observability is wired up automatically — APPLICATIONINSIGHTS_CONNECTION_STRING is injected at runtime, which means MAF's OpenTelemetry traces flow into Application Insights without extra configuration. Stateful sessions persist files and state across idle scale-downs. And if you've provisioned a Foundry Toolbox in the same project, the hosted agent can reach into it for the existing tool catalogue via a standard MCP endpoint.
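You can see the observability wiring from inside the container: the platform injects APPLICATIONINSIGHTS_CONNECTION_STRING at runtime and the hosting layer picks it up. A stdlib-only sketch of that detection logic — the helper is illustrative; MAF performs this wiring for you:

```python
import os

def telemetry_configured() -> bool:
    """True when the platform has injected an App Insights connection string."""
    return bool(os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"))

# Locally this is typically unset; in the sandbox it arrives at runtime.
# The value below is a placeholder, not a real connection string.
os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"] = "InstrumentationKey=example"
print(telemetry_configured())
```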

None of these are revolutionary on their own. The value is that they're all there together, and you don't have to assemble them.

Where this sits in the broader picture

The Agent Framework gives you a programming model — chat clients, tools, MCP integrations, context providers, middleware, multi-step workflows — that looks the same in .NET and Python. Foundry Hosted Agents give that model a managed runtime. Microsoft says the integration is heading toward general availability, with more to come.

For teams already prototyping with MAF, the practical question is whether the production benefits outweigh the cost of committing to Foundry as the runtime — and whether the preview status is acceptable for your use case. For many shops in the Microsoft ecosystem, the answer will be straightforward — you're already there. For others, the appeal will depend on how much custom infrastructure they've built around their existing agents.

Agent deployment is one of those topics that splits a room of European architects neatly down the middle — half want managed everything, half want VNet-bound control over every byte. It's exactly the kind of debate that fills the corridors at ECS, where the questions about where production agents actually belong tend to be louder than the answers.