M365 Show with Mirko Peters - Microsoft 365 Digital Workplace Daily

Your Azure AI Foundry’s Agent Army: Why It Wins

Here’s the shocking part nobody tells you: when you deploy an AI agent in Azure AI Foundry, you’re not just spinning up one oversized model. You’re dropping it into a managed runtime where every relevant action—messages, tool calls, and run steps—gets logged and traced. You’ll see how Threads, Runs, and Run Steps form the paper trail that makes experiments auditable and enterprise-ready.

This flips AI from a loose cannon into a disciplined system you can govern. And once that structure is in place, the real question is—who’s leading this digital squad?

Meet the Squad Leader

When you set up an agent in Foundry, you’re not simply launching a chat window—you’re appointing a squad leader. This isn’t an intern tapping away at autocomplete. It’s a field captain built for missions, running on a clear design. And that design boils down to three core gears: the Model, the Instructions, and the Tools.

The Model is the brain. It handles reasoning and language—the part that can parse human words, plan steps, and draft responses. The Instructions are the mission orders. They keep the brain from drifting into free play by grounding it in the outcomes you actually need. And the Tools are the gear strapped across its chest: code execution, search connectors, reporting APIs, or any third‑party system you wire in. An Azure AI agent is explicitly built from this triad. Without it, you don’t get reproducibility or auditability. You just get text generation with no receipts.
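To make the triad concrete, here’s a minimal sketch in C# against the preview Azure.AI.Projects .NET SDK, the same pattern the public quickstart uses. Treat the connection string, agent name, and instructions as placeholders, and note that type names have shifted between preview versions.

```csharp
// Minimal agent-creation sketch (preview Azure.AI.Projects .NET SDK).
// Setup assumed: dotnet add package Azure.AI.Projects --prerelease
//                dotnet add package Azure.Identity
using Azure;
using Azure.AI.Projects;
using Azure.Identity;

// Project connection string copied from the Foundry portal (placeholder).
var connectionString = Environment.GetEnvironmentVariable("PROJECT_CONNECTION_STRING");
AgentsClient client = new(connectionString, new DefaultAzureCredential());

// The triad: Model (the brain), Instructions (the mission orders),
// Tools (the gear). All three are explicit at creation time.
Agent agent = (await client.CreateAgentAsync(
    model: "gpt-4o",  // swap for any catalog model without touching the rest
    name: "contract-analyst",
    instructions: "Summarize contracts. Cite the source clause for every claim.",
    tools: new List<ToolDefinition> { new CodeInterpreterToolDefinition() })).Value;

Console.WriteLine($"Agent created: {agent.Id}");
```

Note that the `model:` argument is the only thing you’d change to swap brains; the instructions and toolset stay put.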

Let’s translate that into a battlefield example. The Model is your captain’s combat training—it knows how to swing a sword or parse a sentence. The Instructions are the mission briefing. Protect the convoy. Pull data from a contract set. Report results back in a specific format. That keeps the captain aligned and predictable. Then the Tools add specialization. A grappling hook for scaling walls is like a code interpreter for running analytics. A secure radio is like a SharePoint or custom MCP connector feeding live data into the plan. When these three come together, the agent isn’t riffing—it’s executing a mission with logs and checkpoints.

Foundry makes this machinery practical. In most chat APIs, you only get the model and a prompt, and once it starts talking, there’s no formal sense of orders or tool orchestration. That’s like tossing your captain into the field without a plan or equipment. In contrast, the Foundry Agent Service guarantees that all three layers are present. Even better, you’re not welded to one brain. You can switch between models in the Foundry catalog—GPT‑4o for complex strategy, maybe a leaner model for lightweight tasks, or even bring in Mistral or DeepSeek. You pick what fits the mission. That flexibility is the difference between a one‑size‑fits‑all intern and a commander who can adapt.

Now, consider the stakes if those layers are missing. Outputs become inconsistent. One contract summary reads this way, the next subtly contradicts it. You lose traceability because no structured log captures how the answer came together. Debugging turns into guesswork since developers can’t retrace the chain of reasoning. In an enterprise, that isn’t a minor annoyance—it’s a real risk that blocks trust and adoption.

Foundry solves this in a straightforward way: guardrails are built into the agent. The Instructions act as a fixed rulebook that must be followed. The Toolset can be scoped tightly or expanded based on the use case. The Model can be swapped freely, but always within the structure that enforces accountability. Together, the triad delivers a disciplined squad leader—predictable outputs, visible steps, and the ability to extend responsibly with enterprise connectors and custom APIs.

This isn’t about pitching AI as magic conversation. It’s about showing that your organization gets a hardened officer who runs logs, follows orders, and carries the right gear. And like any good captain, it keeps a careful record of what happened on every mission—because when systems are audited, or a run misfires, you need the diary. In Foundry, that diary has a name. It’s called the Thread.

Threads: The Battlefront Log

Threads are where the mission log starts to take shape. In Azure AI Foundry, a Thread isn’t a casual chat window that evaporates when you close it—it’s a persistent conversation session. Every exchange between you and the agent gets stored here, whether it comes from you, the agent, or even another agent in a multi‑agent setup. This is the battlefront log, keeping a durable history of interactions that can be reviewed long after the chat is over.

The real strength is that Threads are not just static transcripts. They are structured containers that automatically handle truncation, keeping active context within the model’s limits while still preserving a complete audit trail. That means the agent continues to understand the conversation in progress, while enterprises maintain a permanent, reviewable record. Unlike most chat apps, nothing vanishes into thin air—you get continuity for the agent and governance for the business.

The entries in that log are built from Messages. A Message isn’t limited to plain text. It can carry an image, a spreadsheet file, or a block of generated code. Each one is timestamped and labeled with a role—user or assistant—so when you inspect a Thread, you see not just what was said but also who said it, when it was said, and what content type was involved. Picture a compliance officer opening a record and seeing the exact text request submitted yesterday, the chart image the agent produced in response, and the time both events occurred. That’s more than memory—that’s a ledger.
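Here’s what reading that ledger back looks like, sketched with the same preview .NET SDK (`client` is the AgentsClient from the earlier sketch, and the message text is invented for illustration):

```csharp
// Create a persistent Thread and post a user Message to it.
AgentThread thread = (await client.CreateThreadAsync()).Value;

await client.CreateMessageAsync(
    thread.Id,
    MessageRole.User,
    "Summarize the indemnification clauses in the attached contract set.");

// Read the ledger: every entry carries a timestamp, a role, and typed content.
IReadOnlyList<ThreadMessage> messages = (await client.GetMessagesAsync(thread.Id)).Value.Data;
foreach (ThreadMessage m in messages)
{
    Console.Write($"{m.CreatedAt:u} {m.Role,10}: ");
    foreach (MessageContent item in m.ContentItems)
    {
        if (item is MessageTextContent text) Console.WriteLine(text.Text);
        else if (item is MessageImageFileContent img) Console.WriteLine($"<image file: {img.FileId}>");
    }
}
```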

To put this in gaming terms, a Thread is like the notebook in a Dungeons & Dragons campaign. The dungeon master writes down which towns you visited, which rolls succeeded, and what loot was taken. Without that log, players end up bickering over forgotten details. With it, arguments dissolve because the events are documented. Threads do the same for enterprise AI: they prevent disputes about what the agent actually did, because everything is captured in order.

Now, here’s why that record matters. For auditing and compliance, Threads are pure gold. Regulators—or internal audit teams—can open one and immediately view the full sequence: the user’s request, the agent’s response, which tools were invoked, and when it all happened. For developers, those same records function like debug mode. If an agent produced a wrong snippet of code, you can rewind the Thread to the point it was asked and see exactly how it arrived there. Both groups get visibility, and both avoid wasting time guessing.

Contrast this with systems that don’t persist conversations. Without Threads, you’re trying to track behavior with screenshots or hazy memory. That doesn’t stand up when compliance asks for evidence or when support needs to reproduce a bug. It’s like being told to replay a boss fight in a game only to realize you never saved. No record means no proof, and no trace means no fix. On a natural 1, you’re left reassuring stakeholders with nothing but verbal promises.

With Threads in Foundry, you escape that trap. Each conversation becomes structured evidence. If a workflow pulls legal language, the record will show the original request, the specific answer generated, and whether supporting tools were called. If multiple agents talk to each other to divide up tasks, their back‑and‑forth is logged, too. Enterprises can prove compliance, developers can pinpoint bugs, and managers can trust that what comes out of the system is accountable.

That’s the point where Threads transform chaotic chats into something production‑ready. Instead of ephemeral back‑and‑forth, they produce a stable history of missions and decisions—a foundation you can rely on. But remember, the log is still just the diary. The real action begins when the agent takes what’s written in the Thread and actually executes. That next stage is where missions stop being notes on paper and start being lived out in real time.

Runs and Run Steps: Rolling the Dice

Runs are where the mission finally kicks off. In Foundry terms, a Thread holds the backlog of conversation—the orders, the context, the scrawled maps. A Run is the trigger that activates the agent to take that context and actually execute on it. Threads remember. Runs act.

Think of a Run as the launch button. Your Thread may say, “analyze this CSV” or “draw a line graph,” but the Run is the moment the agent processes that request through its model, instructions, and tools. It can reach out for extra data, crunch numbers, or call the code interpreter to generate an artifact. In tabletop RPG terms, a Thread is your party planning moves around the table; the Run is the initiative roll that begins combat. Without it, nothing moves forward.

Here’s what Foundry makes explicit: Runs aren’t a black box. They are monitored, status‑tracked executions. You’ll typically see statuses like queued, in‑progress, requires‑action, completed, or failed. SDK samples often poll these states in a loop, the same way a game master checks turn order. This gives you visibility into not just what gets done, but when it’s happening.
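In SDK terms, that monitoring is just a status loop. Here’s a sketch with the preview .NET SDK; the thread and agent IDs are placeholders carried over from earlier create calls:

```csharp
// Kick off a Run on an existing Thread, then poll until it settles
// (preview Azure.AI.Projects .NET SDK sketch).
using Azure;
using Azure.AI.Projects;
using Azure.Identity;

AgentsClient client = new(
    Environment.GetEnvironmentVariable("PROJECT_CONNECTION_STRING"),
    new DefaultAzureCredential());

string threadId = "<thread-id>";  // placeholder
string agentId = "<agent-id>";    // placeholder

ThreadRun run = (await client.CreateRunAsync(threadId, agentId)).Value;

// The same loop the SDK samples use: wait while the run is queued or running.
do
{
    await Task.Delay(TimeSpan.FromMilliseconds(500));
    run = (await client.GetRunAsync(threadId, run.Id)).Value;
    Console.WriteLine($"Run {run.Id}: {run.Status}");
}
while (run.Status == RunStatus.Queued || run.Status == RunStatus.InProgress);
```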

But here’s the bigger worry—how do you know what *actually happened* inside that execution? Maybe the answer looks fine, but without detail you can’t tell if the agent hit an external API, wrote code, or just improvised text. That opacity is dangerous in enterprise settings. It’s the equivalent of walking into a chess match, seeing a board mid‑game, and being told “trust us, the right moves were made.” You can’t replay it. You don’t know if the play was legal.

Run Steps are what remove that guesswork. Every Run is recorded step by step: which model outputs were generated, which tools were invoked, which calculations were run, and which messages were produced. It’s chess notation for AI. Pawn to E4, knight to F6—except here it’s Fetch file at 10:02, execute code block at 10:03, return graph artifact at 10:04. Each action is written down in order so you can replay it later.

That structure is a huge relief for developers. Without Run Steps, you’re staring at a final answer with no idea how it came to life. Was the search query wrong? Did a math error slip in? You’re left guessing. With Run Steps, you can scroll through the timeline, identify the exact misfire, and patch it. Debugging stops being guesswork and becomes forensics. It’s the difference between a foggy boss fight where you can’t tell who attacked, and a combat log that shows every sword swing and spell cast.
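Pulling that combat log is one more call against the same client. This continues the polling sketch above, with the caveat that the exact overload for listing steps is an assumption and has varied across preview versions:

```csharp
// Replay the move-by-move record for the finished run (preview SDK sketch;
// some versions take (threadId, runId) instead of the run object).
IReadOnlyList<RunStep> steps = (await client.GetRunStepsAsync(run)).Value.Data;
foreach (RunStep step in steps)
{
    // Each step is typed (message creation vs. tool call), timestamped,
    // and carries details of what was invoked and what came back.
    Console.WriteLine($"{step.CreatedAt:u} [{step.Type}] {step.Status}");
}
```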

Compliance teams get their win too. When auditors ask, “How was this summary generated?” you don’t need to describe model “reasoning” in abstract terms. You have receipts: the tool call, the interpreter step, the assembled answer, all timestamped. That transforms explanations into governance. You show evidence instead of spinning stories. Enterprises love this because it shifts risk into accountability—proof instead of promises.

And for day‑to‑day operations, Run Steps create reproducible patterns you can rely on. If a workflow needs to be re‑run, you can follow the same sequence. If a result is challenged, you can replay it. On a natural 20, Runs with full Run Steps give you auditable, replayable evidence of how outputs were built. On a natural 1 in other systems, all you’d get is a wandering output with no trail.

That’s why this piece of the agent lifecycle matters. You’ve got the diary in Threads, the activation in Runs, and the move‑by‑move log in Run Steps. Together, they turn improvisational AI into an accountable teammate whose actions you can trace, test, and defend.

Of course, knowing that all this detail exists is only part of the puzzle. You’ll need a way to interact with it—something structured enough to launch agents, trigger Runs, and read back Run Steps without drowning in raw API calls. And the surprising part? You don’t need a gleaming command deck to do it. The console most folks use is the same one in the sketches above—familiar, sturdy, and a bit less glamorous: .NET.

Arsenal and Alliances

Arsenal and alliances are what turn an agent from a chatterbox into a worker. In Azure AI Foundry, that arsenal comes in the form of tools—practical extensions that let the agent move from words to actions. Instead of just describing how to check a document or run a calculation, the agent can actually do it and return the output. That distinction is what makes the platform valuable in a real enterprise rather than just impressive in a demo.

Foundry gives you three clear categories of capability. First are the built‑in tools. These include the Code Interpreter, Bing search, SharePoint and Microsoft Fabric connectors, and Azure AI Search. With these in play, the agent can analyze data, pull files, search enterprise content, and even spin up charts or reports without custom glue. Each one expands the scope from chatty responses to tangible work products you can inspect and reuse.

Second, you’re not locked into only what Microsoft ships. Foundry lets you register custom tools through OpenAPI specs or the Model Context Protocol (MCP). MCP in particular matters because it’s treated like “USB‑C for AI tools.” Instead of hand‑writing wrappers every time, you connect agents to remote MCP servers, and the system automatically handles tool discovery, versioning, and invocation. That lightens integration overhead in a big way, especially when your environment has dozens of systems to wire together.
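To make the custom-tool route concrete, here’s roughly what registering an OpenAPI tool looks like in the preview .NET SDK. The spec file, tool name, and anonymous auth are hypothetical stand-ins. (MCP follows the same idea, except the agent points at a remote MCP server rather than a spec file.)

```csharp
// Register a custom tool from an OpenAPI spec (preview SDK sketch; the
// spec file "weather_openapi.json" and tool name are hypothetical).
using Azure;
using Azure.AI.Projects;
using Azure.Identity;

AgentsClient client = new(
    Environment.GetEnvironmentVariable("PROJECT_CONNECTION_STRING"),
    new DefaultAzureCredential());

var openApiTool = new OpenApiToolDefinition(
    name: "get_weather",
    description: "Retrieve weather for a location via the team's REST API.",
    spec: BinaryData.FromBytes(File.ReadAllBytes("weather_openapi.json")),
    auth: new OpenApiAnonymousAuthDetails());  // use real auth in production

Agent agent = (await client.CreateAgentAsync(
    model: "gpt-4o",
    name: "ops-agent",
    instructions: "Answer operational questions using the registered tools.",
    tools: new List<ToolDefinition> { openApiTool })).Value;
```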

Third, every tool call in Foundry is observable. Calls are logged at step level with identity, inputs, outputs, and timestamps. That means the ops view of these agents isn’t trust‑me magic. It’s a ledger. You can watch exactly what was invoked, confirm that it followed proper permissions, and keep permanent records for compliance.

To anchor these categories, picture the quickstart demo. You upload a CSV file as a message to a thread. The agent calls the Code Interpreter tool inside its run steps, processes the file, and generates a chart. That artifact comes back as an attached image, visible directly in the thread log. You didn’t write a parser or visualization yourself. You gave an instruction. The agent selected the tool, executed, and returned a file you could drop into a report. That’s not theory—it’s documented behavior in the SDK samples.
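Sketched in the same preview .NET SDK, that flow looks something like this. Treat `sales.csv`, the prompt, and the attachment shape as placeholders and assumptions, since the attachment API has moved between preview versions:

```csharp
// Quickstart flow: upload a CSV, attach it to a message scoped to the
// Code Interpreter, and let a Run produce the chart (preview SDK sketch).
using Azure;
using Azure.AI.Projects;
using Azure.Identity;

AgentsClient client = new(
    Environment.GetEnvironmentVariable("PROJECT_CONNECTION_STRING"),
    new DefaultAzureCredential());

// Upload the data file for agent use ("sales.csv" is a placeholder).
AgentFile file = (await client.UploadFileAsync(
    filePath: "sales.csv", purpose: AgentFilePurpose.Agents)).Value;

AgentThread thread = (await client.CreateThreadAsync()).Value;

// Attach the uploaded file to the message, scoped to the Code Interpreter.
var attachment = new MessageAttachment(
    file.Id, new List<ToolDefinition> { new CodeInterpreterToolDefinition() });

await client.CreateMessageAsync(
    thread.Id,
    MessageRole.User,
    "Plot monthly revenue as a line chart.",
    attachments: new List<MessageAttachment> { attachment });

// Kick off a Run (as in the earlier polling sketch); once it completes,
// the chart comes back as a MessageImageFileContent item in the Thread.
```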

Where this really opens up is enterprise integration. Tools can link into Logic Apps, which means the agent can access over 1,400 existing SaaS and on‑premises connectors. Rather than re‑coding adapters for CRM, ERP, or ITSM platforms, you configure against connectors the business already runs. It’s a scale play: one agent can securely operate across a broad landscape without you writing brittle API bridges.

The question everyone asks next is security. Foundry addresses it at the core. When tools connect into enterprise systems like SharePoint or Fabric, they do so using on‑behalf‑of authentication mediated by Microsoft Entra. If you weren’t cleared to read a file yesterday, the agent won’t magically bypass that wall today. Identity and permissions remain intact, which is the only way compliance teams give their blessing. Every call is also traceable—who invoked it, what was sent, what was received—so teams always hold the receipts.

On a natural 20, this entire toolkit makes your agent squad more than a novelty. You get units that can query actual data, process it, and act within guardrails already familiar to IT. On a natural 1, without these options, you’re left with a chat session that talks a big game but never does the work.

That’s why the arsenal matters as much as the squad leader. The model and instructions may anchor how the agent thinks, but without tools connected responsibly and logged reliably, it remains a half‑measure. With Foundry’s built‑in set, MCP and OpenAPI extensibility, and enterprise‑grade security, you have the makings of a disciplined force, not a collection of guesswork prompts.

And this brings us to the bigger picture. The agents in Foundry aren’t built to be random sidekicks or toys. They’re designed as governed operators: structured, logged, and equipped with the gear that makes them useful in production.

Conclusion

So here’s where the campaign wraps: Foundry isn’t just throwing prompts at a model. It’s giving you a repeatable way to build agents with a brain, a rulebook, and a toolkit—and every action they take gets logged in Threads, Runs, and Run Steps. For dev leads and compliance folks alike, the headline is simple: reproducible, auditable execution, with SDKs like .NET giving you full lifecycle control and visibility.

Your next step? Spin up a test project, create a basic agent, run a Thread, and then inspect the Run Steps either in the SDK or the portal. That’s how you confirm the logs match the story.

If this helped you roll a natural 20 on deploys, subscribe and toggle alerts.

