Is Agentic Metadata the Next Infrastructure Layer?

The New Stack

AI agent development is booming. Ninety percent of enterprises are actively adopting AI agents, according to Kong, and Gartner predicts that one-third of enterprise software applications will include agentic AI by 2028. AI agents are autonomous assistants that can think, plan and execute actions. Although their behavior is novel, they resemble any production software application in one important way: They create a spectrum of metadata behind the scenes.

“AI agents produce very rich metadata in each step they take while solving a task or interacting with a user,” Chris Glaze, principal research scientist at Snorkel AI, a company focused on data systems for agentic AI, told The New Stack. These steps, he added, provide a window into an agent’s reasoning process.

Metadata such as user prompts, tool calls and decision confidence help paint a picture of an agent’s train of thought, making its actions more traceable. That information can inform retraining, compliance and cost optimization. It can also be used to improve end users’ trust in agentic systems.

“Comprehensive agentic metadata is crucial for keeping AI systems grounded and delivering intended outcomes,” Ebrahim Alareqi, principal machine learning engineer at Incorta, a data and analytics platform provider, told The New Stack.

Yet little has been said about the practice of collecting and storing metadata from agent interactions, let alone how teams can apply it in practice. “It’s a pretty fragmented landscape,” Greg Jennings, vice president of engineering for AI at Anaconda, a platform focused on building secure AI with open source, told The New Stack. “Most of this is still handled in a very ad hoc way.”

Below, we’ll examine the kinds of data agentic systems are producing, highlight how teams are already putting it to work and explore emerging strategies for getting it right.

The Types of Agentic Metadata

With AI agents, there are two major types of data. One is the shared knowledge and business context designed for AI agents to function. “Think about it as metadata that goes into the AI,” Juan Sequeda, principal researcher at ServiceNow, told The New Stack. The other type is the data that agentic workflows produce themselves, which we’re calling agentic metadata. “AI itself has also generated a bunch of metadata that we want to be able to capture,” Sequeda added.

Agentic metadata ranges from standard telemetry to richer signals that represent step-by-step reasoning processes. Specific types of agentic metadata include:

- Operational: IDs, timestamps, latency, memory use, token consumption.
- Reasoning: Steps in the thought process (often called reasoning traces or decision traces), confidence scores for each decision, error recovery paths.
- Interactions: Tool calls, resources used, data accessed, content versions, retrieval paths, order of operations, security policies applied, call frequency, repeated queries.
- Model: Models used, model versions, parameter counts, quantization levels.
- User: User prompts, session context, human corrections, user intent signals, memory reads and writes, final outcome or generated artifact.

While assessing the final results of an agentic workflow is important, reasoning metrics matter most for pinpointing why decisions were made. “The most valuable elements are the provenance-rich execution path,” Neeraj Abhyankar, vice president of data and AI at R Systems, a digital product engineering consultancy, told The New Stack.

This granular, step-by-step information, often referred to as traces, is typically stored as JSON objects for each step.
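As a rough illustration, one step of such a trace might look something like the record below, expressed here as a Python dict for readability. Every field name and value is hypothetical; there is no standard schema implied, just the metadata categories from the list above.

```python
# A hypothetical single step from an agent's execution trace.
# Field names are illustrative, not a standard.
trace_step = {
    # Operational
    "trace_id": "a1b2c3d4",
    "step": 3,
    "timestamp": "2025-06-12T14:03:22Z",
    "latency_ms": 412,
    "tokens": {"prompt": 1840, "completion": 256},
    # Model
    "model": "frontier-model",       # placeholder model name
    "model_version": "2025-05-01",
    # Reasoning
    "thought": "User asked for Q2 revenue; I need the finance DB.",
    "confidence": 0.82,
    "retry_of": None,                # set when this step retries a failed one
    # Interactions
    "tool_call": {"name": "sql_query", "args": {"table": "revenue_q2"}},
    "data_accessed": ["finance.revenue_q2"],
    "policy_applied": "read_only_finance",
    # User / session
    "session_id": "s-9981",
    "user_prompt": "What was Q2 revenue?",
}
```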
It can reveal insights needed for observability, reproducibility, debugging and auditing, all of which can guide continuous improvement and help build trust, experts said. “This intermediate trace is the gold mine,” Edgar Kussberg, group product manager for AI code remediation at Sonar, told The New Stack. “Without capturing this reasoning layer, you are flying blind when errors occur.”

Others echoed this notion. “Most valuable are decision traces and confidence scores, as they’re essential for compliance and model improvement,” Deepak Singh, chief innovation officer of Adeptia, a data automation company, told The New Stack. The hesitation points where agents fail and must retry are most helpful for revealing where agents struggle, he added.

What You Can Use Agentic Metadata For

Agentic metadata can improve agent systems in several ways, and understanding these use cases can help guide which data teams prioritize and log.

Testing and Debugging

Analyzing why failures occur is a big possible use case for agentic metadata. “The number one use case for agentic metadata is debugging, observability and root-cause analysis,” said Alareqi. This data could expose an incorrect tool call or assumption.

At Incorta, an internal SQL-generating agent uses metadata to learn more about its environment, produce more accurate SQL and inform debugging. “In practice, debugging is just opening the agent logs,” said Alareqi. “Every step of the session is there, and that trace is usually all we need to pinpoint and fix the issue quickly.”

Such metadata can aid observability efforts to diagnose issues with agents. For example, in one of Snorkel AI’s studies, an agent failed to qualify an insurance applicant because it queried the wrong field in a database. “Once we identified that pattern in the trace and corrected it, the issue disappeared entirely,” said Glaze.

With agentic metadata, you can also perform counterfactual testing, which tests how an agent performs under different contexts. “Traces can be fed into continuous evaluations and policy learning, using counterfactuals to refine prompts, tools and routing,” said Abhyankar.

Continual Improvement

Another use case is creating a continuous feedback loop for retraining. This can help AI agents avoid repeating the same mistakes or adapt to new user needs. “Track the metadata for an agent interaction alongside its outcome, good or bad, and you can modify flows, prompts or model parameters to improve future performance,” Chad Richts, director of product strategy at JupiterOne, creators of a cyber asset analysis platform, told The New Stack.

That said, instead of necessitating large-scale retuning, agentic metadata can also guide smaller, gradual improvements, according to Singh: “The killer application is continuous model improvement without full retraining.” By analyzing thousands of traces, you could identify trends and continuously inject targeted training data to optimize agent workflows. A pragmatic use case is eliminating unnecessary system calls.

A specific example where agentic metadata proved useful at Adeptia was when agents showed low confidence scores and frequent retries while handling pharmaceutical data formats. This was easily solved by providing agents with additional training examples in that domain. “The metadata,” Singh said, “essentially taught us what our agent didn’t know it didn’t know.”
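As a hedged sketch of what mining traces for those hesitation points could look like, assuming step records shaped like the hypothetical dict above and stored one JSON object per line, the snippet below flags low-confidence steps and counts retries per trace. The threshold, file name and field names are all assumptions, not anything Adeptia has described.

```python
import json
from collections import Counter

CONFIDENCE_FLOOR = 0.5  # arbitrary illustrative cutoff


def hesitation_points(log_path: str):
    """Flag low-confidence steps and count retries per trace in a JSON-lines log."""
    flagged = []
    retries_per_trace = Counter()
    with open(log_path) as f:
        for line in f:
            step = json.loads(line)
            if step.get("retry_of") is not None:
                retries_per_trace[step["trace_id"]] += 1
            if step.get("confidence", 1.0) < CONFIDENCE_FLOOR:
                flagged.append(step)
    return flagged, retries_per_trace


# Hypothetical usage: surface which tools are involved in hesitant steps.
flagged, retries = hesitation_points("agent_traces.jsonl")
for step in flagged:
    print(step["trace_id"], step["step"], step.get("tool_call", {}).get("name"))
```

Clustering the flagged steps (by tool, by data domain, by prompt pattern) is one plausible way to decide where to inject the kind of targeted training examples Singh describes.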
Cost Optimization

Perhaps the most impressive result is cost optimization. “How do you prove if an AI agent can deliver the same outcome at half the cost? By looking at the metadata,” said Alareqi.

Optimization is important since opaque AI workflows can dramatically increase token usage, especially with reasoning-heavy models. Agent metadata can help pinpoint changes to remove redundancies like unnecessary API calls, find endless loops and identify repetitive tasks better suited for automation that isn’t large language model (LLM)-based. All could streamline workflows and, in effect, reduce cost.

One specific method is to compare reasoning paths across agents and models to find the most performant combination. “With detailed metadata on model calls and execution paths, teams can replay or simulate workloads against smaller or more efficient models,” said Jennings. (A sketch of this kind of per-model cost accounting appears at the end of the article.)

Governance and Compliance

Agentic metadata can also aid auditing and security goals, since you have a validated digital trail of the individual steps and requests agents made, along with what data was accessed. “Agentic metadata becomes a continuous feedback loop that improves system reliability, compliance and operational efficiency across the organization,” Pratyush Mulukutla, co-founder and COO of DataBeat, an AdTech company under the MediaMint umbrella, told The New Stack.

For him, agentic metadata helps in multiple areas, from detecting risk patterns to aiding postmortem analysis and regulatory alignment. MediaMint’s agentic platform, he said, has already been using metadata from agent workflows to enable compliant reporting for frameworks like GDPR. “Detailed metadata logs allowed teams to trace when an agent accessed personally identifiable information, why it accessed it and what rule set guided the action,” Mulukutla said.

Search and Discovery

There is also the possibility of using agent metadata for agent-to-agent discovery. As developers build more and more agents, ServiceNow’s Sequeda said, they’ll eventually want to know, “Which is the right agent I need for my task?” Agentic metadata could help supply that information, enabling developers, agents or users to find the right agent for the right task.

Engineering Improvements

Lastly, metadata from agents can guide software development efforts. This has to do with the architecture of agentic systems as well as unlocking efficiency improvements for software teams. For instance, Anaconda engineers track metadata produced by an internal agent that helps identify how to build packages fully end-to-end. They even deploy a separate agent to interpret these logs. “It has helped us surface gaps as we apply AI to those domains and help streamline access to information for our package-building team,” Jennings said.

JupiterOne is exploring using metadata to restructure its agent architecture to avoid context overflow, goal drift and poor explainability. The idea is relatively simple: Instead of passing everything an agent does — like decisions,...
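Returning to the cost-optimization idea above, here is one minimal way per-model token accounting might be done over the same hypothetical JSON-lines traces. The price table, model names and field names are invented for illustration; real pricing varies by provider and model.

```python
import json
from collections import defaultdict

# Hypothetical USD prices per 1K tokens; placeholders, not real rates.
PRICE_PER_1K = {"frontier-model": 0.005, "small-model": 0.0005}


def cost_by_model(log_path: str):
    """Sum prompt + completion tokens per model and convert to dollar cost."""
    tokens = defaultdict(int)
    with open(log_path) as f:
        for line in f:
            step = json.loads(line)
            usage = step.get("tokens", {})
            tokens[step.get("model", "unknown")] += (
                usage.get("prompt", 0) + usage.get("completion", 0)
            )
    return {m: round(n / 1000 * PRICE_PER_1K.get(m, 0), 4) for m, n in tokens.items()}


print(cost_by_model("agent_traces.jsonl"))
```

Run before and after replaying a workload against a smaller model, a breakdown like this is one way to answer Alareqi’s “same outcome at half the cost” question with data rather than intuition.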

Source: This article was originally published on The New Stack

