Transparency Needs for AI Agent Accountability
A new paper proposes a detailed framework for incident analysis, outlining the specific data developers should log to close the accountability gap for AI agents.
AI incident monitoring databases like the OECD AI Incident Monitor and the AI Incident Database track cases where AI has caused harm. But because they source incidents from public news articles, they’re limited in the detail they can include, leaving significant gaps when it comes to understanding and holding complex agentic AI systems accountable. A new paper entitled “Incident Analysis for AI Agents,” published at AIES this year, tackles this problem by outlining a transparency framework for what information should be collected about AI agent incidents (Ezell et al., 2025).
The paper outlines three classes of factors that contribute to AI agent incidents: (1) system factors, (2) contextual factors, and (3) cognitive errors. System factors are things like training and feedback data, learning methods, the system prompt, and the scaffolding software around the agent. Contextual factors include aspects of the task definition, the tools the agent uses, and the information the agent uses or needs to perform its tasks. Finally, “cognitive” errors are flaws in how the AI agent itself functions that lead to failure, whether in observing the environment, understanding inputs, making decisions, or executing actions toward a goal.
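To make the taxonomy concrete, here is a minimal sketch of how these three classes of factors could be recorded against an incident. The enum and field names are my own illustration, not a scheme from the paper.

```python
# Minimal sketch (my own naming, not the paper's) of tagging an incident with
# the three classes of contributing factors.
from dataclasses import dataclass, field
from enum import Enum

class FactorClass(Enum):
    SYSTEM = "system"          # e.g. training data, learning method, system prompt, scaffolding
    CONTEXTUAL = "contextual"  # e.g. task definition, tools, information available to the agent
    COGNITIVE = "cognitive"    # e.g. faulty observation, understanding, decision-making, action execution

@dataclass
class ContributingFactor:
    factor_class: FactorClass
    description: str  # free-text account of how this factor contributed to the incident

@dataclass
class IncidentFactorTags:
    incident_id: str
    factors: list[ContributingFactor] = field(default_factory=list)

# Example: tagging an incident with one contextual and one cognitive factor
tags = IncidentFactorTags(
    incident_id="incident-001",
    factors=[
        ContributingFactor(FactorClass.CONTEXTUAL, "ambiguous task definition from the user"),
        ContributingFactor(FactorClass.COGNITIVE, "agent misread a tool result before acting"),
    ],
)
```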
Based on these classes of factors, the authors go on to outline a range of information that would be helpful to disclose as part of an AI agent incident. They organize this information into three categories: (1) activity logs, (2) system documentation and access, and (3) tool-related information. Activity logs would include a record of all inputs and outputs to the agent, including system and user prompts, external information included in inputs, model reasoning traces, model outputs and actions taken, and the metadata (e.g. timestamps) needed to contextualize all of this. System documentation and access refers to information about the AI model such as any model or system cards, version information (and change logs), and other parameters (e.g. temperature, random seeds) that might inform an incident reconstruction. Tool-related information documents any tools the agent uses, including the tool’s identity and version, the actions it enables, and any ways it might adapt to the user.
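To make these disclosure categories more concrete, here is a rough sketch of what such a record might look like in code. The field names are my assumptions about a plausible schema, not a specification from the paper.

```python
# Rough sketch of an incident disclosure record; field names are assumptions,
# not a schema defined in the paper.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ActivityLogEntry:
    timestamp: datetime            # metadata needed to contextualize each step
    system_prompt: str
    user_prompt: str
    external_inputs: list[str]     # retrieved documents, tool results, etc.
    reasoning_trace: str           # model reasoning, where available
    output: str                    # model output or action taken

@dataclass
class SystemDocumentation:
    model_version: str             # ideally accompanied by a change log
    system_card_url: str
    temperature: float
    random_seed: Optional[int]     # useful for reconstructing an incident

@dataclass
class ToolRecord:
    name: str
    version: str
    enabled_actions: list[str]     # what the tool lets the agent do
    adapts_to_user: bool           # whether the tool personalizes its behavior

@dataclass
class IncidentDisclosure:
    activity_log: list[ActivityLogEntry] = field(default_factory=list)
    system_docs: Optional[SystemDocumentation] = None
    tools: list[ToolRecord] = field(default_factory=list)

# Example: logging one step of an agent run
entry = ActivityLogEntry(
    timestamp=datetime.now(timezone.utc),
    system_prompt="You are a scheduling assistant...",
    user_prompt="Book a meeting room for Friday",
    external_inputs=["calendar API response"],
    reasoning_trace="User wants Friday; checking availability...",
    output="call: rooms.book(room='4B', day='Friday')",
)
disclosure = IncidentDisclosure(activity_log=[entry])
```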
This paper goes a long way toward outlining the information that should be included in an incident report. But from a policy perspective there are some open questions about incident reporting. For one, how long should a developer maintain an activity log? This might depend on the risk profile of the use case, as well as whether there are any privacy considerations and how those might be handled. Another key question is who gets access to an incident report, including any activity logs as well as system and tool-related information. The severity of the incident may warrant different tiers of access: administrative and judicial forums might need the detailed information outlined in this paper for root-cause analysis and for assessing accountability, but it’s unclear whether it should be made fully public given privacy or trade-secrecy concerns. In any case, secure infrastructure and access controls will be needed, and policy should consider how to create a shared, standardized infrastructure that AI developers can report into.
There are a few issues that the authors don’t address but which I think will also be important for policy. Related to the access-control dimension, a common critique of transparency disclosures is that they can enable gaming and manipulation (Diakopoulos, 2020). The many pieces of information the authors outline need to be stress-tested against how an adversary might manipulate the agent if they were made public. This can also inform which pieces of information should be restricted to specific closed-door forums, like administrative agencies or judicial cleanrooms. Another open question relates to AI agents using tools that themselves use other tools. If tool use is implicated in an incident, then presumably we would want to recursively evaluate all the tools it may in turn have relied on, which creates additional monitoring and activity-logging demands on tools made available to agents. Finally, from a sociotechnical standpoint I think there could be aspects of AI agent transparency that disclose more about the human context around an incident, such as the roles and activities of supervisors, users, or other humans in the loop who may have had access to or authority over the agent’s intermediate results.
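On the nested-tool point specifically, here is a toy sketch of what recursive log collection might involve. The Tool structure and the traversal function are hypothetical illustrations, not something the paper or any existing agent framework defines.

```python
# Toy sketch (hypothetical structure): recursively collect activity logs from a
# tool and every tool it in turn relied on during an incident.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Tool:
    name: str
    version: str
    logs: list[str] = field(default_factory=list)              # the tool's own activity log
    dependencies: list["Tool"] = field(default_factory=list)   # tools this tool invoked

def collect_tool_logs(tool: Tool, seen: Optional[set] = None) -> dict:
    """Depth-first walk over the tool dependency graph, gathering logs per tool (cycle-safe)."""
    seen = seen if seen is not None else set()
    if tool.name in seen:
        return {}
    seen.add(tool.name)
    collected = {f"{tool.name}@{tool.version}": tool.logs}
    for dep in tool.dependencies:
        collected.update(collect_tool_logs(dep, seen))
    return collected

# Example: a search tool that itself calls a web-fetch tool
fetcher = Tool("web_fetch", "1.2", logs=["GET https://example.com ..."])
search = Tool("search", "0.9", logs=["query: 'meeting room availability'"], dependencies=[fetcher])
print(collect_tool_logs(search))
```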
References
Diakopoulos N (2020) Transparency. Oxford Handbook of Ethics and AI. Eds. Markus Dubber, Frank Pasquale, Sunit Das.
Ezell C, Roberts-Gaal X and Chan A (2025) Incident Analysis for AI Agents. Proc. AI, Ethics, and Society (AIES) DOI: 10.48550/arxiv.2508.14231.
