AI Agent Accountability
How to support responsibility in the principal-agent perspective
In April 2026, an AI coding agent powered by Anthropic’s Claude model wiped out the production and backup databases of PocketOS, a company offering services to rental car companies. While this might sound surprising, it isn’t an isolated event. An analysis of 188 autonomous AI system incidents found that 35% involved code destruction or deletion. Other negative outcomes included unauthorized financial operations, runaway API spending, service outages, and exposed secrets. These kinds of issues are only expected to grow as systems gain increasing levels of agency (Chan et al., 2023) and organizations increasingly adopt agentic tools for long-horizon tasks.
The OECD defines AI agents as “systems that perceive and act upon their environment with a degree of autonomy, using tools as needed to achieve specific goals and adapt to changing inputs and contexts.” Essentially, they are AI systems that sit toward the higher end of autonomy and adaptiveness in their environment. But with increased autonomy (and the lack of direct human control) comes increased risk that these systems will take harmful actions in the world (e.g. via tool calls, APIs, or other outputs). How do we ensure accountability for the harms they will inevitably cause through their actions?
Over the course of decades, research has developed ideas for the governance of human agents in terms of principal-agent models (Ross, 1973). The basic idea is that some actor (the principal) delegates a task to another actor (the agent), where the two may differ in goals or risk profiles and it is difficult for the principal to verify what the agent is doing. Principal-agent theory encompasses a flexible array of models that explore this relationship, and it has been applied across fields such as economics, management, and public administration (Ross, 1973; Eisenhardt, 1989; Gailmard, 2012).
Principal-agent models set up an information regime where the principal can specify the outcome or behavior they want from the agent, provide incentives to deliver it, and get information back to evaluate whether it was done to their liking. But information asymmetry and goal divergence can lead to issues of “moral hazard” (i.e. the agent does something at odds with the principal’s goals or risk profile) or “adverse selection” (i.e. the principal lacks information about the capability of the agent before delegation) (Eisenhardt, 1989). A certain degree of agency loss is pragmatically inevitable for the principal (Gailmard, 2012), since eliminating such loss would require a fully specified contract covering all expected behavior across all contexts of action, along with perfect surveillance and verification of agent behavior (Hadfield-Menell and Hadfield, 2019). In economic terms, the principal invariably has to cede some control over the outcome if it wants to save time or cost through delegation. Sure, you could have a principal-in-the-loop check every output, but at scale the principal loses the efficiency benefit gained through delegation.
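The trade-off between oversight and efficiency can be made concrete with a back-of-the-envelope model. This is an illustrative sketch, not a result from the cited literature, and every symbol is an assumption introduced here. Suppose the principal needs time t_d to do a task directly and time t_r to review an agent’s output, reviews a fraction p of outputs, and that an unreviewed output goes wrong with probability q at a cost L expressed in the same units as time. Errors in reviewed outputs are assumed to be caught at no further cost. Delegation then pays off per task only when:

```latex
\[
\underbrace{p\,t_r}_{\text{review effort}}
\;+\;
\underbrace{(1-p)\,q\,L}_{\text{expected agency loss}}
\;<\;
t_d
\]
```

Checking every output (p = 1) removes the agency-loss term but leaves a saving of only t_d − t_r per task, which vanishes as review becomes as laborious as doing the work. Checking nothing (p = 0) maximizes the time saved but exposes the principal to the full expected loss qL.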
In writing about the governance of AI agents through the principal-agent lens, Kolt (2024) elaborates issues of authority and discretion ceded to agents, loyalty of an agent to a principal, and the challenge of multiple co-principals (sometimes referred to as “common agency”). He raises an important critique: shaping agent behavior toward what the principal wants through incentive or sanction mechanisms doesn’t really work for AI agents, because they don’t respond to financial motivation or potential negative social sanction (e.g. psychological or reputation effects) the same way people do. Though I would add that even as we accept that AI agents themselves are not susceptible to psychological or reputational effects, we shouldn’t forget that their human developers still are.
Several ideas have surfaced that could strengthen the accountability of the principal for actions taken by their agent. One of the most well-developed is to increase the visibility (i.e. transparency) of the agent vis-a-vis the principal. This might include the use of agent identifiers (flagging when an AI agent is involved in an activity), real-time monitoring (continuously tracking and analyzing agent behavior), and logging (recording and documenting what agents do) (Chan et al., 2024). Additional aspects to track and disclose include system documentation (e.g. parameters, versions) and tool use documentation (Ezell et al., 2025), as well as reasoning traces, confidence maps, counterfactuals, and guardrail events (Prause, 2026). Logging and monitoring approaches incur costs and are challenged by the speed and scale of AI agents, but they can help narrow information asymmetries, reduce agency loss, and ultimately empower the principal (Kolt, 2024). A principal may well delegate the monitoring function to another entity, including an AI system, that can better keep up with the scale, pace, and detail of the agent’s logs.
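To make the visibility measures more tangible, here is a minimal sketch of what an action log carrying an agent identifier could look like. It is only an illustration under stated assumptions: the class names, field names, and example tool names (such as `db.drop_table`) are hypothetical and not drawn from Chan et al. (2024) or any particular agent framework.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class AgentLogRecord:
    """One logged tool call. All field names are illustrative assumptions."""
    agent_id: str        # agent identifier: flags that an AI agent acted
    principal_id: str    # on whose behalf the agent acted
    model_version: str   # system documentation (e.g. version)
    tool: str            # tool use documentation
    arguments: dict
    outcome: str         # e.g. "ok", "error", "blocked"
    timestamp: float = field(default_factory=time.time)
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class AgentLogger:
    """Keeps an in-memory log and emits each record as one JSON line."""

    def __init__(self, agent_id: str, principal_id: str, model_version: str):
        self.agent_id = agent_id
        self.principal_id = principal_id
        self.model_version = model_version
        self.records: list[AgentLogRecord] = []

    def log_tool_call(self, tool: str, arguments: dict, outcome: str) -> str:
        record = AgentLogRecord(
            agent_id=self.agent_id,
            principal_id=self.principal_id,
            model_version=self.model_version,
            tool=tool,
            arguments=arguments,
            outcome=outcome,
        )
        self.records.append(record)
        # A real deployment would append this line to durable storage.
        return json.dumps(asdict(record), sort_keys=True)

    def flagged(self, watched_tools: set[str]) -> list[AgentLogRecord]:
        """Simple monitoring pass: records that touched a watched tool."""
        return [r for r in self.records if r.tool in watched_tools]
```

A monitor, human or automated, could then scan `flagged({"db.drop_table"})` rather than reading every record, which is one way the monitoring function might be delegated while the log itself stays attributable to a specific agent and principal.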
Beyond monitoring, Prause (2026) adds that screening mechanisms can help principals understand the capabilities of agents before anything is delegated. This could take the form of AI agent benchmarks that help close information asymmetries and support principals in taking responsibility for ensuring an AI agent is capable before delegation. Hacker and Holweg (2026) propose elaborating public policy on AI agents by specifying the frequency and scope of human oversight together with a documentation mandate. Some actions may require additional structural safeguards (such as strict human review requirements), and others should be banned outright for agents (e.g. financial transactions above some threshold) (Hacker and Holweg, 2026; Prause, 2026). Nama (2026) further suggests that principals delegating to AI agents meet agentic AI literacy standards. This could include understanding the nature and scope of the authority delegated, how to effectively monitor the agent and intervene in its work, and what recourse or reversibility options are available. Such an approach would support the knowledge criterion central to establishing principal responsibility for the actions their AI agent takes on their behalf.
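The structural safeguards described above (mandatory human review for some actions, outright bans for others) can be sketched as a small policy check that runs before an agent’s action is executed. This is a hedged illustration only: the action kinds, the two thresholds, and the rule that irreversible actions need review are all assumptions made for the example, not values proposed by Hacker and Holweg (2026) or Prause (2026).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_HUMAN_REVIEW = "require_human_review"
    DENY = "deny"


@dataclass(frozen=True)
class ProposedAction:
    kind: str                # e.g. "payment", "delete_database", "read_file"
    amount: float = 0.0      # monetary value of the action, if any
    reversible: bool = True  # whether the action can be undone


# Hypothetical thresholds; a real policy would set these deliberately.
REVIEW_THRESHOLD = 100.0
BAN_THRESHOLD = 10_000.0


def evaluate(action: ProposedAction) -> Decision:
    """Return the policy decision for an action the agent wants to take."""
    if action.kind == "payment" and action.amount > BAN_THRESHOLD:
        # Banned outright for agents, regardless of review.
        return Decision.DENY
    if action.kind == "payment" and action.amount > REVIEW_THRESHOLD:
        return Decision.REQUIRE_HUMAN_REVIEW
    if not action.reversible:
        # Irreversible actions are escalated to a human in this sketch.
        return Decision.REQUIRE_HUMAN_REVIEW
    return Decision.ALLOW
```

For example, `evaluate(ProposedAction("payment", 500.0))` returns `Decision.REQUIRE_HUMAN_REVIEW`, while a payment of 50,000 is denied. Keeping the rules in one explicit function also gives the principal something concrete to document and audit.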
Empirical research is just beginning to examine how principals delegate tasks to AI agents differently than they do to human agents. Early findings reflect principals’ beliefs about AI agents’ obedience, but also their recognition that they need to overspecify tasks to avoid misalignment and to supply knowledge and expertise they believe the agent lacks (Petridis et al., 2026). People appear to find it somewhat freeing not to have to deal with the “social overhead” of managing another human being. Interaction methods and paradigms that give principals better control over agents might include providing rough drafts and test runs for a principal to evaluate, as well as establishing check-in criteria for the agent to communicate with the principal, all of which align well with proposals for improved monitoring and screening.
A looming gap in the AI agent literature appears to be the lack of an in-depth and extended treatment of the co-principal issue. AI agents almost always have co-principals (or perhaps hierarchical principals): their developers and their users. There is ample room for moral hazard when the co-principals disagree about what the agent should do, and apportioning responsibility between the two seems to be the crux of the issue. Proposals for agentic AI literacy and better benchmarks for screening agent capability strengthen the role and responsibility of the user, while monitoring and logging demand infrastructural and access considerations that realistically can only fall on developers. While transparency is perhaps the most viable option for ensuring principals maintain control, policy should also be mindful that increased monitoring and logging (potentially including of principals’ own oversight) also tends toward surveillance.
References
Chan A, Salganik R, Markelius A, et al. (2023) Harms from Increasingly Agentic Algorithmic Systems. 2023 ACM Conference on Fairness, Accountability, and Transparency: 651–666.
Chan A, Ezell C, Kaufmann M, et al. (2024) Visibility into AI Agents. 2024 ACM Conference on Fairness, Accountability, and Transparency: 958–973.
Eisenhardt KM (1989) Agency Theory: An Assessment and Review. The Academy of Management Review 14(1): 57–74.
Ezell C, Roberts-Gaal X and Chan A (2025) Incident Analysis for AI Agents. Proc. AI, Ethics, and Society (AIES) DOI: 10.48550/arxiv.2508.14231.
Gailmard S (2012) Accountability and Principal-Agent Models. In: Oxford Handbook of Public Accountability. Oxford University Press.
Hacker P and Holweg M (2026) A pragmatic approach to regulating AI agents. arXiv. https://arxiv.org/abs/2604.22819
Hadfield-Menell D and Hadfield GK (2019) Incomplete Contracting and AI Alignment. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society: 417–422.
Kolt N (2024) Governing AI Agents. Notre Dame Law Review 101.
Nama R (2026) From evaluator to principal: the agentic AI literacy framework (AALF) for delegated autonomy. AI and Ethics 6(3): 299.
Petridis S, Liu MX, Fiannaca AJ, et al. (2026) Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI. Proceedings of the 2026 Designing Interactive Systems Conference: 1188–1204.
Prause M (2026) No skin in the game: why agentic AI requires principal-agent governance. AI and Ethics 6(2): 199.
Ross SA (1973) The Economic Theory of Agency: The Principal’s Problem. The American Economic Review 63(2): 134–139.
