From Explanation to Accountability
A decade of explainable AI research has produced important techniques for understanding AI models, but less clarity on who those explanations are for and what accountability goals they actually serve.
In the event of an incident where an AI system causes harm, the people responsible for that system may be expected to render an explanation to various accountability forums, such as the media, administrative bodies, or the courts. This relationship, in which an actor is obliged to explain their conduct to a forum, is a key aspect of accountability.
There is an array of research on explainable AI (XAI) extending back nearly a decade (Lipton, 2018; Mittelstadt et al, 2019; Wachter et al, 2017), with a focus on making the technical components of AI systems more understandable. Explanation in this context has been defined as “the ability to articulate why a model produced a given output in a way that is accessible to human users” (Dhar et al, 2025). But recent research (Dhar et al, 2025; Alpsancar et al, 2025) has raised the critique that the XAI literature has not always been clear about the goals of AI explanations. Who and what are they for, really? If explanations are meant to support accountability, there remains a gap in the literature showing exactly how, particularly once we parse out the different goals of retrospective vs. prospective accountability.
Dhar et al (2025) present a framework for thinking about the goals of AI explanation in terms of who explanations are designed for, what information is conveyed, and how an explanation presents that information. Different stakeholders such as AI system developers, operators, validators, and subjects may have different needs for explanations, including different modalities of presentation (e.g. visual, textual, interactive) that match their information needs. Each stakeholder might need something a bit different from an AI system explanation in order to contribute to accountability. For instance, a system developer might benefit from a highly technical interactive explanation that helps them debug an issue with the model, thereby helping prevent future bias or fairness issues in decisions. A decision-subject might need something more accessible to help them understand why they got the outcome they did and potentially contest it if they think it’s wrong. A validator (e.g. an auditor) may need to verify the input features used by the model to ensure they are accurate and appropriate. And an operator needs to be informed about how their actions with the system lead to probable consequences in order to be a responsible human-in-the-loop (Baum et al, 2022).
Besides who explanations are for, there are important dimensions of what should be explained (Dhar et al, 2025). Local explanations focus on individual outputs and are well-aligned to the goals of retrospective accountability, which center on identifying and assigning blame for a specific decision. Global explanations, on the other hand, which orient towards overall patterns of output from a model across a range of inputs, are better suited to supporting the goals of prospective accountability, where a birds-eye view is needed to inform how to prevent anticipated harms at the system level. Post-hoc explanations of system behavior, which track how inputs influence outputs, are key for both retrospective and prospective accountability, while mechanistic explanations that trace functional model internals are more narrowly useful for helping developers prevent unintended outcomes. In other words, while the classic view of accountability as retrospective doesn’t hinge on explaining model internals, a prospective view could additionally benefit from explanations of those internals to debug model failures and ensure better outcomes in the future.
An early paper making the connection between AI explanation and the goals of accountability comes from Doshi-Velez et al (2017). The authors point out the potential for explanations to prevent or rectify errors in AI systems by helping to discern the appropriate or inappropriate use of criteria by a system. They note two key types of explanation that can play an important role in supporting accountability: feature importance and counterfactuals.
Feature importance (or relevance) explanations provide information about the weighting and priority of inputs to specific outputs, or to the overall distribution of outputs. Should some features (e.g. race) be unacceptably correlated with outputs overall, this can inform prospective accountability so that the model or its training data can be rectified. If features are correlated with outputs in ways that contradict a scientific causal account of what should be predictive of outcomes, this could be grounds for prospective accountability to align the model with scientific expectations. If the scientific account is well-established, such a contradiction could also contribute to retrospective accountability for negligence.
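To make the idea concrete, here is a minimal sketch of a local feature-importance explanation for a toy linear scoring model. All feature names and weights are illustrative assumptions, not drawn from any real system; for a linear model, each feature's contribution to a given output is simply weight times value.

```python
# Hypothetical linear credit-scoring model; names and weights are illustrative.
FEATURES = ["income", "debt_ratio", "zip_code_group"]
WEIGHTS = {"income": 0.6, "debt_ratio": -0.3, "zip_code_group": 0.4}

def feature_contributions(applicant):
    """Per-feature contribution to this applicant's score: a local
    feature-importance explanation for a linear model."""
    return {f: WEIGHTS[f] * applicant[f] for f in FEATURES}

applicant = {"income": 1.0, "debt_ratio": 0.5, "zip_code_group": 1.0}
contribs = feature_contributions(applicant)
# A large contribution from a proxy feature such as zip_code_group could
# flag an unacceptable correlation for prospective review.
```

For real (non-linear) models, the same question is typically answered with attribution methods rather than direct weight inspection, but the accountability use is the same: surface which inputs drove the output.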
Counterfactual explanations include a “statement of how the world would have to be different for a desirable outcome to occur” and “describe a dependency on the external facts that led to that decision” (Wachter et al, 2017). They have also been framed as a form of feature relevance explanation (Speith, 2022). If a decision-subject had their mortgage application denied and the counterfactual explanation indicated that they would have been approved had their race been different, that would be clear grounds for that individual to contest the output.
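One simple way to generate such a statement is to search for the smallest change to an input that flips the decision. The sketch below does this over a toy linear approval rule; the weights, threshold, step size, and choice of feature to vary are all illustrative assumptions.

```python
# Hypothetical approval rule; weights and threshold are illustrative.
WEIGHTS = {"income": 0.6, "debt_ratio": -0.3}
THRESHOLD = 0.5

def approve(applicant):
    return sum(WEIGHTS[f] * applicant[f] for f in WEIGHTS) >= THRESHOLD

def counterfactual_income(applicant, step=0.05, max_steps=200):
    """Smallest grid increase to 'income' that flips a denial to approval."""
    cf = dict(applicant)
    for _ in range(max_steps):
        if approve(cf):
            return cf["income"] - applicant["income"]
        cf["income"] += step
    return None  # no counterfactual found within the search budget

denied = {"income": 0.4, "debt_ratio": 0.5}
delta = counterfactual_income(denied)
# delta answers: "you would have been approved had your income been
# higher by delta" -- a dependency the decision-subject can contest.
```

A search over a legally protected feature such as race, as in the mortgage example above, would instead surface the kind of counterfactual that directly grounds a contestation.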
While Doshi-Velez et al got it mostly right when it comes to supporting retrospective accountability, another explanation type elaborated in the literature, model surrogates such as linear approximations (Speith, 2022), may also be narrowly useful for prospective accountability. What a model surrogate explanation can offer is a clear and interpretable feature importance explanation of a more complex model (e.g. a neural net or other black-box model). If that feature importance explanation indicates an inappropriate bias, this could be grounds for a developer to be prospectively responsible for addressing the apparent behavioral bias of their model. Even if the model itself doesn’t use inappropriate data, if its behavior appears to be inappropriate that might be grounds to call for it to change. Model surrogates are less useful for retrospective accountability, however, since they don’t reflect the actual decision logic that affected a specific individual.
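The surrogate idea can be sketched in a few lines: sample inputs, query the black box, and fit a linear model to its outputs by least squares. The black-box function, sampling ranges, and two-feature setup below are illustrative assumptions; the resulting coefficients are the interpretable feature-importance reading of the opaque model's overall behavior.

```python
import random

def black_box(x1, x2):
    # Stand-in for an opaque model: mostly linear plus a small interaction.
    return 0.8 * x1 + 0.2 * x2 + 0.05 * x1 * x2

random.seed(0)
samples = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(200)]
outputs = [black_box(x1, x2) for x1, x2 in samples]

# Least-squares fit of y ~ a*x1 + b*x2 (no intercept), via the 2x2
# normal equations solved with Cramer's rule.
S11 = sum(x1 * x1 for x1, _ in samples)
S12 = sum(x1 * x2 for x1, x2 in samples)
S22 = sum(x2 * x2 for _, x2 in samples)
T1 = sum(y * x1 for (x1, _), y in zip(samples, outputs))
T2 = sum(y * x2 for (_, x2), y in zip(samples, outputs))
det = S11 * S22 - S12 * S12
a = (T1 * S22 - T2 * S12) / det
b = (S11 * T2 - S12 * T1) / det
# (a, b) approximate the black box's global behavior; a much larger than b
# suggests x1 dominates -- troubling if x1 were a protected attribute.
```

Note that the surrogate summarizes aggregate behavior: it supports the prospective, birds-eye review described above, but its coefficients are not the decision logic applied to any one individual.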
A recent paper from Alpsancar et al (2025) makes an explicit connection between AI explanation and the needs of assigning responsibility to support AI governance. The authors recount the classical model of moral responsibility, which hinges on fulfilling three criteria to hold someone responsible for their actions: (1) causality (i.e. the person influenced the outcome), (2) freedom (i.e. the person was not coerced in their action), and (3) the epistemic condition (i.e. the person was aware of the consequences of their actions). The authors also review what they term the trans-classical model of responsibility, a systemic view of responsibility that helps cope with unintended and unforeseen consequences. In this view, the epistemic condition instead relates to knowledge of the potential for and probability of various outcomes in the system (i.e. risk), and responsibility is assigned for managing that risk.
In the classical view, the goal of explanation for accountability is clear: to help fulfill the three conditions so that responsibility can be assessed. AI system explanations should indicate causality, including who (or what) took what actions that were critical to the outcome. They should indicate the autonomy of entities and their actions, including how individuals may be influenced by AI systems in their judgments. And they should show whether individuals in the system were appropriately informed about the consequences of their actions. In the trans-classical view, on the other hand, the goal of explanation should be to support the understanding of the risk (i.e. severity and prevalence) of outcomes. But it could also be important for explanations to show that there is not a direct causal actor responsible in the system, since otherwise we might revert to the classical model. Regardless of view, there is a need for a sociotechnical approach to AI explanation. Explanations of technical models as discussed above are important for supporting the knowledge needs of either view.
There are several policy-relevant implications that can be derived here. First, explanation requirements for AI systems should specify the audience for the explanation. A disclosure rule that works for a decision-subject contesting a denied loan looks very different from one aimed at auditors verifying model inputs or developers debugging bias. Second, any explanation requirements should tie back to the accountability purpose being served. Retrospective accountability calls for local, post-hoc explanations, including counterfactuals and feature importance explanations, while prospective accountability calls for global explanations about patterns across outputs. Third, policymakers should consider both the classical and trans-classical views of responsibility and how and whether they may want to blend or distinguish the two in assigning responsibility. Finally, standards bodies should resist narrowly technical definitions of explainability and consider sociotechnical elements related to the human use of AI systems and their explanations.
References
Alpsancar S, Buhl HM, Matzner T, et al. (2025) Explanation needs and ethical demands: unpacking the instrumental value of XAI. AI and Ethics 5(3): 3015–3033.
Baum K, Mantel S, Schmidt E and Speith T (2022) From Responsibility to Reason-Giving Explainable Artificial Intelligence. Philosophy & Technology 35: 12.
Dhar R, Brandl S, Oldenburg N, et al. (2025) Beyond Technocratic XAI: The Who, What & How in Explanation Design. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(1): 745–759.
Doshi-Velez F, Kortz M, Budish R, et al. (2017) Accountability of AI Under the Law: The Role of Explanation. arXiv. DOI: 10.48550/arxiv.1711.01134.
Lipton ZC (2018) The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 16(3): 31–57.
Mittelstadt B, Russell C and Wachter S (2019) Explaining Explanations in AI. Proceedings of the Conference on Fairness, Accountability, and Transparency: 279–288.
Speith T (2022) A Review of Taxonomies of Explainable Artificial Intelligence (XAI) Methods. 2022 ACM Conference on Fairness Accountability and Transparency: 2239–2250.
Wachter S, Mittelstadt B and Russell C (2017) Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31: 841.
