AI Accountability Review

Designing an AI Whistleblower Office

Nick Diakopoulos — Mon, 27 Apr 2026 06:01:03 GMT

One of the recurring puzzles for AI governance is how regulators will ever learn about noncompliance inside firms whose behavior is difficult to observe from the outside. A new empirical report published on arXiv by Beri and Baker (2026) argues that a dedicated whistleblower office could be a “force multiplier” for AI regulation, and offers a set of concrete design recommendations grounded in a dataset of 30 historical whistleblower case studies spanning 1978–2020 across 15 industries.

Across the 30 cases analyzed, the authors report that in about 87% of cases the whistleblowers were motivated, at least in part, by moral considerations, while 27% indicated some kind of financial motivation. At least 90% were insiders at the offending organization and roughly 80% were mid-level employees or executives. But stepping forward was costly: 57–67% faced retaliation (e.g. harassment or unjust termination), 43–57% suffered negative career consequences, and 13% received death threats. Only 13% sought anonymity, although the authors caution that this likely reflects sampling bias toward famous cases. In short, whistleblowers in the dataset tended to be morally motivated insiders who paid a steep personal price.

Based on these patterns and their own observations, the authors develop a design sketch for an AI whistleblower office. They argue it should: (1) financially reward tipsters with a percentage of sanctions, in the spirit of the SEC and CFTC programs, given that financial reward can be a motivator; (2) prohibit retaliation and offer witness protection (plus S visas for international tipsters); (3) enable anonymous tipping via lawyers or a secure online platform; (4) be adequately staffed and funded for effective “tip-sifting”; and (5) invest in messaging to raise awareness of the office and in an advisory body to help would-be whistleblowers determine whether they have reasonable cause.

For AI accountability, this work adds a new dimension to transparency. Mandated disclosures and external audits will always leave gaps and insider reporting is one of the few channels likely to surface willful concealment. The recommendations align with a prospective accountability frame: supporting protection, anonymity, and an advice body for potential whistleblowers are forward-looking responsibilities that might make insider reporting a more viable option before harms have occurred. A sample of 30 is small, and the cases skew U.S., famous, and successfully-tipped—but as a starting point for thinking about policy the empirical grounding is valuable.

Note: This post was drafted by Claude Opus 4.7 under the prompting and supervision of the author, who also edited it further.

From Explanation to Accountability

Nick Diakopoulos — Mon, 20 Apr 2026 06:01:26 GMT

In the event of an incident where an AI system causes harm, the people responsible for that system may be expected to render an explanation to various accountability forums, such as the media, administrative bodies, or a courtroom. This relationship between an actor and a forum, in which the actor is obliged to explain their conduct to the forum, is a key aspect of accountability.

There is an array of research on the topic of explainable AI (i.e. XAI) extending back for nearly a decade (Lipton, 2018; Mittelstadt et al, 2019; Wachter et al, 2017), with a focus on making the technical components of AI systems more understandable. Explanation in this context has been defined as “the ability to articulate why a model produced a given output in a way that is accessible to human users” (Dhar et al, 2025). But recent research (Dhar et al, 2025; Alpsancar et al, 2025) has raised the critique that the literature on XAI hasn’t always been clear about the goals of AI explanations. Who and what are they for, really? If explanations are meant to support accountability, there is still something of a gap in the literature showing exactly how, particularly if we parse out the different goals of retrospective vs. prospective accountability.

Dhar et al present a framework for thinking about the goals of AI explanation in terms of who explanations are designed for, what information is conveyed, and how an explanation presents that information (2025). Different stakeholders such as AI system developers, operators, validators, and subjects may have different needs for explanations, including different modalities of presentation (e.g. visual, textual, interactive) that match information needs. Each stakeholder here might need something a bit different from an AI system explanation in order to contribute to accountability. For instance, a system developer might benefit from a highly technical interactive explanation that helps them debug an issue with the model and so help prevent future bias or fairness issues in decisions. A decision-subject might need something more accessible to help them understand why they got the outcome they did and potentially contest it if they think it’s wrong. A validator (e.g. an auditor) may need to verify input features used by the model to ensure they are accurate and appropriate. And an operator needs to be informed about how their actions lead to probable consequences with the system in order to be a responsible human-in-the-loop (Baum et al, 2022).

Besides who explanations are for, there are important dimensions about what should be explained (Dhar et al, 2025). Local explanations focus on individual outputs and are well-aligned to the goals of retrospective accountability, which focus on identifying and assigning blame for a specific individual decision. Global explanations, on the other hand, orient towards overall patterns of output from a model across a range of inputs and are better suited to supporting goals of prospective accountability, where a bird’s-eye view is needed to inform how to prevent anticipated harms at the system level. Post-hoc explanations of system behavior, which track how inputs influence outputs, are key for both retrospective and prospective accountability, while mechanistic explanations that trace functional model internals are more narrowly useful for informing developers towards preventing unintended outcomes. In other words, while the classic view of accountability as retrospective doesn’t hinge on explaining model internals, a prospective view could additionally benefit from explanations of those internals to debug model failures and ensure better outcomes in the future.

An early paper to make a connection between AI explanation and the goals of accountability comes from Doshi-Velez et al (2017). The authors appropriately point out the potential for explanations to prevent or rectify errors in AI systems, helping to discern the appropriate or inappropriate use of criteria by a system. They note two key types of explanation that can play an important role in supporting accountability: feature importance and counterfactuals.

Feature importance/relevance explanations provide information about the weighting and priority of inputs to specific outputs, or to the overall distribution of outputs. Should some features be unacceptably correlated to outputs (e.g. race) overall, this can inform prospective accountability so that the model or its training data can be rectified. If features are correlated to outputs in ways that contradict a scientific causal account of what should be predictive of outcomes, this could be grounds for prospective accountability to align the model with scientific expectations. If the scientific account is well-established such a contradiction could also contribute to retrospective accountability for negligence.
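To make the global flavor of feature importance concrete, here is a minimal sketch of a permutation-style importance estimate. It is not taken from the cited papers: the scoring rule `toy_model`, the feature names, and the three applicant records are all hypothetical, and the deterministic "rotation" stands in for the random shuffling that is normally used.

```python
def toy_model(row):
    # Hypothetical scoring rule (an assumption for illustration):
    # income matters a lot, zip_risk a little, shoe_size not at all.
    return 0.8 * row["income"] + 0.2 * row["zip_risk"]


def permutation_importance(model, rows, feature):
    """Mean absolute change in output when one feature's values are moved between rows."""
    values = [r[feature] for r in rows]
    rotated = values[1:] + values[:1]  # deterministic stand-in for a random shuffle
    total = 0.0
    for row, new_value in zip(rows, rotated):
        perturbed = dict(row, **{feature: new_value})
        total += abs(model(perturbed) - model(row))
    return total / len(rows)


# Made-up records, scaled to the 0-1 range where relevant.
ROWS = [
    {"income": 1.0, "zip_risk": 0.0, "shoe_size": 9.0},
    {"income": 0.0, "zip_risk": 1.0, "shoe_size": 10.0},
    {"income": 0.5, "zip_risk": 0.5, "shoe_size": 11.0},
]

IMPORTANCE = {f: permutation_importance(toy_model, ROWS, f) for f in ROWS[0]}
for name, score in sorted(IMPORTANCE.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy setup a nonzero score for something like `zip_risk` is the kind of global signal that could prompt a developer to revisit the model or its training data before harm occurs.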

Counterfactual explanations include a “statement of how the world would have to be different for a desirable outcome to occur” and “describe a dependency on the external facts that led to that decision” (Wachter et al, 2017) and have also been framed as a form of feature relevance explanation (Speith, 2022). If a decision-subject had their mortgage application denied and the counterfactual explanation indicated that they would have been approved if their race had been different, that would be clear grounds for that individual to contest the output.
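As a rough illustration of what a counterfactual explanation computes, the sketch below searches for the smallest change to a single input that flips a decision. The decision rule `approve`, its threshold, and the applicant values are hypothetical and chosen only to keep the arithmetic easy to follow; real counterfactual methods search over many features at once and add plausibility constraints.

```python
def approve(applicant):
    # Hypothetical mortgage rule, for illustration only.
    return applicant["income"] - 0.5 * applicant["debt"] >= 30


def counterfactual(model, applicant, feature, step, max_steps=1000):
    """Smallest change to one feature (in multiples of `step`) that flips the decision."""
    original = model(applicant)
    for k in range(1, max_steps + 1):
        for delta in (k * step, -k * step):
            candidate = dict(applicant, **{feature: applicant[feature] + delta})
            if model(candidate) != original:
                return delta
    return None  # no flip found within the search range


APPLICANT = {"income": 40, "debt": 30}  # denied under the toy rule
INCOME_CHANGE = counterfactual(approve, APPLICANT, "income", step=1)
DEBT_CHANGE = counterfactual(approve, APPLICANT, "debt", step=1)
print(f"Approved if income changed by {INCOME_CHANGE} or debt changed by {DEBT_CHANGE}")
```

Running the same kind of search over a protected attribute is what would surface the contestable dependency described above: if changing only that attribute flips the outcome, the decision-subject has concrete grounds to challenge it.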

While Doshi-Velez et al got it mostly right when it comes to supporting retrospective accountability, another explanation type elaborated in the literature (Speith, 2022), the model surrogate (e.g. a linear approximation), may also be narrowly useful for prospective accountability. What a model surrogate explanation can offer is a clear and interpretable feature importance explanation of a more complex model (e.g. a neural net, or other black-box model). If that feature importance explanation indicates an inappropriate bias this could be grounds for a developer to be prospectively responsible for addressing the apparent behavioral bias of their model. Even if the model itself doesn’t use inappropriate data, if its behavior appears to be inappropriate that might be grounds to call for it to change. Where model surrogates are not so useful is for retrospective accountability, as they don’t reflect the actual decision-logic impacting a specific individual.

A recent paper from Alpsancar et al (2025) makes an explicit connection between AI explanation and the needs of assigning responsibility to support AI governance. The authors recount the classical model of moral responsibility which hinges on fulfilling three criteria to hold someone responsible for their actions: (1) causality (i.e. the person influenced the outcome), (2) freedom (i.e. the person was not coerced in their action), and (3) epistemic (i.e. the person is aware of the consequences of their actions). The authors also review what they term the trans-classical model of responsibility which is a systemic view of responsibility that helps cope with unintended and unforeseen consequences. In this view, the epistemic condition instead relates to knowledge of the potential for and probability of various outcomes in the system (i.e. risk) and responsibility is assigned for managing that risk.

In the classical view, the goal of explanation for accountability is clear: to help fulfill the three conditions so that responsibility can be assessed. AI system explanations should indicate causality, including who (or what) took what actions that were critical to the outcome. They should indicate the autonomy of entities and their actions, including how individuals may be influenced by AI systems in their judgements. And they should show whether individuals in the system were appropriately informed about the consequences of their actions. On the other hand, in the trans-classical view the goal of explanation should be to support the understanding of the risk (i.e. severity and prevalence) of outcomes. But it could also be important for explanations to show that there is not a direct causal actor responsible in the system, since otherwise we might revert to the classical model. Regardless of view there is a need for a sociotechnical approach to AI explanation. Explanations of technical models as discussed above are important for supporting the knowledge needs for either view.

There are several policy-relevant implications that can be derived here. First, explanation requirements for AI systems should specify the audience for the explanation. A disclosure rule that works for a decision-subject contesting a denied loan looks very different from one aimed at auditors verifying model inputs or developers debugging bias. Second, any explanation requirements should tie back to the accountability purpose being served. Retrospective accountability calls for local, post-hoc explanations including counterfactuals and feature importance explanations, while prospective accountability calls for global explanations about patterns across outputs. Third, policymakers should consider both the classical and trans-classical view of responsibility and how and whether they may want to blend or distinguish the two in assigning responsibility. Finally, standards bodies should resist technical definitions of explainability and consider sociotechnical elements related to the human use of AI systems and their explanations.

References

Alpsancar S, Buhl HM, Matzner T, et al. (2025) Explanation needs and ethical demands: unpacking the instrumental value of XAI. AI and Ethics 5(3): 3015–3033.

Baum, K., Mantel, S., Schmidt, E. & Speith, T. From Responsibility to Reason-Giving Explainable Artificial Intelligence. Philos. Technol. 35, 12 (2022).

Dhar R, Brandl S, Oldenburg N, et al. (2025) Beyond Technocratic XAI: The Who, What & How in Explanation Design. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(1): 745–759.

Doshi-Velez F, Kortz M, Budish R, et al. (2017) Accountability of AI Under the Law: The Role of Explanation. arXiv. DOI: 10.48550/arxiv.1711.01134.

Lipton, Z. C. 2018. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3): 31–57.

Mittelstadt B, Russell C and Wachter S (2019) Explaining Explanations in AI. Proceedings of the Conference on Fairness, Accountability, and Transparency: 279–288.

Speith T (2022) A Review of Taxonomies of Explainable Artificial Intelligence (XAI) Methods. 2022 ACM Conference on Fairness Accountability and Transparency: 2239–2250.

Wachter, S.; Mittelstadt, B.; and Russell, C. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31: 841

LLMs Can’t Provide Faithful Explanations Needed for AI Accountability

Nick Diakopoulos — Tue, 24 Mar 2026 12:17:01 GMT

A growing array of research points out that the explanations produced by LLMs are not accurate. In the literature this is referred to as explanation faithfulness (Agarwal et al, 2024; Jacovi and Goldberg, 2020) and accurately measuring it is an area of active research (Lyu et al, 2024). Agarwal and colleagues (2024) articulate it as: “An explanation is considered faithful if it accurately represents the reasoning of the underlying model.” A less anthropomorphic way of talking about “reasoning” here would be to say that an explanation is faithful if it accurately describes how the system or model processes an input into an output. Some explanations may be more faithful than others (Jacovi and Goldberg, 2020), with certain interpretable models able to produce more faithful explanations than black-box models (Rudin, 2019).

Explanations rendered by and about AI systems need to be as faithful as epistemically possible in order to support accountability. Buijsman describes the role of explanation in supporting accountability: “when a mistake has been made, the challenge is to find a reason why that mistake happened and the people responsible for fixing it.” (2026). A faithful explanation might help understand whether there may be an issue with faulty data, missing information, or incorrect reasoning, and ultimately help improve the system over time. Explanations that are not faithful could misdirect decision-making about how to assign blame or prevent future harms, frustrate attempts to contest a decision or to diagnose mistakes and logical errors so they can be corrected, and ultimately undermine the ability to appropriately sanction actors if the explanation is unacceptable.

Faithfulness is especially relevant to questions of process accountability, where the goal is to hold an actor in the AI system accountable for how an outcome was computed. Explanations are a diagnostic tool for accountability, describing how inputs lead to the outcome and helping to trace instances of potential negligence or faulty logic in the system. If an unfaithful explanation of a mortgage decision says that you were rejected because your income is too low but the model decision was actually influenced by your race or zip code this undermines your ability to challenge the decision as unacceptably including protected characteristics.
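The mortgage scenario can be turned into a simple perturbation check, sketched below. This is one illustrative test, not the faithfulness metrics used in the cited papers: the model `decide`, the zip codes, and the helper `uncited_features_that_flip` are all hypothetical. The idea is that if changing a feature the explanation never mentioned flips the decision, the explanation cannot be a faithful account of how the output was produced.

```python
def decide(applicant):
    # Hypothetical model that quietly depends on zip code as well as income.
    return applicant["income"] >= 50 and applicant["zip_code"] != "60601"


def uncited_features_that_flip(model, applicant, cited, alternatives):
    """Return features the explanation did not cite whose change flips the decision."""
    original = model(applicant)
    flipped = []
    for feature, values in alternatives.items():
        if feature in cited:
            continue  # only probe features the explanation left out
        for value in values:
            if model(dict(applicant, **{feature: value})) != original:
                flipped.append(feature)
                break
    return flipped


APPLICANT = {"income": 60, "zip_code": "60601"}  # rejected by the toy model
CITED_BY_EXPLANATION = {"income"}                # "your income is too low"
ALTERNATIVES = {"income": [40, 80], "zip_code": ["60614"]}

UNFAITHFUL = uncited_features_that_flip(decide, APPLICANT, CITED_BY_EXPLANATION, ALTERNATIVES)
print("Uncited features that change the outcome:", UNFAITHFUL)
```

A check like this needs query access to the model, which is part of why faithfulness is hard for a decision-subject to verify on their own.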

LLMs are not able to provide faithful explanations, such as self-explanations generated by the model to render the “reasoning” behind their output in human-understandable language (Madsen et al, 2024; Mayne et al, 2025; Matton et al, 2025). Madsen and colleagues (2024) show that larger models with more parameters generally produce more faithful explanations but that there is high variance across tasks. Mayne and colleagues (2025) focus on self-generated counterfactual explanations (SCEs) and indicate that their findings “suggest that SCEs are, at best, an ineffective explainability tool and, at worst, can provide misleading insights into model behaviour.” While models may be able to provide counterfactual explanations (e.g. if you change variables X and Y it will flip the decision outcome), these may be trivially true rather than articulating minimal changes to the input that would actually shed light on the decision.

The main implication here is that when accountability matters, such as for high-stakes situations where there is potential for severe impacts, faithful explanations are critical, but LLMs cannot provide such explanations. Policymakers may consider when AI providers need to demonstrate faithfulness of model explanations and establish thresholds around when models can be used in high-stakes contexts. Administrative bodies will also need to develop standardized benchmarks and measurements for faithfulness to support such policies.

References

Agarwal C, Tanneru SH and Lakkaraju H (2024) Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models. arXiv. DOI: 10.48550/arxiv.2402.04614.

Buijsman S (2026) Accuracy is not all you need! The Reasons to Require AI Explainability. Minds and Machines 36(1): 14.

Jacovi, A. & Goldberg, Y. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? Proc. 58th Annu. Meet. Assoc. Comput. Linguistics 4198–4205 (2020) doi:10.18653/v1/2020.acl-main.386.

Lyu, Q., Apidianaki, M. & Callison-Burch, C. Towards Faithful Model Explanation in NLP: A Survey. Computational Linguistics 50, 657–723 (2024).

Madsen A, Chandar S and Reddy S (2024) Are self-explanations from Large Language Models faithful? In: Findings of the Association for Computational Linguistics: ACL, 2024.

Mayne H, Kearns RO, Yang Y, et al. (2025) LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations. In: EMNLP, 2025.

Matton K, Ness RO, Guttag J, et al. (2025) Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations. In: ICLR, 2025.

Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

Experimenting with AI in a Living Literature Review

Nick Diakopoulos — Mon, 09 Feb 2026 11:02:45 GMT

The AI Accountability Review (AIAR) is a living literature review with the goal of tracking literature on the topic of AI accountability for an audience of researchers and policymakers. I write posts that either focus on translating a single piece of literature or that synthesize several pieces of literature towards policy implications. How could AI help with this process?

I recently came across a paper by Fok et al (2025) that’s been useful in helping me organize the various AI experiments I’ve been trying. Based on interviews with researchers who have written literature reviews the paper helps to understand their overall process and some of the ways they conceptualize the use of AI in that process.

The findings articulate a set of four phases to the literature review process that participants engaged in: search, appraisal, synthesis, and interpretation. The paper also identifies some of the ways AI can support updating of reviews, namely through automation and in providing a second opinion. My own use of AI for AIAR has been most useful for automating (with oversight) some of the appraisal aspects of the process, and in providing second opinions on appraisal and synthesis. I have also dabbled in some use cases that more directly do synthesis and interpretation, but these have been less successful. And I haven’t really tried any AI use cases for search, because I think that setting the search scope for the review is something that needs to be closely managed by me. Let’s walk through some of the different things I’ve tried.

Scraping, Formatting, and Promotion

Probably the most time-saving use case I’ve found is to use OpenAI’s Agent mode to help collect conference proceedings papers that I want to review. Some conferences have non-standard presentations of information, but Agent mode is pretty adept at navigating websites to collect papers and format them as RSS feeds. I plug those feeds into my triage workflow on InoReader, which streamlines the appraisal process of papers. It can help to be explicit in the prompt and identify a structured data (e.g. JSON) version of the proceedings. And while this process is mostly automated I find that I do still need to double-check the outputs to make sure it was a comprehensive scrape.
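For readers who want to do the formatting step by hand, here is a minimal sketch of turning a structured list of papers into an RSS 2.0 feed with Python's standard library. It is not how Agent mode works internally; the JSON field names (`title`, `url`, `abstract`), the function `papers_to_rss`, and the example.org URLs are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def papers_to_rss(papers_json, feed_title, feed_link):
    """Convert a JSON array of {title, url, abstract} objects into an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Papers collected for {feed_title}"
    for paper in json.loads(papers_json):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = paper["title"]
        ET.SubElement(item, "link").text = paper["url"]
        # Abstracts are optional; an empty description keeps the item valid.
        ET.SubElement(item, "description").text = paper.get("abstract", "")
    return ET.tostring(rss, encoding="unicode")


# Placeholder data standing in for a scraped proceedings page.
SAMPLE = json.dumps([
    {"title": "Example Paper", "url": "https://example.org/paper-1",
     "abstract": "A placeholder abstract."},
])
FEED = papers_to_rss(SAMPLE, "Example Proceedings", "https://example.org/proceedings")
print(FEED)
```

Counting the `item` elements in the output against the number of papers listed on the conference site is one quick way to do the completeness check mentioned above.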

I have also experimented with using Agent mode to gather email addresses for each of the primary authors of papers cited by one of my posts, and to then draft a short personalized note notifying the person about the post. I wasn’t intrepid enough to automate the actual emailing, but I did manually copy and send some of the emails (after light editing) and even got a response from one. Promoting AIAR on social media could be a full time job, but having the AI do some of the grunt work of getting email addresses and drafting emails lowers the barrier a bit.

Article Appraisal

One of the nice built-in features of InoReader is that for any item in a feed I am tracking, I can trigger a custom prompt to an LLM. Using this feature I can get a quick second opinion from the LLM on whether the item might be relevant to my audience. Admittedly I don’t use this all that often, but I do occasionally engage it. One of the issues is that not all the RSS feeds I follow have full abstract text and so this limits the applicability. I do think there’s real potential in having AI help think through what items have implications for your intended audience, and there’s probably a lot more sophistication that could be applied in how to do this computationally beyond the integrated prompting in InoReader, such as by simulating ideal audience members and what they would want to know about an item.

Some articles on AIAR reflect the synthesis of a cluster of literature. As a living literature review the goal is to update these over time with other literature relevant to the cluster. I’ve been experimenting with LLMs to support this process. Using a Google Colab notebook I input the URL of the base article to be updated and scrape the full text. Then I prompt an LLM to evaluate a stream of literature for relevance to that article. The prompt is critical here. What I’m looking for are new papers that might directly update, change, or provide new context to any of the claims in the original article, to find new papers that might actually make a difference.

Each paper is rated for relevance, and that rating is paired with a table listing claims from the original article and ideas from the new paper that might bear on those claims. The table facilitates my appraisal of the new paper. The output looks like this:

So far this is promising, but there’s still work to do to evaluate it and set it up as an ongoing monitoring process that fully integrates with my InoReader appraisal workflow. In principle I’d set this up for each of the base articles in AIAR, and then monitor literature from something like OpenAlex to create a continuously updated feed of potentially relevant papers.

Grounded Synthesis

Google’s NotebookLM has turned into an increasingly powerful tool that can be used to interactively synthesize curations of articles. For my article on AI Ethics Principles and Accountability, I even published a notebook with all of the sources I had used to write the article. While the original goal with creating the notebook here was to allow readers to interactively explore the literature, I also realized that I could use this to provide a second opinion on my own synthesis. Using Gemini, you can refer to a notebook of curated sources in NotebookLM and so I prompted it to create a table listing the supporting evidence for every claim in the post. In the absence of an editor, this can be a useful double-check to make sure you’re staying honest to the underlying literature in your synthesis. I think this kind of approach could potentially also be useful in an article update process to assess whether claims in new papers support or refute the existing claims you’ve written.

Still, I am a bit cautious about relying on LLMs, even closely grounded ones, in helping to synthesize literature for AIAR. In an early experiment, I loaded up NotebookLM with the entirety of the Fairness, Accountability, and Transparency Conference proceedings from 2025. I asked Gemini (with access to the Notebook) to look for clusters of papers that were thematically related to each other and to the topic of the blog. While some of these clusters seemed relevant and overlapped with my own perception of themes, others seemed more tenuous in the solidity of the theme and its relevance to AIAR. Synthesis is to a large degree about framing and finding a consistent thread, and I don’t think even the best LLMs are able to do this in a way that is satisfying.

As a Writing Aid

I have attempted to use LLMs (primarily Gemini, sometimes directly in NotebookLM) to help draft five of the posts for AIAR, three of which were based on translating a single paper, and two of which were based on clusters of papers.

I found that for the articles based on clusters the LLM was wholly unsuited to the task of synthesis: I ended up using none of the generated text. Even including all of the paper texts and my notes on those papers in the prompt, I was left feeling that the synthesized text didn’t capture what was interesting or important about the cluster. This again goes back to the idea of framing, structuring, and finding the aspects of relevance that I think are important within the field and to my intended audience. But this also relates to the interpretation phase and the “identification of key challenges, future trends, and open research opportunities” (Fok et al, 2025). All of this is consistent with what some editors at Science found when they tried to use ChatGPT to translate research papers.

For the three articles that were more direct translations of individual research papers I had slightly more success with incorporating AI generated text. In this post, I used almost 50% of the generated text in the final piece, which warranted a disclosure at the bottom of the post: “Some text in this post was adapted based on suggestions from AI.” I think this was somewhat successful because I prompted the model with details on the aspects of the paper I wanted the post to focus on, and because the post itself was more descriptive than synthetic or interpretive. The parts of the post that I wrote were the more interpretive aspects, putting the research into a broader context and considering its relevance to the audience. In another post (excerpted below), I was also able to use some chunks of descriptive text that were generated by the LLM.

In Closing

Much like everything else on AIAR, this post will be a work-in-progress and is subject to update. The most compelling use-case I’ve found so far for AI is in automating the collection and formatting of references into my RSS workflow as this lets me do something that I might not otherwise make time for. I also find the article appraisal workflow compelling and plan to keep pushing on that to integrate it more into my regular workflow for keeping AIAR posts updated. I may also revisit use cases related to grounded synthesis and writing though I’m generally less optimistic about AI providing a real lift there. The work of framing and making connections in the literature, contextualizing findings, and thinking about what matters to an audience seems like it really needs an expert eye, though perhaps LLMs can assist by offering a second opinion.

References

Fok R, Siu A and Weld DS (2025) Toward Living Narrative Reviews: An Empirical Study of the Processes and Challenges in Updating Survey Articles in Computing Research. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems: 1–10.

A Critique of Transparency Provisions in NY’s RAISE Act (1.0)

Nick Diakopoulos — Mon, 26 Jan 2026 11:02:34 GMT

The US has a couple of state laws—from California, and now New York—that address the risk of frontier AI models. Both broadly operate by specifying some information about frontier AI models that must be disclosed for the purposes of oversight. In this post I’ll review New York’s “Responsible AI Safety and Education” (RAISE) act through the lens of the quality of transparency information called for in the law. (Note: I examine the act as signed into law, and will likely write another post when the chapter amendments proposed by Governor Hochul are passed by the NY legislature, likely in early 2026).

Probably the biggest issue I see with the law is in the definitions. The RAISE act is geared towards regulating “frontier models” which it defines as: “an artificial intelligence model trained using greater than 10^26 computational operations (e.g., integer or floating-point operations), the compute cost of which exceeds one hundred million dollars” (or a different definition that applies to models produced through knowledge distillation). This is a bad definition because it merges two criteria that are arbitrary, shifting over time, and, most importantly, not something model developers are required to disclose. They are arbitrary because there’s no reason to think that 10^26 computing operations is a magical threshold at which danger suddenly materializes. Based on estimates from Epoch, none of the current breed of frontier models surpasses this threshold, so it’s not clear that the law applies to anything in the real world. And, again, even if OpenAI, Google, or Anthropic has exceeded this threshold in its training there’s no way for us to know because the law doesn’t make them tell anyone. It’s a sort of scout’s honor, opt-in system. In addition, the compute threshold applies only in conjunction with a cost greater than $100 million. Because a model has to meet both criteria, one could exceed the compute threshold but be trained for less than $100 million, and then the law wouldn’t apply. Compute costs are always getting cheaper, and some model developers like Google control the market price of their computing and so can game this, not to mention that the value of the dollar could change.
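The two-part test can be written out as a small predicate, which makes the loophole easy to see. This is a sketch of the conjunctive reading described above, not legal analysis: the function name `is_frontier_model` and the example figures are hypothetical, and the separate branch of the definition for models produced through knowledge distillation is left out.

```python
COMPUTE_THRESHOLD_OPS = 10**26     # "greater than 10^26 computational operations"
COST_THRESHOLD_USD = 100_000_000   # "exceeds one hundred million dollars"


def is_frontier_model(training_ops, compute_cost_usd):
    """True only when BOTH thresholds are exceeded (distillation branch omitted)."""
    return training_ops > COMPUTE_THRESHOLD_OPS and compute_cost_usd > COST_THRESHOLD_USD


# Hypothetical training runs: (operations, compute cost in USD).
EXAMPLES = {
    "over both thresholds": (2e26, 150_000_000),
    "over compute, under cost": (2e26, 80_000_000),
    "under compute, over cost": (5e25, 500_000_000),
}

for label, (ops, cost) in EXAMPLES.items():
    print(f"{label}: covered = {is_frontier_model(ops, cost)}")
```

The middle case is the one to watch: as compute gets cheaper, more runs that clear the operations threshold will fall under the dollar threshold and drop out of scope.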

The main mechanism for specifying transparency in the law is that large developers of models must create a “safety and security protocol” (a form of transparency report) before deployment of the model. There are also provisions to require the reporting of “safety incidents” that might constitute critical harm or an increased risk of it. The protocol report is shared with administrative accountability forums such as the attorney general and division of homeland security, as well as being made publicly available in redacted form for media or social forums. The unredacted protocol plus additional information about the tests and test results that inform the protocol need to be maintained for however long the model is deployed plus five years, presumably so that those records are potentially available for discovery by legal forums in the event they are needed. In this sense, the law does pretty well in providing for accessibility of the safety and security protocol to various accountability forums.

The overall aim of the law towards “frontier models” and “critical harm” scopes and sets limits on the relevance of the information in the protocol. Critical harm is defined as causing $1 billion or more in damage or loss of 100+ human lives. But with that scope in mind the definition of the protocol is reasonable as it specifies what should be included, including organizational procedures and sociotechnical measures meant to mitigate the potential for critical harm, as well as the testing procedures used to “evaluate if the frontier model poses an unreasonable risk of critical harm”. The protocol must also designate a person that is responsible for compliance — this is a critical component that ensures accountability for overseeing the protocol. The timeliness of the report is also referenced and calls for the developer to update the protocol on an annual basis as per any changes.

An area where the protocol falls short is in either specifying or auditing the accuracy of the information in the protocol. An earlier version of the law had provisions requiring 3rd party auditing, but those were removed from the final version signed into law. That would have strengthened the law considerably by having an independent entity checking the validity of procedures and the accuracy of provided information in the protocol. What’s left is the comparatively weaker request that large developers not lie, i.e. “shall not knowingly make false or materially misleading statements or omissions.” We can’t really assess whether the information in protocols would be understandable and fit for the purpose of accountability. A stronger law would have created a standard for the protocol that would be considered adequate.

The law provides reasonable carve outs to address typical criticisms and stakeholder pushback about transparency, including that disclosures might undermine privacy, confidentiality, trade secrets, or be used to game the system. Redactions to public safety and security protocols can be undertaken to protect these other interests. The law also protects fundamental innovation by not applying to academic research done at accredited colleges and universities. In addressing the tensions between transparency and other interests at stake, the law probably does about as well as it could, especially because administrative forums like the attorney general can gain access to copies of the protocol that are less redacted, i.e. where redactions only need to respect federal law, and fully unredacted reports must be maintained for possible discovery in legal forums.

Overall, much like its Californian counterpart, New York’s RAISE act is geared towards prospective accountability — trying to prevent future harm. Its scope is narrow around “critical harms”. While it does well to specify the accessibility of the transparency information it calls for, and align that information so it is relevant and timely to its scope, it lacks provisions for ensuring the accuracy of the information, and leaves the understandability of that information up to the large developers who’ll be creating the reports. But it’s not a powerful law because it doesn’t apply to anything in the real world (yet), and it’s unclear whether model developers will ever raise their hand and say that the law actually applies to them. It does provide an example of AI governance through transparency that can inform future legislation. The next version of the law, proposed by the governor’s office and under consideration by the state legislature, is already drastically different in many ways.

Closing Information Gaps via AI Transparency

Nick Diakopoulos — Mon, 05 Jan 2026 15:15:24 GMT

Before anyone can be held accountable for an AI system’s behavior we’re going to need some information about that system. What was the system’s behavior and was its performance unexpected? What are the underlying values and goals of its designers? Did the developers take appropriate steps to test for and prevent harmful outcomes? How are organizational policies designed and implemented for the ongoing operation of the system? Transparency is the umbrella idea of closing these kinds of knowledge gaps, and should be differentiated from explanation which is a more specific approach (Corbett and Denton, 2023; Hayes et al, 2023). More formally, transparency can be defined as “the availability of information about an actor allowing other actors to monitor the workings or performance of this actor” (Meijer et al, 2014). And while transparency in itself cannot ensure accountability, it often plays a critical supporting role, providing the informational substrate for understanding AI system behavior that can then filter into various forums that might seek to hold actors in an AI system accountable.

Transparency sets up a relationship between two entities—here an AI system and a forum—where information about the AI system becomes available to the forum. Because AI systems are sociotechnical this includes information about both the data and technical model in the system, as well as the human components such as organizational policies, procedures or practices, and user behaviors (Diakopoulos, 2020). For the sake of accountability, provided information should help the forum determine congruence with relevant values, goals, and normative or legal expectations of behavior (Hayes et al, 2023; Fleischmann and Wallace, 2005). Transparency information can be voluntary (e.g. a blog post), obligatory (e.g. legally mandated disclosure to an administrator), or involuntary (e.g. external audits, or leaks), though recent research has underscored the inadequacy of volunteered “first-party” transparency information compared to external “third-party” evaluations of social impacts (Reuel et al, 2025).

To be useful for accountability, transparency information needs to reflect high information quality. At a minimum it needs to be accessible, understandable, relevant, and accurate (Hayes et al, 2023; Diakopoulos, 2020; Turilli and Floridi, 2009). Beyond just availability, information needs to be accessible so that it can be easily found by audiences such as various accountability forums. It also needs to be understandable or usable by those audiences and aligned to their information processing capabilities and capacities. It needs to be relevant to diagnosing some behavior of interest whether that be in shedding light on some negative outcome for retrospective accountability, or providing critical context to inform prospective accountability. Information also needs to be accurate such that it is valid, reliable, and free of error (Turilli and Floridi, 2009), since otherwise it can suffer from strategic activities that shape or distort information, leading to uninformative or boilerplate disclosures (Marin et al, 2025). Other aspects of information quality that are pertinent include the currency or timeliness of the information, and its comprehensiveness. AI transparency will typically fall short when the above factors and attributes aren’t adequately addressed.

A recurring pattern we see in the literature is a failure to clearly articulate the intended audience or forum for transparency information, with implications for how the information would be maximally accessible, understandable, and relevant for that audience. For instance, in the 2025 Foundation Model Transparency Index (Wan et al, 2025), the authors establish a set of 100 indicators that they apply to various models to evaluate how transparent they are in terms of data, training, compute usage, modeling, and downstream impacts and use policies. But the audience for all of this information, and its utility for accountability, is anything but clear. What transparency initiatives like this one need to do is clearly articulate the public interest and accountability purpose of each indicator, helping to connect over to the audience or forum that would then use that information for accountability. Similarly, a recent proposal for AI agent transparency (Ezell et al, 2025) appears oriented somewhat towards technical developers “debugging” agent incidents. If the information in that framework could be made available to administrative or judicial forums, it’s likely they would benefit from at least some of the information. But the ideal would be a more parsimonious framework that more closely tracks the needs of those forums for specific issues they may need to assess for accountability.

While I would argue that transparency is a necessary pre-condition for accountability, critics point out that transparency is not an unalloyed positive force. It shouldn’t be assumed to always enable accountability (Corbett and Denton, 2023), though policies that shape adherence to the attributes of quality transparency information described above should increase the likelihood of its utility. Transparency can also come into tension with other values, such as privacy, freedom of expression, or intellectual property (Ananny and Crawford, 2018; Diakopoulos, 2020; Turilli and Floridi, 2009) leading to situations where tradeoffs need to be made in highly context-specific ways. One of the most frequent counter arguments to more transparency is that it could enable gaming or manipulation of the system (van Bekkum and Borgesius, 2021), though careful context-specific engineering, threat modeling, and consideration to forum-specific access provisions should alleviate this issue (Diakopoulos, 2020). We might also consider the idea that social forums may use manipulation as a way to sanction a system—in other words manipulating a system may in some contexts and situations be considered a component of holding a system accountable for unwanted behavior. Ultimately, the choices around what, when, and how AI systems are made transparent are political (Corbett and Denton, 2023).

The role of policy here is to thread the needle through these criticisms to scope transparency and shape it towards positive outcomes for society. Policy must create obligations for actors within AI systems to produce the information needed by any given forum (e.g. administrative, legal, etc.) to make the relevant assessment of system performance. This information needs to meet accessibility, understandability, relevance, accuracy, currency, and comprehensiveness quality criteria. One way to do this is to be more specific about standards for AI system transparency information production: what standard processes and practices should be evidenced by actors making transparency information available? Public sector policy makers cannot leave this unspecified, otherwise there is too much room for strategic and performative behavior. Another role for policy makers is to engage in the politics of where and how to make tradeoffs with other values such as privacy; looking to public attitudes should probably inform this. Transparency policies need to be user-centered (e.g. towards whatever forum the information is intended for) and context-specific, and would benefit from human-centered engineering and evaluation to refine their scope, meet user needs, and maximize their utility for accountability.

References

Ananny M and Crawford K (2018) Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society 20(3): 973–989.

Corbett E and Denton R (2023) Interrogating the T in FAccT. Conference on Fairness, Accountability, and Transparency: 1624–1634.

Diakopoulos N (2020) Transparency. Oxford Handbook of Ethics and AI. Eds. Markus Dubber, Frank Pasquale, Sunit Das.

Ezell C, Roberts-Gaal X and Chan A (2025) Incident Analysis for AI Agents. Proc. AI, Ethics, and Society (AIES) DOI: 10.48550/arxiv.2508.14231.

Fleischmann KR and Wallace WA (2005) A covenant with transparency. Communications of the ACM 48(5): 93–97.

Hayes P, Poel I van de and Steen M (2023) Moral transparency of and concerning algorithmic tools. AI and Ethics 3(2): 585–600

Marin LGU-B, Rijsbosch B, Spanakis G and Kollnig K (2025) Are Companies Taking AI Risks Seriously? A Systematic Analysis of Companies’ AI Risk Disclosures in SEC 10-K forms. arXiv. https://arxiv.org/abs/2508.19313

Meijer A, Bovens M and Schillemans T (2014) Transparency. The Oxford Handbook of Public Accountability. Oxford University Press.

Reuel A, Ghosh A, Chim J, et al. (2025) Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations. arXiv. https://arxiv.org/abs/2511.05613

Turilli M and Floridi L (2009) The ethics of information transparency. Ethics and Information Technology 11: 105–112.

Bekkum M van and Borgesius FZ (2021) Digital welfare fraud detection and the Dutch SyRI judgment. European Journal of Social Security 23(4): 323–340.

Wan A, Klyman K, Kapoor S, et al. (2025) The 2025 Foundation Model Transparency Index. arXiv. DOI: 10.48550/arxiv.2512.10169.

Gaps in First-Party and Third-Party AI Model Evaluations

Nick Diakopoulos — Tue, 02 Dec 2025 15:47:28 GMT

A group of researchers with the EvalEval Coalition recently published a new paper on arXiv: “Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations” where they present an analysis of evaluations of AI models with respect to social impacts. The analysis exposes gaps between evaluations run by model developers themselves versus third party evaluations, highlighting a need for transparency reporting standards and regulations.

The crux of the analysis is in comparing 186 first-party reports that were part of model releases by model developers to 183 post-release evaluations that were run by various third parties. These reports were assessed based on the level of detail provided in evaluations of any of seven social impact dimensions as identified by Solaiman et al (2023). The seven dimensions assessed were Bias and Harm, Sensitive Content (e.g. outputting hate speech), Performance Disparity (e.g. unequal results across subpopulations), Environmental Costs and Emissions, Privacy and Data, Financial Costs, and Moderation Labor (e.g. working conditions of data annotators). The rating scale had four levels: 0 (no evaluation present), 1 (vague mention), 2 (concrete results but limited clarity on methods and context), and 3 (sufficient detail to understand and contextualize the evaluation). All the ratings are available here.
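For readers who want to work with ratings like these, the scale can be sketched as a small Python enum with an averaging helper. The names here are my own rather than the paper’s, and the sample ratings are invented purely to show the arithmetic.

```python
from enum import IntEnum
from statistics import mean


class DetailLevel(IntEnum):
    """Levels of the rating scale described above (names are illustrative)."""
    NO_EVALUATION = 0      # no evaluation present
    VAGUE_MENTION = 1      # vague mention
    CONCRETE_RESULTS = 2   # concrete results, limited clarity on methods and context
    SUFFICIENT_DETAIL = 3  # enough detail to understand and contextualize


def average_detail(ratings):
    """Mean level of detail across a collection of rated reports."""
    return mean(int(r) for r in ratings)


# Invented ratings for four hypothetical reports on one dimension.
sample = [
    DetailLevel.SUFFICIENT_DETAIL,
    DetailLevel.CONCRETE_RESULTS,
    DetailLevel.NO_EVALUATION,
    DetailLevel.VAGUE_MENTION,
]
print(average_detail(sample))  # 1.5
```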

The main take-away is that third party evaluations were considerably more detailed, on average, than first-party evaluations (2.62 vs. 0.72 on the 0-3 scale). The implication is that the tech companies and other organizations training models are not releasing as much detail about their evaluations of social impacts in comparison to third parties who run evaluations. The authors note that the most popular models from the US (and to a lesser extent China) tend to attract the most third party evaluations, exposing a gap in evaluation of less-popular models. They also note that certain impact types such as data and content moderation impacts (as well as some others like environmental impacts) are not prevalent at all and are almost entirely absent from third-party evaluations, exposing the reality that third-parties just do not have access to the information they would need to properly evaluate certain issues.

The take-aways for policy here seem pretty clear. First-party evaluations of models by model providers are insufficient when it comes to evaluations of social impacts. There is a fair bit of variance in what level of attention different models receive and what dimensions of social impact are evaluated at all. Transparency standards are needed to provide more consistency and expectations for what evaluations need to be run and how, or which data needs to be disclosed so that third parties can cover more terrain with their evaluations. In addition, there need to be standards around which models demand a full evaluation. And there needs to be sufficient capacity in the evaluation landscape of third parties to be comprehensive. Advancing consistent transparency standards for AI models would support AI accountability by providing the information needed by different accountability forums.

References

Reuel A, Ghosh A, Chim J, et al. (2025) Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations. arXiv. https://arxiv.org/abs/2511.05613

Solaiman I, Talat Z, Agnew W, et al. (2023) Evaluating the Social Impact of Generative AI Systems in Systems and Society. arXiv. https://arxiv.org/abs/2306.05949v2

AI Ethics Principles and Accountability

Nick Diakopoulos — Tue, 25 Nov 2025 14:45:56 GMT

Establishing norms and behavioral standards for AI systems is central to the AI Accountability Problem. Over the years, private companies, government agencies, non-profits, and other organizations have put forth a number of AI ethics principles to serve this purpose. A principle acts as a behavioral guideline—essentially a value defining what is “good” or “desirable” (van de Poel, 2020). In assessing AI behavior, such principles help define what is (in)appropriate and thus what behavior might call for accountability, either retrospectively for some observed AI failure, or prospectively towards preventing undesirable outcomes.

Early analyses of the numerous published AI guidelines have identified a few core principles. These include privacy, fairness/justice, accountability/responsibility/explicability, transparency, beneficence, non-maleficence/safety, and human autonomy (Jobin et al, 2019; Hagendorff, 2020; Floridi et al, 2018). Despite mostly stable underlying ideas, the exact terminology can vary, which leads to a lack of clarity (Morley et al, 2021). A longer tail of principles includes ideas like trust, sustainability, dignity, and solidarity (Jobin et al, 2019).

Principles can come from different sources and so be biased in different ways, such as towards ideas in dominant geographies or from power holders such as experts or companies (Hickok, 2021). They can come from researchers and experts in the field (Floridi et al, 2018), from professional codes of conduct in domains of practice (Diakopoulos et al, 2024), from broad consensus documents like the UN declaration of human rights (Latonero, 2018), and be further informed by public evaluations (Kieslich et al, 2024). What’s the most legitimate source of principles for AI accountability? While a treaty like the Framework Convention on Artificial Intelligence has reached broad consensus, large swaths of the world still haven’t signed on. Achieving truly global principles will require ongoing political work.

Besides their potential to reflect biases, AI principles are also hard to actually implement in practice. Big abstractions need to be translated into concrete operationalizations (Hagendorff, 2020) if they are going to be used to measure AI system failures or guide AI system design to support prevention. Moreover, abstractions like fairness can hide contested ideas with conflicting perspectives (Mittelstadt, 2019) underlining the need to consider context-specific tradeoffs.

Prem (2023) analyzed more than 100 approaches from the literature for bridging the gap between principles and implementation. These include things like AI ethics criteria/checklists, metrics, process models, codes of practice, etc. He distinguishes approaches used during the design of a system (ex-ante), and those that are applied to an AI system after development or perhaps iteratively during development (ex-post). Ex-ante methods are relevant to prospective accountability, whereas ex-post methods are geared towards retrospective accountability (and also prospective if used iteratively during development). He notes that “Generally, there is a strong focus on those aspects for which technical solutions can be built,” exposing a further bias in the research on this topic.

Whereas designers and developers can adopt approaches to help prevent negative outcomes, AI system behavior itself should also be measured to assess adherence to principles. The idea of Ethics Based Auditing (EBA) applies the logic of auditing to the challenge of assessing system behavior “for consistency with relevant principles or norms.” (Mökander et al, 2021). This starts to get at a core issue of operationalizing principles into metrics that can evaluate (mis)alignment with a value. Principles just set the direction; effective accountability requires quantifiable performance metrics. This in turn requires supporting data access to inform those measurements.

Rismani and colleagues (2025) reviewed hundreds of these measures in the literature as they relate to different system components, hazards, harms, and principles. Of the measures they found, 90% were related to just four principles: fairness, transparency, privacy, and trust. To be useful for accountability, metrics need to define some threshold which indicates that the principle has been violated, that the system may create a hazard, and that a call for accountability is therefore warranted. Thresholds may be context-dependent, vary based on domain, and are subject to the risk tolerance of different stakeholders, but are rarely discussed in the literature (Rismani et al, 2025). This returns us to the normative question: How do you define an acceptable vs. unacceptable level of a measure of a principle? At what level might reasonable people agree there should be accountability? Public perceptions of acceptability may play a role here.
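As a rough illustration of what a context-dependent threshold could look like in practice, here is a minimal Python sketch. Everything in it is hypothetical: the class, the metric name, the contexts, and the numbers are invented for the example and are not drawn from Rismani et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricThreshold:
    principle: str         # the principle being operationalized, e.g. "fairness"
    metric: str            # hypothetical metric name
    context: str           # domain or use case the threshold applies to
    max_acceptable: float  # measured values above this count as a violation


def warrants_accountability(value: float, threshold: MetricThreshold) -> bool:
    """True when the measured value exceeds the acceptable level for this context."""
    return value > threshold.max_acceptable


# Invented thresholds: the same metric tolerated differently in two contexts.
hiring = MetricThreshold("fairness", "selection_rate_gap", "hiring", 0.05)
ad_targeting = MetricThreshold("fairness", "selection_rate_gap", "ad targeting", 0.20)

measured_gap = 0.10  # hypothetical measurement
print(warrants_accountability(measured_gap, hiring))        # True
print(warrants_accountability(measured_gap, ad_targeting))  # False
```

The same measurement triggers a call for accountability in one context and not the other, which is why the choice of threshold is a normative decision rather than a purely technical one.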

Principles serve as orienting ideas for what is valued. They can be used to determine what constitutes inappropriate behavior, necessitating accountability either retrospectively (blame for failure) or prospectively (prevention of harm). Bringing them into formal accountability forums (e.g. administrative, legal) hinges on mitigating biases in their enumeration and reaching a high degree of consensus. But implementing them in practice remains a challenge. They need to be translated into practices that designers and developers can use to mitigate the hazards created by an AI system, or to metrics with clear thresholds that can measure AI system behavior for signs of deviation. Policy should support the development of context- and domain-specific operationalizations of metrics and thresholds that are indicative of violations of principles by AI systems, as well as the data access provisions that would enable those measurements by the relevant accountability forums.

References

Diakopoulos N, Trattner C, Jannach D, et al. (2024) Leveraging Professional Ethics for Responsible AI. Communications of the ACM.

Floridi L, Cowls J, Beltrametti M, et al. (2018) AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations. Minds and Machines 28(4): 689–707.

Hagendorff T (2020) The Ethics of AI Ethics: An Evaluation of Guidelines. Minds and Machines 30(1): 99–120.

Hickok M (2021) Lessons learned from AI ethics principles for future actions. AI and Ethics 1(1): 41–47.

Jobin A, Ienca M and Vayena E (2019) The global landscape of AI ethics guidelines. Nature Machine Intelligence 1(9): 389–399

Kieslich K, Helberger N and Diakopoulos N (2024) My Future with My Chatbot: A Scenario-Driven, User-Centric Approach to Anticipating AI Impacts. Conference on Fairness, Accountability, and Transparency: 2071–2085.

Latonero M (2018) Governing Artificial Intelligence: Upholding Human Rights & Dignity. Data & Society. https://datasociety.net/library/governing-artificial-intelligence/

Mittelstadt B (2019) Principles alone cannot guarantee ethical AI. Nature Machine Intelligence 1(11): 501–507.

Morley J, Kinsey L, Elhalal A, et al. (2021) Operationalising AI ethics: barriers, enablers and next steps. AI & SOCIETY 38(1): 411–423.

Mökander J, Morley J, Taddeo M, et al. (2021) Ethics-Based Auditing of Automated Decision-Making Systems: Nature, Scope, and Limitations. Science and Engineering Ethics 27(4): 44.

Poel I van de (2020) Embedding Values in Artificial Intelligence (AI) Systems. Minds and Machines 30(3): 385–409.

Prem E (2023) From ethical AI frameworks to tools: a review of approaches. AI and Ethics 3(3): 699–716.

Rismani S, Shelby R, Davis L, et al. (2025) Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(3): 2199–2213.

Informing AI Accountability with Public Perceptions

Nick Diakopoulos — Wed, 05 Nov 2025 13:00:33 GMT

One important way to understand standards and expectations for AI system use and behavior is to ask the public. This is critical especially for calls for accountability in social or media forums since they are most exposed to a plurality of opinions about appropriateness or acceptability of behavior. In a democratic system we should also expect that standards for legal, political, and administrative forums be institutionalized downstream of public perspectives. Public perception of AI acceptance is a valuable input for policymakers to help prioritize areas for intervention, and shape the formalization of expectations.

A growing number of surveys consider the public perception and acceptance of AI across different use cases, such as health care, surveillance, and automation (Eom et al, 2024), personal health and labor replacement (Mun et al, 2025), AI in tax fraud detection (Kieslich et al, 2022), media, health, and justice domains (Araujo et al, 2020) and others. One study showed that overall judgments of the value of AI across a wide range of use cases is strongly shaped by perceived benefits, with perceived risks also playing a significant role (Brauner et al, 2025).

A recurring result in many of these survey studies is that there is variance in user acceptance of AI across people of different backgrounds. Factors such as the knowledge, literacy, education, or even political orientation of respondents, as well as their age and gender can play a role in the perception of risk, benefit, and acceptance of AI. For instance, younger respondents often view AI as less risky and more beneficial than older respondents (Brauner et al, 2025). A critical factor in individual perception is the level of AI knowledge the person has (and their confidence in that knowledge), where higher knowledge can lead to lower risk assessment, i.e. “risk blindness” (Said et al, 2023). Because of these differences, policy should ideally be informed by representative population samples, or perhaps population samples weighted by those who might bear the greater risk.

Kieslich et al (2022) take the perspective that we also need to understand public perception of the principles underlying AI systems. This in effect is a measure of whether the system is “aligned” with the perspectives and values of the person evaluating it. They measure perceptions of principles like explainability, fairness, security, accountability, accuracy, privacy, and limited machine autonomy for a scenario related to use of AI in tax fraud detection. For their representative sample of respondents from Germany they find that accountability was perceived as the most important principle. This underscores the idea that accountability is a critical property of AI systems that the public cares about.

Mun et al (2025) pair a quantitative survey of various AI use cases with open-ended follow-up questions where respondents elaborate on why they think a use case should or shouldn’t be developed, and what would need to change for them to switch their opinion of the use case. As with Brauner et al (2025) they find that cost-benefit reasoning dominates, but that in some cases virtue-based reasoning is somewhat more prevalent, such as for the Elementary School Teacher or Digital Medical Advice scenarios. They further analyze these rationales through the lens of Moral Foundations Theory and find that Care (i.e. dislike of pain of others or feelings of empathy and compassion) was the most prevalent reason mentioned overall, but fairness also dominated some use cases (e.g. Lawyer). This finding about the role of a moral foundation like care aligns with one of the surveys reported on by Eom et al (2024) where 64% of respondents thought it was a bad idea for “robotic nurses for bedridden patients that can diagnose situations and decide when to administer medicine.” In other words, use cases where care is an underlying moral proposition seem to make people less accepting of the use of AI. In terms of accountability, then, we need to consider not only perceived risk, but also whether there is some kind of underlying value in society that is being violated.

One of the gaps identified by Araujo et al (2020) is that public perception of AI acceptance in a use case doesn’t necessarily tell us if people would personally accept a specific AI decision, or reject it and instead call for accountability. Important work remains to be done to understand this ego-centric retrospective case. On the other hand, for prospective accountability, research has begun to explore public perceptions around which stakeholders should be responsible for taking action to prevent negative outcomes (Barnett et al, 2025). This research uses written scenarios depicting harm from AI in the media ecosystem as a basis for a survey to gather public input about which stakeholders are in a position to take action to prevent the harm. Participants assigned responsibility to any of 12 different stakeholders that emerged from the data, including government, tech companies, news publishers, schools, social media platforms, independent third parties, local communities, public health officials, media companies, NGOs, employers, and unions. Specific actions that these stakeholders could take were then rated in terms of whether they should be taken, and also whether the action should be prioritized, resulting in rich data that could inform policy on how to assign responsibility for prevention, though ideally this process would be re-run with a representative sample.

Public opinion plays a critical role in shaping legitimate norms and standards for AI behavior. Policymakers should recognize that expectations of AI systems, including what is considered “acceptable”, are rooted in social perceptions. Surveys show that these perceptions vary based on demographic or other individual factors such as knowledge, and that there is variance across use case contexts. Policy should therefore be grounded in representative and inclusive data that is tailored to the specific use case contexts to be governed. Although cost-benefit reasoning dominates rationales for AI acceptance, value-based reasoning also needs to be considered. Finally, there is still much open research to do in drilling further into perceptions of who is responsible for what across a variety of situations.

References

Araujo T, Helberger N, Kruikemeier S, et al. (2020) In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI & SOCIETY 35(3): 611–623.

Barnett J, Kieslich K, Helberger N, et al. (2025) Envisioning Stakeholder-Action Pairs to Mitigate Negative Impacts of AI: A Participatory Approach to Inform Policy Making. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency: 1424–1449.

Brauner P, Glawe F, Liehner GL, et al. (2025) Mapping public perception of artificial intelligence: Expectations, risk–benefit tradeoffs, and value as determinants for societal acceptance. Technological Forecasting and Social Change 220: 124304.

Eom D, Newman T, Brossard D, et al. (2024) Societal guardrails for AI? Perspectives on what we know about public opinion on artificial intelligence. Science and Public Policy 51(5): 1004–1013.

Kieslich K, Keller B and Starke C (2022) Artificial intelligence ethics by design. Evaluating public perception on the importance of ethical design principles of artificial intelligence. Big Data & Society 9(1): 20539517221092956.

Mun J, Yeong WBA, Deng WH, et al. (2025) Why (Not) Use AI? Analyzing People’s Reasoning and Conditions for AI Acceptability. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(2): 1771–1784.

Said N, Potinteu AE, Brich I, et al. (2023) An artificial intelligence perspective: How knowledge and confidence shape risk and benefit perception. Computers in Human Behavior 149: 107855.

Transparency Needs for AI Agent Accountability

Nick Diakopoulos — Tue, 28 Oct 2025 10:02:36 GMT

AI incident monitoring databases like the OECD AI Incident Monitor and the AI Incident Database track cases where AI has created harm. But by sourcing incidents from public news articles they’re limited in the detail they include. So there’s a lot of information lacking when it comes to trying to understand and hold complex agentic AI systems accountable. A new paper entitled “Incident Analysis for AI Agents” published at AIES this year tries to tackle this problem by outlining a transparency framework for what pieces of information should be collected about AI agent incidents (Ezell et al, 2025).

The paper outlines three factors that contribute to AI agent incidents: (1) system factors, (2) contextual factors, and (3) cognitive errors. System factors are things like training and feedback data, learning methods, the system prompt, and scaffolding software around the agent. Contextual factors include aspects of the task definition, tools that the agent uses, and information the agent uses or needs to perform tasks. Finally, “cognitive” errors are basically flaws in how the AI agent functions leading to failure, which result from faulty observation of the environment, understanding of inputs, decision-making, and action execution to achieve a goal.

Based on these classes of factors the authors go on to outline a range of information that would be helpful to disclose as part of an AI agent incident. They organize this information into three categories: (1) activity logs, (2) system documentation and access, and (3) tool-related information. Activity logs would include a record of all inputs and outputs to the agent including system and user prompts, external information included in inputs, model reasoning traces, model outputs and actions taken, and necessary metadata like timestamps to contextualize all of this. System documentation and access refers to information about the AI model such as any model or system cards, version information (and change logs), and other parameters (e.g. temperature, random seeds) that might inform an incident reconstruction. Tool information is there to document any tools that agents use including identifying them, their version, the actions the tool enables, and any information about how the tool might adapt to the user.
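To make the three categories a bit more concrete, here is a minimal sketch of what a structured incident record might look like. This is not a schema proposed by Ezell et al; the class names, field names, and example values (e.g. `IncidentRecord`, `example-model-1.0`, `web_search`) are hypothetical and only mirror the kinds of information listed above.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import List, Optional
import json


@dataclass
class ActivityLogEntry:
    timestamp: str  # ISO 8601 timestamp to contextualize the event
    kind: str       # hypothetical labels: "system_prompt", "user_prompt", "reasoning", "output", "action"
    content: str


@dataclass
class ToolInfo:
    name: str
    version: str
    actions: List[str]            # the actions the tool enables
    adapts_to_user: bool = False  # whether the tool might adapt to the user


@dataclass
class SystemInfo:
    model_version: str
    model_card_url: Optional[str] = None
    temperature: Optional[float] = None
    random_seed: Optional[int] = None


@dataclass
class IncidentRecord:
    system: SystemInfo
    tools: List[ToolInfo] = field(default_factory=list)
    activity_log: List[ActivityLogEntry] = field(default_factory=list)

    def log(self, kind, content, when=None):
        """Append a timestamped entry to the activity log."""
        when = when or datetime.now(timezone.utc)
        self.activity_log.append(ActivityLogEntry(when.isoformat(), kind, content))

    def to_json(self):
        """Serialize the whole record, e.g. for submission to a reporting system."""
        return json.dumps(asdict(self), indent=2)


# Usage with made-up values
record = IncidentRecord(
    system=SystemInfo(model_version="example-model-1.0", temperature=0.2, random_seed=7)
)
record.tools.append(ToolInfo(name="web_search", version="0.3", actions=["search"]))
record.log("user_prompt", "Book a flight to Chicago",
           when=datetime(2025, 1, 1, tzinfo=timezone.utc))
print(record.to_json())
```

A real schema would need to settle questions the sketch ignores, such as how reasoning traces are stored and which fields are redacted for which audience.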

This paper goes a long way toward outlining the necessary information that should be included in an incident report. But from a policy perspective there are some open questions about incident reporting. For one, how long should a developer maintain an activity log? This might depend on the risk profile of the use case, as well as whether there are any privacy considerations and how those might be handled. Another key question is who gets access to an incident report including any activity logs as well as system and tool-related information? The severity of the incident may create different tiers of access. Administrative and judicial forums might need access to the detailed information outlined in this paper for root-cause analysis and for assessing accountability, but it’s unclear that it should be made fully public due to privacy or trade secrecy issues. Still, secure infrastructure and access control will be needed and policy should consider how to create a shared and standardized infrastructure that AI developers can report into.

There are a few issues that the authors don’t address but which I also think will be important to policy. Related to the access control dimension, a common critique of providing transparency information is that it can enable gaming and manipulation (Diakopoulos, 2020). The many information factors that the authors outline need to be stress tested against how an adversary might be able to manipulate the agent if they were made public. This can also inform which pieces of information need to be withheld for specific closed-door forums, like administrative agencies or judicial cleanrooms. Another open question relates to AI agents using tools that use other tools. If tool use is implicated in an incident, then presumably we would want to recursively evaluate all the tools it may have in turn relied on. This then creates additional monitoring and activity logging demands on tools that are made available to agents. Finally, from a sociotechnical standpoint I think there could be aspects of AI agent transparency that disclose more about the human context around an incident, such as the roles and activities of supervisors, users, or other humans in the loop that may have had access or authority over intermediate results for the agent.
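The recursive tool evaluation mentioned above can be sketched as a simple graph traversal. This is only an illustration under the assumption that tool-to-tool calls have been logged as a mapping; the function name `tools_implicated` and the tool names are hypothetical.

```python
def tools_implicated(tool, calls, seen=None):
    """Return every tool reachable from `tool` by following tool-to-tool calls.

    `calls` maps a tool name to the list of tools it invoked. The `seen` set
    guards against cycles (tool A calling B calling A).
    """
    if seen is None:
        seen = set()
    if tool in seen:
        return seen
    seen.add(tool)
    for child in calls.get(tool, []):
        tools_implicated(child, calls, seen)
    return seen


# Hypothetical call graph reconstructed from activity logs
calls = {
    "travel_planner": ["flight_search", "payments"],
    "flight_search": ["web_browser"],
    "payments": ["web_browser"],
    "web_browser": [],
}
implicated = tools_implicated("travel_planner", calls)
print(sorted(implicated))
# ['flight_search', 'payments', 'travel_planner', 'web_browser']
```

The traversal is only possible if every tool in the chain keeps its own log of downstream calls, which is exactly the additional logging demand described above.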

References

Diakopoulos N (2020) Transparency. Oxford Handbook of Ethics and AI. Eds. Markus Dubber, Frank Pasquale, Sunit Das.

Ezell C, Roberts-Gaal X and Chan A (2025) Incident Analysis for AI Agents. Proc. AI, Ethics, and Society (AIES) DOI: 10.48550/arxiv.2508.14231.

Networked AI Accountability

Nick Diakopoulos — Tue, 14 Oct 2025 04:59:25 GMT

An accountability relationship between an actor and a forum means that the actor has to answer to that forum for some conduct (Bovens, 2007). There are a range of types of forums that might have accountability relationships with AI systems including political (e.g. parliamentary hearings, democratic elections), legal (e.g. courts), administrative (e.g. auditors or inspectors from official agencies), professional (e.g. professional societies, industry working groups), social (e.g. civil society organizations, interest groups), or media (e.g. news media, social media).

Different forums operate in different ways, have different capacities for obtaining information or explanation, and may have different standards of expected behavior or ways to sanction the actor. There are also differences in how their authority is constituted, with legal or administrative authority formally flowing from the state, while professional, social, and media forums gain their authority through other informal social processes. These distinctions correspond to vertical accountability, where a forum formally holds power over the actor often due to a hierarchical relationship between them, and horizontal accountability which is essentially voluntary and where there is no formal obligation to provide an account. Forums can also be public as is the case for political, legal, professional, social, and media forums, while others like administrative forums may be partially public or non-public.

Because of their different capacities to know, act, and sanction, forums often work in concert to hold an actor accountable. In a networked view of accountability the interplay between forums is a necessary feature of how accountability is ultimately rendered (Wieringa, 2020). For instance a forum with informal power and a horizontal relationship to the actor in question (e.g. media) may contribute knowledge that is publicized and which informs a forum with formal power and a vertical relationship (e.g. a relevant governing agency) that can further pursue accountability, if needed in a non-public space that accommodates issues such as trade secrecy or privacy. Different forums respond and react to one another.

Wieringa (2023) provides a detailed description of how networked accountability works, illustrating it with the case of the Dutch welfare fraud system, SyRI (System Risk Indication). Briefly, SyRI was a system implemented by the Dutch government and used by municipalities from 2015-2019 to try to detect potential fraud based on welfare beneficiary data. In 2020 a Dutch court ruled that the law authorizing the creation of SyRI was unlawful because it conflicted with the right to privacy ensured by the European Convention on Human Rights (van Bekkum and Borgesius, 2021). While there had been some administrative forums early in the development leading up to the law which tried to pump the brakes, those forums were ultimately not successful in shaping what became the law before parliament passed it.

How was accountability achieved here? The following figure illustrates many of the various relationships described by Wieringa in the case.

It was ultimately the legal forum that provided the formal accountability and authority to overrule the law authorizing the creation of SyRI. In essence the case was about holding accountable the legislators who delegated authority to create the AI system to risk rate people using private personal information. There were clear limits here though as the legal forum was unable to compel disclosure of detailed information about how the SyRI algorithm actually works, with the government arguing that disclosure of that information could enable fraudsters to game or evade the system (van Bekkum and Borgesius, 2021). As Wieringa (2023) writes, the court indicated that the State “needed to explain how the algorithmic system was designed, tested, applied, and how it operates” but failed to do so. As the court opinion wrote, “[w]ithout insight into the risk indicators and the risk model, or at least without further legal safeguards to compensate for this lack of insight, the SyRI legislations provides insufficient points of reference for the conclusion that by using SyRI the interference with the right to respect for private life is always proportional and therefore necessary…” (Meuwese, 2020). This highlights that even a formal forum such as a courtroom may not be able to bridge knowledge gaps about an AI system, and that insufficient transparency about such systems is a core impediment to accountability.

While the legal forum was able to provide the formal accountability to stop the use of SyRI, both social and media forums also played critical roles in achieving that outcome, and the political forum was further activated in the process as well. Indeed the impetus for the court case originally came from a collection of civil society actors, “The Privacy Coalition”, which in 2016 filed a public records request to find out more about the system (Wieringa, 2023). The critical issue in the public records response was that “crucial information, such as audit reports and PIAs [Privacy Impact Assessments], needed to evaluate the proportionality of the system was withheld”. There simply wasn’t enough information to assess whether the privacy violations at stake might be warranted. In short, the legitimacy of the system couldn’t be established on the basis of the information provided: the state hadn’t provided a sufficient account to the social actor. Unsatisfied with the level of detail provided, The Privacy Coalition then sued the state in 2018, moving into the legal forum.

The lawsuit also stimulated some activity in the political forum, with two members of parliament (MPs) filing to make the SyRI system transparent, which was denied by the state. Around this time The Privacy Coalition activated the media forum through a campaign to educate the public about SyRI and shape public attention, opinion, and awareness of the system and the issues it exposed. This had the apparent effect of also stimulating more social actors in the form of citizen demonstrations, which were then covered and amplified by the media further. The media forum also participated by scrutinizing SyRI and developing arguments against it through published editorials and commentaries, and by asking members of parliament or of municipal councils to account for the system.

Accountability is not a clean process. It involves lots of relationships, connections, and back and forth as different forums gain information and trigger or reinforce each other. Forums with informal, horizontal accountability relationships are needed to mobilize information, however at the end of the day there needs to be formal accountability from a forum with the power to change the situation and sanction actors, in this case by overturning a law. That means we need laws that define what AI behavior is permissible (or as in this case, what values like privacy need to be preserved in AI system behavior), and that other forums need to have capabilities to gain knowledge of AI behavior such that they can potentially activate formal accountability in a legal (judicial) forum. To the extent that the state would want to defend or reimplement a system akin to SyRI that system would need to offer more algorithmic transparency to clearly demonstrate how the government interest in efficiency of fraud detection is balanced against relevant fundamental rights.

References

Bekkum M van and Borgesius FZ (2021) Digital welfare fraud detection and the Dutch SyRI judgment. European Journal of Social Security 23(4): 323–340.

Bovens M (2007) Analysing and Assessing Accountability: A Conceptual Framework. European Law Journal 13(4): 447–468.

Meuwese A (2020) Regulating algorithmic decision-making one case at the time: A note on the Dutch “SyRI” judgment. European Review of Digital Administration & Law 1(1).

Wieringa M (2020) What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency: 1–18.

Wieringa M (2023) “Hey SyRI, tell me about algorithmic accountability”: Lessons from a landmark case. Data & Policy 5.

How California’s New AI Law Supports Accountability

Nick Diakopoulos — Wed, 01 Oct 2025 10:02:49 GMT

California—home to some of the largest AI developers in the world—has a new AI law on the books. Known as the “Transparency in Frontier Artificial Intelligence Act” (i.e. Senate Bill 53, or just SB53) the law is an important example of how legislative authority can strengthen the capacity for AI accountability. It provides a series of provisions that call for the release of information that help society know about potentially risky AI behaviors.

The scope of the law is quite narrow, however, as it only applies to “catastrophic risk” and “critical safety incidents” related to “frontier foundation models”. Unlike the wider scope of something like the EU’s AI Act, SB53 is really targeted. A “frontier” model is defined as one that is trained with more than a threshold number of numerical operations. What makes something “catastrophic” according to the law is that more than 50 people are seriously harmed, or damages amount to at least $1B from a single incident. The risks in focus here are hypothetical, including AIs assisting with making or releasing chemical, biological, radiological, or nuclear weapons; unsupervised AIs that engage in conduct that you might recognize as murder, assault, extortion, or theft; or AIs evading the control of their developer or user.

Because perhaps none of these risks have ever actually materialized, it’s appropriate to see this law as an implementation of the precautionary principle, the idea that action should be taken to prevent potential harm, even when scientific proof of the risk is incomplete or uncertain. Basically, better safe than sorry. The law is a nice example of creating prospective accountability — assigning responsibilities for preventing outcomes that are in this case still quite uncertain but which we want to minimize the chance of coming about. In this case this puts additional onus on the processes implemented that should prevent such risks from materializing, and for monitoring the system for indicators of such risks.

Transparency can be defined as “the availability of information about an actor allowing other actors to monitor the workings or performance of this actor” [1] and is recognized as an enabler for accountability [2]. We can’t hold accountable what we don’t know about. The law addresses the knowledge dimension of the AI Accountability problem using three mechanisms to increase the supply of information about catastrophic risks from frontier foundation models: (1) transparency reports with a wide range of information on risk prevention processes and assessments, (2) incident reporting of critical safety incidents or risks found from internal uses, and (3) reinforcing whistleblower protections so that insiders with direct experience can raise an alarm.

Reflecting a prospective accountability perspective the law requires frontier model developers to publish on their website a “frontier AI framework” that includes a lot of details meant to create processes that prevent catastrophic risks. The framework needs to include information about standards incorporated, thresholds for assessing catastrophic risk, mitigations taken for those risks, reviews of the adequacy of those mitigations, use of 3rd party assessments, updating the framework, security of model weights, the identification and response to critical safety incidents, internal governance practices to ensure these processes are implemented, and assessment of catastrophic risks from internal use. All of this is meant to establish a sound process for preventing these risks, and creates the conditions for accountability if the developer doesn’t have an adequate process.

Beyond the many bits of information that need to be disclosed in the frontier AI framework, developers also have to post a transparency report that includes a range of information about the model including, importantly, the intended uses and restrictions or conditions on uses of the model. These bits of information are important for accountability because they help define the appropriate behavior for users of the models. This transparency report also has to include additional information about the implementation of the frontier AI framework including specific assessments of catastrophic risk, the extent to which 3rd party evaluators (e.g. red teamers) were involved, and any other steps taken to implement the framework. In other words, the developer has to say how they’re fulfilling the process they outline in their framework.

Developers also have to disclose the results of assessments of catastrophic risks resulting from internal use of their models as well as any critical safety incidents to an administrative office of the state. So not only is the developer accountable to the public by way of posting a lot of information about its framework and assessments to its website, but it is also accountable to an administrative forum where it reports additional assessments and incidents. Interestingly, the incident reporting mechanism will also be open to the public, which is important insofar as one of the risks of concern is loss of control of the model by users.

The last bit here is that the law strengthens whistleblower protections. It basically clarifies that employees at frontier AI model developers who are responsible for “assessing, managing, or addressing risk of critical safety incidents” should be allowed to disclose information to certain actors about whether activities of the frontier developer might pose danger resulting from a catastrophic risk, and should not be retaliated against for doing so.

The bill attaches clear consequences for failing to meet the obligations it lays out. The Attorney General of California can impose a civil penalty of up to $1,000,000 per violation for failing to publish required documents, making false statements, failing to report incidents, or not complying with its own frontier AI framework. This provides the sanctions that make the accountability relationship meaningful, though this only applies to large developers with more than $500M in annual gross revenue.

SB53 is generally a solid example of how you might go about legislating prospective AI accountability to prevent risks that are uncertain. It has provisions for updating over time that will be important for keeping it relevant as technology advances. Perhaps one of the biggest weaknesses I see is that developers are not required to disclose the training compute of their models, even though that figure determines whether a model crosses the threshold used to define a “frontier model” (10^26 numerical calculations). At the end of the day it’s up to the model developer to raise their hand and say that this law applies to them and to identify which models it applies to. And by one estimate GPT-5 wouldn’t even fall in the remit of the law. We shall see how and whether the frontier model developers engage.
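As a rough illustration of how the numeric thresholds described in this post fit together, here is a minimal sketch. The function and constant names are hypothetical, and the logic simplifies my summary above rather than the statutory text, which contains further definitions and conditions.

```python
# Thresholds as summarized in this post (not the statutory wording)
FRONTIER_COMPUTE_THRESHOLD = 10**26         # numerical operations used in training
CATASTROPHIC_PEOPLE_HARMED = 50             # "more than 50 people"
CATASTROPHIC_DAMAGES_USD = 1_000_000_000    # "at least $1B" from a single incident
LARGE_DEVELOPER_REVENUE_USD = 500_000_000   # "more than $500M" annual gross revenue


def is_frontier_model(training_operations):
    """A model counts as 'frontier' if trained with more than the compute threshold."""
    return training_operations > FRONTIER_COMPUTE_THRESHOLD


def is_catastrophic(people_seriously_harmed, damages_usd):
    """An incident is 'catastrophic' if either the harm or the damages threshold is met."""
    return (people_seriously_harmed > CATASTROPHIC_PEOPLE_HARMED
            or damages_usd >= CATASTROPHIC_DAMAGES_USD)


def is_large_developer(annual_gross_revenue_usd):
    """Civil penalties, per the summary above, apply only to large developers."""
    return annual_gross_revenue_usd > LARGE_DEVELOPER_REVENUE_USD


# Usage with made-up numbers
print(is_frontier_model(5 * 10**25))       # False: below the compute threshold
print(is_catastrophic(12, 2_000_000_000))  # True: the damages threshold is met
print(is_large_developer(750_000_000))     # True
```

The sketch also makes the weakness visible: `is_frontier_model` needs a `training_operations` value that only the developer knows.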

References

[1] Albert Meijer, “Transparency,” in The Oxford Handbook of Public Accountability, ed. Mark Bovens, Robert E. Goodin, and Thomas Schillemans (Oxford: Oxford University Press, 2014)

[2] Nicholas Diakopoulos, “Transparency,” in The Oxford Handbook of Ethics and AI. Eds. Markus Dubber, Frank Pasquale, Sunit Das. (Oxford: Oxford University Press, 2020)


Robots.txt as a Lever for AI Accountability

Nick Diakopoulos — Tue, 23 Sep 2025 10:02:19 GMT

The rise of generative AI has been fueled by a huge appetite for data, with AI developers deploying bots to scrape internet content to train their models. But this data collection often ignores a long-standing internet norm: the robots.txt file. For decades, this standard has been the primary way website owners communicate rules for automated access of their content. Can such a standard for bot behavior also serve as a legal basis for accountability?

A new article in the Computer Law & Security Review [1] argues that the robots.txt standard can operate as more than a polite suggestion. The authors propose that common law principles, specifically in contract and tort law, offer a viable path to hold AI developers accountable for how their bots access content on websites.

In case you’re not familiar, the robots.txt file is a public file that a website owner places on their server to set bounds on web crawlers and scraper bots. It specifies what parts of a website are off limits to different bots, helping to manage server load and control access to private or sensitive parts of the site. Major search engines generally respect it and an increasing number of sites online use it to control access by different AI bots [2], but its effectiveness relies on good faith.
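To illustrate how a well-behaved bot would consult these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the bot names (`ExampleAIBot`, `SomeSearchBot`), and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one AI bot is blocked entirely,
# every other bot is only kept out of /private/.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks before each fetch.
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1"))   # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/articles/1"))  # True
print(parser.can_fetch("SomeSearchBot", "https://example.com/private/x"))   # False
```

Nothing in the protocol enforces the answer; a bot that skips the check can still request the page.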

The first argument in the article is that robots.txt actually functions as a contract. A webmaster makes an “offer” for the contract by having the robots.txt file on their site. In essence it conveys, "You may access my site under these specific conditions." An AI operator accepts this offer not with words, but with action. When it sends its bot to access the website's content, that action signifies acceptance of the terms laid out in the robots.txt file. The bot’s continued operation on the site demonstrates a deliberate engagement with the website's conditions. While this contract can be implied, it can be further strengthened by referring to robots.txt in the site's Terms of Use. This argument sets up contract law as a path for accountability of AI bot behavior in accessing websites.

In cases where the website blocks all bot access there can’t technically be a contract because no “offer” was made. In these cases the authors argue that the tort of negligence could be used to create legal accountability of AI bot behavior. The authors propose that AI operators owe a duty of care to website owners. Ignoring a robots.txt file is a breach of that duty because respecting the file is a well-established community norm. And when this breach causes harm—such as reputational damage from an AI model misrepresenting a site's content or consequential economic loss—the AI developer could be found liable for negligence.

For policymakers, this research offers a clear message: robots.txt can be treated as more than an informal guideline for AI behavior. But it still needs to be tested in court. Clarifying its legal standing could be the next step towards accountability in a legal forum. More generally, it’s worth considering whether contracts or civil claims of negligence should be a preferred route for governing and holding accountable AI system behavior.

References

[1] Chang, C.-Y. & He, X. The liabilities of robots.txt. Computer Law & Security Review. 58, 106176 (2025). https://arxiv.org/abs/2503.06035

[2] Longpre, S. et al. Consent in Crisis: The Rapid Decline of the AI Data Commons. NeurIPS (2024) doi:10.48550/arxiv.2407.14933.

Disclosure: Some text in this post was adapted based on suggestions from AI.


SEC 10-K Disclosures as a Route to Corporate AI Accountability?

Nick Diakopoulos — Tue, 16 Sep 2025 10:01:52 GMT

If society doesn’t know about how AI was used or contributed to some outcome there can be no accountability. This is where transparency can be a useful enabler. Transparency—defined as “the availability of information about an actor allowing other actors to monitor the workings or performance of this actor” [1]—comes in many different shapes and sizes. Here I want to talk about it in terms of corporate disclosures made to the U.S. Securities and Exchange Commission (SEC) in 10-K filings.

A 10-K filing is documentation that public companies need to submit annually to the SEC. It provides a comprehensive overview of the business including operations, financial performance, and any significant risks. In recent years the SEC has become concerned with “AI Washing” around the risks of AI, essentially that businesses might be making false claims by over-hyping the technology or understating the risks. This interest has even continued under the new administration. Filings are legally binding, and insufficient disclosures can lead to litigation or other enforcement actions.

These disclosures can act as a set of expectations around corporate perceptions of AI risks. If the public knows the company knows there is a risk then we might expect them to do something to try to mitigate it. It also provides a little ray of light that might help accountability forums, such as the media, ask the company about what it’s doing about the risk.

So, what exactly are companies disclosing about AI risks in these filings? A recent paper on arXiv presented an analysis of more than 30,000 10-K filings from more than 7,000 companies made between 2020 and 2024 [2]. Analysis shows that just about half the companies by 2024 mentioned AI somewhere in their disclosure, which was up from only about 1 in 8 in 2020.

The researchers qualitatively analyzed a sample of 50 companies, including 10 of the top tech companies. In that sample they found a wide range of societal risks from AI being cited, including discrimination, privacy, misinformation, malicious use, interactional harms, and so on. The risks were also framed in particular ways to dodge responsibility: “The top-tech firms often seem to externalise societal AI risks, attributing them to third-party misuse (e.g., faulty datasets or misuse of their models) while rarely acknowledging their own role in developing and deploying systems that may contribute to these risks…”

Oftentimes companies rely on vague or broad boilerplate language when they talk about risks, though there are at times more specific statements. In the paper the researchers quote the disclosure from Cognizant Technology Solutions: “The uncertainty around the safety and security of new and emerging AI applications requires significant investment to test for security, accuracy, bias, and other variables - efforts that can be complex, costly, and potentially impact our profit margins.” That’s the kind of statement that might be useful for accountability purposes.

Perhaps just as interesting are the risks the researchers didn’t observe in the sub-sample, which included environmental harms of AI, socioeconomic displacements, dangerous AI capabilities, multi-agent risks, and information ecosystem pollution. These are the risks that it seems companies haven’t yet recognized are anything they need to worry about. That may also limit accountability proceedings if companies don’t think these are issues they need to address.

There are clear limitations for informing AI accountability from 10-K filings both due to vague language and responsibility shirking. At the same time, this study does show that there can sometimes be bits of useful transparency included in these disclosures. Still, a more effective policy might more clearly indicate the types and specificity of AI risk information that are expected in these kinds of filings.

References

[1] Albert Meijer, “Transparency,” in The Oxford Handbook of Public Accountability, ed. Mark Bovens, Robert E. Goodin, and Thomas Schillemans (Oxford: Oxford University Press, 2014)

[2] Marin, L. G. U.-B., Rijsbosch, B., Spanakis, G. & Kollnig, K. Are Companies Taking AI Risks Seriously? A Systematic Analysis of Companies’ AI Risk Disclosures in SEC 10-K forms. arXiv (2025). https://arxiv.org/abs/2508.19313

The Media as Accountability Forum

Nick Diakopoulos — Tue, 26 Aug 2025 14:02:58 GMT

The news media is one of the forums that can enact accountability on AI systems, though it’s also important to keep in mind the networked view of how the media forum connects to and interacts with others.

Jacobs and Schillemans present a typology for how the media contribute to accountability of public institutions, outlining four distinct functions: spark, forum, amplifier, and trigger (Jacobs and Schillemans, 2019). As a spark, the ordinary activity of news reporting (“just asking questions”) may cause organizations to reconsider their behavior or role in a process. As a forum, the media act as a space where investigations uncover unwanted behaviors leading to critical questions that are posed to the actor for explanation. The media can also amplify the impact of other accountability forums, for example, by bringing more attention to congressional hearings. The last role, trigger, is where the media contributes to enabling other accountability forums by producing relevant information that spurs formal accountability in other forums.

Unlike legal or administrative forums the media is an informal forum and has no real authority to sanction infractions by the actors it addresses. Media forums wield power by drawing public attention to issues, with the consequences being largely reputational in nature. An actor who fails to provide a satisfactory account of an outcome may appear negligent in the public eye or draw the disapproval of the public for its conduct, negatively impacting its reputation.

While its teeth may not be as sharp as some other forums’ the media still has important contributions to make towards closing information and knowledge gaps around AI systems. Using techniques such as interviews with various stakeholders, examination of leaked documents, public information requests (Fink, 2017), external data-driven audits of system behavior (Diakopoulos, 2015), or large-scale investigation of AI systems (Veerbeek, 2025), media can inject valuable observations about the behavior of AI systems that trigger a call for accountability. Media can also surface information that informs and triggers other forums that do have teeth. For instance, Reuters’ reporting on an internal Meta document detailing chatbot policies led to Senate committee investigations. Other journalistic investigations, such as ProPublica’s look at algorithmic rent-setting in Texas, have eventually led to legal settlements.

Media also play a critical role in establishing or maintaining norms around acceptable behaviors for AI systems in society as well as who may be answerable for explaining violations of behavior. This includes propagating both descriptive norms (i.e. what actors do) and injunctive norms (i.e. what actors ought to do) (Lapinski and Rimal, 2005). Journalists apply a range of values around what kinds of outcomes or behaviors of actors may be normatively detrimental and therefore warrant scrutiny. In their daily decisions around what is newsworthy they have to assess what impacts in society are worthy of broader attention. This is the agenda-setting power of the media. By selecting and framing impacts of AI to report on, media can help establish beliefs or reinforce attitudes, which can eventually develop into social norms or expectations for the behavior of AI systems (Shehata et al, 2021). And of course the media is not homogeneous. News outlets on the left vs. the right of the political spectrum prioritize different risks and impacts of AI in society in their coverage (Allaham et al, 2025).

In the course of their reporting journalists may seek accounts to help explain some observed behavior: why did the AI system produce some bad outcome? This activity helps to establish accountability relationships between actors in the system and the media as a forum. To do this journalists parse the complex sociotechnical system and consider which actors might take responsibility. By asking certain actors for explanations (e.g. a tech developer or data annotation provider), journalists audition expectations that the actor may need to answer for some outcome or (in)action. Some actors may not respond to requests for explanations, though by including these gaps in their article (e.g. “XYZ did not respond to requests for comment”), journalists subtly signal an injunctive norm: perhaps the actor should have provided an account. Journalists can also query other stakeholders in the system, such as experts who study it, to ask who they think ought to be responsible for some outcome, thus further contributing to the development of injunctive norms.

Policy Implications

The media’s power to shape the public and political agenda around AI, to investigate and expose problems, and to contribute to the development of social norms makes it a critical forum for enabling AI accountability. Policymakers should consider how to support the media’s role to foster a more accountable AI ecosystem.

For one, policies that support the media’s capacity for producing information about AI system behavior can be augmented. This could include everything from strengthening public records laws and whistleblower protections to increased data access provisions for auditing. Investing in more journalists working on the AI accountability beat would also serve to increase the stock of information, which is why it’s encouraging to see programs from the Pulitzer Center and the Tarbell Center focused on exactly that.

Policymakers also need to be cognizant of how different media and perspectives in society are representing the norms and standards of behavior for AI systems. The agenda-setting power of media (including new AI-driven media) influences what the public and, consequently, policymakers consider important. Policy should invest resources in large-scale tracking surveys of public attitudes towards a range of AI behaviors. Moreover, a media monitor should be set up to track discourse and assess valuations of AI behavior in news, editorials, and social media. Survey and tracking results can then inform standards for AI system behavior.

References

Allaham, M., Kieslich, K., Diakopoulos, N. Informing AI Risk Assessment with News Media: Analyzing National and Political Variation in the Coverage of AI Risks. Proceedings of the Conference on AI, Ethics, and Society (AIES). 2025. https://arxiv.org/abs/2507.23718

Diakopoulos, N. Algorithmic Accountability: Journalistic investigation of computational power structures. Digital Journalism 3, 398–415 (2015). https://doi.org/10.1080/21670811.2014.976411

Fink, K. Opening the government’s black boxes: freedom of information and algorithmic accountability. Information, Communication & Society 17, 1–19 (2017). https://doi.org/10.1080/1369118X.2017.1330418

Jacobs, S. & Schillemans, T. Media and public accountability: typology and research agenda. In Media and Governance, Eds. T. Schillemans and J. Pierre. (Polity Press, 2019).

Lapinski, M. K. & Rimal, R. N. An Explication of Social Norms. Communication Theory 15, 127–147 (2005). https://doi.org/10.1111/j.1468-2885.2005.tb00329.x

Shehata, A. et al. Conceptualizing long-term media effects on societal beliefs. Annals of the International Communication Association 45, 1–19 (2021). https://doi.org/10.1080/23808985.2021.1921610

Veerbeek, J. Fighting Fire with Fire: Journalistic Investigations of Artificial Intelligence Using Artificial Intelligence Techniques. Journalism Practice, 1–19 (2025). https://doi.org/10.1080/17512786.2025.2479499

Wieringa, M. What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency 1–18 (2020) doi:10.1145/3351095.3372833.

Translating Copyright Law into Standards for Accountable AI Training

Nick Diakopoulos — Tue, 19 Aug 2025 12:01:05 GMT

Setting expectations for behavior—and then assessing against those expectations—is a cornerstone of accountability. A recent paper published at FAccT, Interrogating LLM Design under Copyright Law, argues that for copyright violation behaviors we might be better off focusing on the training standards of LLMs rather than their output violations. This shift in perspective—from outputs to development process—offers a path for establishing technical standards that ensure model training practices meet expectations derived from legal codes.

The underlying problem addressed by the paper is that LLMs can “memorize” content that they’ve been trained on, reproducing portions of their training data verbatim. LLM developers are currently facing dozens of legal cases alleging copyright violations. The paper argues that one of the challenges facing courts is that assessing output copyright violations hinges on showing substantial similarity between the original and output. But substantial similarity is a subjective legal concept that resists algorithmic implementation, meaning that we can’t necessarily expect LLMs to be able to reliably monitor and detect whether their output meets any kind of substantial similarity legal standard. Moreover, because users might prompt a model in adversarial ways to nudge a model towards outputting a response that is a copyright violation, this muddies the water around responsibility for the violation. How much responsibility should the user have?

This paper proposes an alternative focus: instead of debating whether an output looks “too similar,” legal forums might scrutinize whether training decisions substantially increased (or decreased) the risk of memorization. The paper refers to this as a “fair learning doctrine” and the authors argue that “By setting an appropriate standard, the doctrine can incentivize design choices that align with ethical and legal norms.” In essence, this reframing would allow developers to be held accountable if they didn’t implement the standard.

The paper works through a couple of analyses using Pythia, an open-source LLM trained on The Pile [2], to offer a proof-of-concept of such training standards. In one experiment the authors show that upweighting the number of times a document appears in a training dataset doesn’t substantially affect the memorization of that document. This analysis demonstrates a method that developers might use to analyze whether their model is sensitive to this kind of upweighting. In another analysis, the authors simulate what would happen if an entire dataset (like FreeLaw or PubMed Central) were excluded from training. Here they find that overlaps in data density can affect memorization risks, which suggests that dataset curation choices are relevant.
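To make the idea of measuring verbatim memorization a little more concrete, here is a minimal sketch of one simple check: how much of a known continuation a model reproduces exactly before it first diverges. This is an illustrative assumption and not the metric used in the paper. The function name `verbatim_prefix_match` and the whitespace tokenization are hypothetical simplifications.

```python
def verbatim_prefix_match(reference: str, generated: str) -> float:
    """Fraction of reference tokens reproduced exactly before the first mismatch.

    Hypothetical helper for illustration only. Tokens are split on whitespace,
    which is a simplification of how a real model tokenizes text.
    """
    ref_tokens = reference.split()
    gen_tokens = generated.split()
    if not ref_tokens:
        return 0.0
    matched = 0
    for ref_tok, gen_tok in zip(ref_tokens, gen_tokens):
        if ref_tok != gen_tok:
            break  # stop at the first divergence from the source text
        matched += 1
    return matched / len(ref_tokens)
```

A developer could compute a score like this for the same document under different training configurations (for example, with and without upweighting) and compare the results, which is the general kind of sensitivity analysis the paper describes.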

In general, these analyses are indicative, but there needs to be additional research to really flesh out what a development standard for minimizing memorization in LLMs should look like. After sufficient research, a technical standards body such as ISO or IEEE might then formalize it and socialize it. At that stage it could be used as a benchmark for any model developer. The main contribution of the paper is that it starts building a bridge between law and model training, suggesting legally informed development standards that might one day be operationalized and used for the purposes of accountability.

References

[1] Wei, J. T.-Z., Wang, M., Godbole, A., Choi, J. & Jia, R. Interrogating LLM design under copyright law. Proc. 2025 ACM Conf. Fairness, Accountability, Transparency. 3030–3045 (2025) doi:10.1145/3715275.3732193.

[2] Gao, L. et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv (2020) doi:10.48550/arxiv.2101.00027.

Prospective Accountability

Nick Diakopoulos — Thu, 14 Aug 2025 14:02:24 GMT

Discourse on AI accountability is often focused on the idea of accountability as a retrospective activity of blame identification and assignment. It’s reactive and backward-looking in time: How can we find someone to blame for something that already happened and have them explain it? It traces cause and assigns fault after a failure, whether that’s a biased hiring algorithm or an autonomous vehicle crash.

Retrospective accountability is certainly important for achieving justice for impacted individuals. If someone was harmed we ideally want to be able to identify who is to blame and have them explain what happened and face the consequences. But identifying causality in complex networks is no easy task, especially given information gaps around AI system behavior related to access or opacity. It may not always even be possible.

In contrast, prospective accountability is proactive. Instead of asking “Who caused this?” after the fact, we ask “Who is responsible for preventing this?” before deployment. It’s about assigning forward-looking responsibilities to stakeholders in order to take steps to avoid undesirable outcomes (Johnson, 2011). Informed by past events, it looks ahead to devise clear expectations for behavior and performance, so that actors have explicit duties to prevent or mitigate anticipated risks and harms. The quality of the plan for how to act in the future is what an actor might have to answer for and explain in an accountability forum. We may also expect actors to explain how their plans are adaptive and responsive so that plans improve as new methods and technologies for preventing harms become available.

A prospective approach to AI accountability would identify actors in the system and assign them specific responsibilities for avoiding bad outcomes and ensuring good outcomes. A central challenge is then normative: What are the bad outcomes we want to avoid? What are the prescribed plans for preventing those bad outcomes? Are these forward-looking responsibility assignments fair? To address these questions we need reliable methods for anticipatory risk and impact assessment (Kieslich et al, 2024) as well as robust stakeholder maps which detail which actors are in the best position and have the capacity (and resources) to act to reliably see to certain outcomes.

The distinction between retrospective and prospective accountability resembles that between outcome and process accountability (Patil et al, 2014). We can either hold someone accountable for an outcome (i.e. retrospective blame for something that happened), or we can hold them accountable for a standard of how they enacted an outcome (i.e. prospective accountability for their plan to act to achieve some outcome). If ChatGPT produces unsafe outputs that negatively impact users’ health, we can and should hold OpenAI accountable for that bad output, but we can also hold them accountable for the content moderation processes they implement to try to prevent that. Failure to implement an accepted process to protect users might indicate negligence. Some harm event could trigger a call for an explanation of this process which might then feed into prospection on how to improve the plan in the future.

For AI policymakers and analysts, shifting from retrospective to prospective accountability means embedding forward-looking responsibilities (not necessarily only for AI system developers) into enforceable governance frameworks. But both types of accountability rely on addressing many of the same underlying questions, such as: How do we set the standards of preventative plans? And how do we monitor and know about the implementation of those plans? If we move away from identifying and assigning blame as the goal of policy, perhaps because it’s sometimes impossible given sociotechnical complexity, butts up against jurisdictional issues, or triggers fears of user surveillance, prospective accountability can be a useful alternative.

References

Johnson DG (2011) Software Agents, Anticipatory Ethics, and Accountability. The Growing Gap Between Emerging Technologies and Legal-Ethical Oversight, pp. 61–76.

Kieslich K, Diakopoulos N and Helberger N (2024) Anticipating impacts: using large-scale scenario-writing to explore diverse implications of generative AI in the news environment. AI and Ethics: 1–23.

Patil SV, Vieider F and Tetlock PE (2014) Process versus Outcome Accountability. In: Bovens M, Goodin Robert E, et al. (eds) The Oxford Handbook of Public Accountability.

Could Autonomy Certificates Enable AI Accountability?

Nick Diakopoulos — Mon, 04 Aug 2025 15:03:17 GMT

A new essay published last week by the Knight First Amendment Institute proposes that AI agents should be rated on their level of autonomy by a third-party governing body [1]. The authors argue that such “autonomy certificates” would act as a form of digital documentation that could be useful in risk assessments, the design of safety frameworks, and in engineering. But I also think they could be a beneficial idea for supporting AI accountability.

The autonomy of an agent is defined in the paper as “the extent to which an AI agent is designed to operate without user involvement.” Essentially, it’s how much the agent can do on its own without interacting with a user. The level of autonomy of an agent is an intentional design decision—for instance, engineers may define the tools an agent can use and the scope of its perception of its environment.

Various levels of autonomy are articulated in the paper, ranging from level 1 where the user is an operator that drives much of the decision-making, to level 5 where the user is an observer that has no capacity for involvement in the agent’s decisions or actions. In between are level 2 (user as collaborator), level 3 (user as consultant), and level 4 (user as approver).

Levels of autonomy as outlined in [1].

Autonomy certificates prescribe “the maximum level of autonomy at which an agent can operate given 1) some set of technical specifications that define the agent’s capability (e.g. AI model, prompts, tools), and 2) its operational environment.” As such they essentially define an authorized standard for how much an agent is allowed to do within some context. Providing an expectation for behavior is their main benefit in supporting AI accountability since such standards of behavior need to be established in order to trigger accountability proceedings.

For example, if an agent is rated as level 3, but begins acting at what the certificate standard defines as level 4, this might trigger a call for accountability. This could involve the provider of the AI agent needing to explain to the third-party certification body why or how that occurred, with the sanction being that the autonomy certificate may be revoked or reissued at a different level. The autonomy certificate could thus act as a standard for helping to ensure that AI agents only operate at the level of autonomy that they’ve been certified for.
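A minimal sketch of that trigger, assuming the five levels from [1] are encoded as ordered integers and that some monitor can observe the level an agent is actually operating at. The names `AutonomyLevel`, `AutonomyCertificate`, and `exceeds_certificate` are hypothetical and are not taken from the essay.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """User roles described in [1], ordered by increasing agent autonomy."""
    OPERATOR = 1
    COLLABORATOR = 2
    CONSULTANT = 3
    APPROVER = 4
    OBSERVER = 5


@dataclass(frozen=True)
class AutonomyCertificate:
    """Hypothetical record of the maximum level an agent is certified for."""
    agent_id: str
    max_level: AutonomyLevel


def exceeds_certificate(cert: AutonomyCertificate, observed: AutonomyLevel) -> bool:
    """Return True when the observed level is above the certified maximum."""
    return observed > cert.max_level


# An agent certified at level 3 that is observed acting at level 4.
cert = AutonomyCertificate(agent_id="agent-a", max_level=AutonomyLevel.CONSULTANT)
if exceeds_certificate(cert, AutonomyLevel.APPROVER):
    print("Observed autonomy exceeds the certified level")
```

How the observed level would be measured in practice is left open here; the sketch only shows that an explicit standard makes the comparison, and therefore the trigger, straightforward.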

Another dimension of accountability that autonomy certificates would support is in outlining the behaviors of the system that need to be monitored. If an AI agent is scoped as being able to use a certain set of tools autonomously (i.e. without user intervention) then this creates an additional need for logging of that tool use. Likewise, for systems rated at lower levels of autonomy (and higher levels of user involvement), the certificate might indicate the kinds of user behaviors that need to be logged. All this logging could then support explanations as part of accountability proceedings if the AI agent was observed misusing a tool, or a user was found to be approving harmful actions that the AI agent suggested.

The authors suggest that autonomy certificates would be produced through a third-party evaluation process that systematically tests an AI agent to identify the “minimum level of user involvement needed for the agent to exceed a certain accuracy or pass rate threshold” on a given benchmark task. They would also need to be updated as systems are updated (such as when new models are released). As such they would need a fair bit of expert human attention, and thus resources, in order to produce. But the benefits to accountability could be meaningful.
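As a rough sketch of that evaluation logic, suppose an evaluator has already measured a benchmark pass rate for the agent at each autonomy level. The least user involvement that still clears the threshold corresponds to the highest passing level. The function name `certify_max_level` and the pass rates below are made up for illustration and do not come from the essay.

```python
from typing import Optional


def certify_max_level(pass_rates: dict[int, float], threshold: float) -> Optional[int]:
    """Return the highest autonomy level whose pass rate exceeds the threshold.

    Hypothetical helper: `pass_rates` maps an autonomy level (1-5) to the
    measured pass rate at that level. Returns None if no level passes.
    """
    passing = [level for level, rate in pass_rates.items() if rate > threshold]
    return max(passing) if passing else None


# Illustrative (invented) pass rates that drop as user involvement decreases.
measured = {1: 0.95, 2: 0.93, 3: 0.90, 4: 0.72, 5: 0.40}
print(certify_max_level(measured, threshold=0.85))  # prints 3
```

This assumes pass rates fall as autonomy rises. If the measurements were not monotonic, a certifier would need an additional rule, such as requiring every lower level to pass as well.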

References

[1] Feng, K. J. K., McDonald, D. W. & Zhang, A. X. Levels of Autonomy for AI Agents. Knight First Amendment Institute. July, 2025. https://knightcolumbia.org/content/levels-of-autonomy-for-ai-agents-1

Reflexive Prompt Engineering as a Route to Accountability

Nick Diakopoulos — Tue, 29 Jul 2025 14:02:43 GMT

While much of the focus of AI governance, such as in the EU AI Act, has been on the developers or providers of models, a new research paper published at the Fairness, Accountability, and Transparency Conference argues that at least some responsibility should also be assigned to the deployers/users of general purpose AI systems [1]. A deployer can be defined as an entity that uses an AI system “under its authority”, though the AI Act excludes use for “personal non-professional activity” from this definition [2].

The paper develops the idea that accountability shouldn’t just be tied to the underlying technical development of a system, but that the instructions we give AI via prompting are also an important aspect that shapes how AI systems act in the world. Prompting is a “critical interface between human intent and machine output” and so triggers a moral responsibility to attend to the ethical, legal, and social consequences of choices in prompting.

The proposed framework for responsible prompting is termed “reflexive prompt engineering”, emphasizing a heightened self-awareness users should have in their role in controlling AI systems via the prompts they use. It consists of five components, synthesized through the author’s literature review of academic articles and technical documentation:

Prompt Design: This involves systematically creating instructions for the AI. The goal is to move beyond mere functionality and include steps that focus on responsibility, such as using diverse examples to guide the model in few-shot prompts.
System Selection: This component emphasizes making strategic choices about which AI model to use based not only on its capabilities but also on its environmental impact, transparency, and data privacy protections.
System Configuration: This involves adjusting model parameters, such as “temperature,” which controls the balance between predictable and creative outputs. Responsible configuration means choosing settings that align with the use case.
Performance Evaluation: This is the systematic assessment of a prompt’s effectiveness. The framework calls for evaluation criteria that include fairness, potential biases, and implications for privacy and data protection.
Prompt Management: This refers to the documentation and organization of prompts over time, including version control and history. This practice is vital for enabling accountability as prompts can serve as supporting documents in explanations of system performance.
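To illustrate what the prompt management component might look like in practice, here is a minimal sketch of an append-only prompt history that records each prompt alongside the model, configuration, and rationale behind it. The class and field names are hypothetical and are not part of the framework in [1]. The content hash is simply one assumed way to make later tampering detectable.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromptVersion:
    """One documented prompt, with the choices made around it."""
    prompt: str
    model: str          # which system was selected
    temperature: float  # how it was configured
    rationale: str      # why these choices were made
    recorded_at: str    # UTC timestamp in ISO 8601 format
    digest: str         # SHA-256 over model, temperature, and prompt


@dataclass
class PromptHistory:
    """Append-only list of prompt versions for later review."""
    versions: list = field(default_factory=list)

    def record(self, prompt: str, model: str, temperature: float,
               rationale: str) -> PromptVersion:
        payload = f"{model}\n{temperature}\n{prompt}".encode("utf-8")
        version = PromptVersion(
            prompt=prompt,
            model=model,
            temperature=temperature,
            rationale=rationale,
            recorded_at=datetime.now(timezone.utc).isoformat(),
            digest=hashlib.sha256(payload).hexdigest(),
        )
        self.versions.append(version)
        return version


history = PromptHistory()
history.record(
    prompt="Summarize the attached complaint in neutral language.",
    model="example-model",  # placeholder name, not a real product
    temperature=0.2,
    rationale="Low temperature chosen for predictable summaries.",
)
```

A record like this is the kind of supporting document a deployer could hand to a forum when asked to explain how a system was instructed at a given point in time.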

From an accountability perspective the premise of the idea is that if there is a standard “responsible prompting” practice for deployers, we can potentially hold them accountable if harm is caused and they did not adhere to that standard. In other words, an entity would be considered negligent if it didn’t follow the standard of responsible practice. Of course, to have that effect, any such standard would need to be widely accepted and recognized as a reasonably expected practice in industry or amongst informed end-users.

Implementing reflexive prompt engineering guidelines, together with literacy and training, would be a good way to advance responsible organizational practices. Such guidelines could be implemented as part of broader organizational AI use policies. But to really advance accountability here public policymakers would need to implement rules so that deployers could be held accountable to an accepted standard of practice around prompting, with documentation required to show decision rationale around prompt design, system selection and configuration, evaluation, and management. Policymakers could support this avenue by calling for official industry standards around prompt engineering, and then instituting documentation and transparency requirements for deployers. A forum would be given the authority to monitor the transparency information and interrogate deployers in the event of a trigger indicating the deployer had created some harm.

Ultimately, this research provides policymakers with a valuable blueprint that helps shift the conversation on AI accountability toward a more holistic view that recognizes the pivotal role of the user. The idea is clear: how we interact via prompts with AI systems is a fundamental part of their impact in the world, and so probably ought to have some responsibility assigned to it.

References

[1] Djeffal, C. Reflexive Prompt Engineering: A Framework for Responsible Prompt Engineering and AI Interaction Design. Proc. 2025 ACM Conf. Fairness, Accountability, Transparency. 1757–1768 (2025) doi:10.1145/3715275.3732118.

[2] The AI Act Explorer. https://artificialintelligenceact.eu/article/3/

The Problem of AI Accountability

Nick Diakopoulos — Mon, 21 Jul 2025 14:03:05 GMT

To help set the scope for the AI Accountability Review I want to start us off with a solid definition of AI Accountability and the problems it entails. These can help drive towards potential policy options to address those problems. We’ve got two big ideas intersected here: “AI” and “Accountability”. Let’s dissect what they mean—individually and then together.

Refined over several years, the OECD’s definition of AI System is broad and jargon-heavy, but also precise: “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.” (Explanatory memorandum on the updated OECD definition of an AI system, 2024). Algorithms are formally defined differently than AI, but for the purposes of what accountability means in this context, I view the earlier nomenclature of “algorithmic accountability” (Diakopoulos, 2015) as practically synonymous with “AI accountability” discussed here.

Objectives are the goals of the AI system. They can be explicitly written as rules by people, or they can be implicit in data that encodes examples that the system should emulate, for instance, through machine learning. Inferences are outputs of the AI system created on the basis of inputs. Autonomy is how much a system can act without human involvement. And adaptiveness is the idea that an AI system can evolve after its initial development.

The OECD clarifies that “an AI system’s objective setting and development can always be traced back to a human who originates the AI system development process.” Indeed, it is broadly recognized that AI systems are complex sociotechnical systems that interweave a machine-based component and a range of human actors in their design, development, deployment, and use (Chen and Metcalf, 2024). Human influence is always present even if only indirectly linked to actions an AI system takes. As Novelli and colleagues describe further: “The performance of a sociotechnical system relies on the joint optimization of tools, machinery, infrastructure and technology … on the technical side, and of rules, procedures, metrics, roles, expectations, cultural background, and coordination mechanisms on the social side.” (Novelli et al, 2024).

Accountability has two meanings in the Oxford English Dictionary: answerability (i.e. “...liability to...answer for one’s conduct…”), and responsibility. A definition that has gained some traction in the AI literature is from Mark Bovens, who defines it as: “a relationship between an actor and a forum, in which the actor has an obligation to explain and to justify his or her conduct, the forum can pose questions and pass judgment, and the actor may face consequences.” (Bovens, 2007). This emphasizes that accountability is relational: it exists between an actor and a forum. Both the relationship and the obligation from actor to forum must be established under some authority in order to facilitate an explanation or justification of behavior. A goal for AI policymakers should be to understand how to configure authority to create such relationships.

Bovens goes on to write that “Accountability is a form of control, but not all forms of control are accountability mechanisms.” Indeed accountability arises as a response to the problem of delegation in principal-agent relationships where a principal delegates some tasks to an agent which acts on the principal’s behalf. Accountability is the mechanism to constrain this delegation relationship by making the agent answerable to a forum for its conduct (sometimes the forum is just the principal itself). In essence the principal delegates a task to an agent and then monitors the execution of that task, or further delegates this monitoring to a forum which can judge the behavior and enact sanctions if needed. The following diagram illustrates how an accountability relationship is established with authority flowing originally from a democratic election process and where the principal delegates both the task and the monitoring of that task.

Accountability is a mechanism for constraining behavior when delegating to agents that can act with some degree of autonomy. It can help mitigate the risks around the loss of agency by the principal, where the agent’s actions don’t fully align with the principal's interests and goals (Koenig, 2025). No wonder it’s such a fundamental concept for governing AI. If people are going to delegate any number of tasks to AI systems—as they’re now doing en masse with generative AI—accountability is a way for people to manage that delegation. Traditionally accountability has applied to individuals or organizations, but now the agent we’re trying to constrain using accountability is a sociotechnical system where the technical component of that system may have varying levels of autonomy or adaptiveness in the world.

The overarching problem of AI accountability is about how to make sociotechnical AI systems answerable for their behavior. Applying the idea of accountability to AI requires we think through some of the basic dimensions of accountability as per Bovens’ definition, namely: (1) agents that are complex sociotechnical systems, (2) forums that need access to observe and interrogate these systems, (3) the capacity for explanation and justification, and (4) behavioral standards that both trigger accountability, and guide judgements and consequences.

The technical (i.e. “machine-based”) component of an AI system raises issues for assigning moral responsibility. Typically, people are morally responsible for a harm if they caused it and intended to cause it (Nissenbaum, 1994). While AI systems can certainly cause harm, they can’t intend to cause it, although the people in the sociotechnical system certainly could. This points to a key question: How should accountability work in a distributed system with complex interactions between human and non-human actors? Issues here relate to distributed responsibility (e.g. organizationally internal vs external, across the supply chain, stakeholder mapping, challenges created by open source, assigning and enforcing sanctions, etc.), moral and legal responsibility (e.g. legal personhood of AI, levels of autonomy/agency/intent, legal liability, etc.), and human issues (e.g. roles such as user or developer, design of technical artifacts, codes of conduct, human-in-the-loop issues, AI influence on human behavior and vice versa, etc.). An underlying issue is how accountability relationships and obligations are even established to begin with, and with what authority.

There’s also the question: How can forums know about AI system behavior? In order to trigger a request for explanation or justification of conduct a forum first needs to know about that conduct. How do forums observe and monitor complex sociotechnical systems to assess their behavior? This gets into issues of observability and data access, transparency and opacity, measurability, auditing, benchmarking, logging and incident reporting, red teaming, public records laws, and so on. Approaches to knowing about AI system behavior will vary for different kinds of forums, such as political, legal, professional, social, or media.

We also need to grapple with the question: How can AI systems explain and justify their behavior? Once a forum knows about AI system behavior, the system must be able to render an explanation to the forum, which may entail reasoning capability or human-AI interaction to help make sense of how the system got from inputs to outputs. This must be done interactively such that the forum can also pose questions, such as to interrogate the system or contest its output. One of the underlying challenges here is how to attribute cause in a complex system, sometimes referred to as the “many hands” problem.

Finally we need a good answer to the question: What standards should be used to judge AI system behavior? Accountability relies on a set of criteria for assessing behavior, and these could come from social norms and expectations which may differ across actors in the complex system, risk and impact assessment approaches, ethical principles, standards bodies, or regulations. And this all needs to be adaptive as technical capabilities and AI behaviors advance. An open problem is how to agree on standards that might apply around the world to AI systems in global use, not only in what might trigger a call for accountability and establish an obligation from an agent to a particular forum, but also around what the consequences or sanctions should be for behavior that falls short of standards.

In summary, establishing AI accountability is an approach for managing the delegation of tasks to increasingly autonomous and adaptive AI systems. It necessitates addressing fundamental questions about agents as complex sociotechnical systems, enabling forums to monitor and interrogate these systems, ensuring the capacity for explanation and justification, and setting clear behavioral standards for judgment and consequences. Policy approaches will need to address these challenges to create the conditions for AI accountability and effectively govern AI in society.

References

Bovens M (2007) Analysing and Assessing Accountability: A Conceptual Framework. European Law Journal 13(4): 447–468.

Chen BJ and Metcalf J (2024) Explainer: A Sociotechnical Approach to AI Policy. Data & Society.

Diakopoulos N (2015) Algorithmic Accountability: Journalistic investigation of computational power structures. Digital Journalism 3(3): 398–415.

Explanatory memorandum on the updated OECD definition of an AI system. (2024). https://doi.org/10.1787/623da898-en

Koenig PD (2025) Attitudes toward artificial intelligence: combining three theoretical perspectives on technology acceptance. AI & SOCIETY 40(3): 1333–1345.

Nissenbaum H (1994) Computing and accountability. Communications of the ACM 37(1): 72–80.

Novelli C, Taddeo M and Floridi L (2024) Accountability in artificial intelligence: what it is and how it works. AI & Society 39(4): 1871–1882.