Robots.txt as a Lever for AI Accountability
Could common law around contracts and negligence provide for legal accountability?
The rise of generative AI has been fueled by a huge appetite for data, with AI developers deploying bots to scrape internet content to train their models. But this data collection often ignores a long-standing internet norm: the robots.txt file. For decades, this standard has been the primary way website owners communicate rules for automated access to their content. Can such a standard for bot behavior also serve as a legal basis for accountability?
A new article in the Computer Law & Security Review [1] argues that the robots.txt standard can operate as more than a polite suggestion. The authors propose that common law principles, specifically in contract and tort law, offer a viable path to hold AI developers accountable for how their bots access content on websites.
In case you’re not familiar, the robots.txt file is a public file that a website owner places on their server to set bounds on web crawlers and scraper bots. It specifies which parts of a website are off limits to different bots, helping to manage server load and control access to private or sensitive parts of the site. Major search engines generally respect it, and an increasing number of sites use it to control access by different AI bots [2], but its effectiveness relies on good faith.
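As a concrete illustration, here is a minimal robots.txt of the kind described above. The specific bot names and paths are illustrative examples, not rules from the article:

```
# Allow a search crawler everywhere except admin pages
User-agent: Googlebot
Disallow: /admin/

# Block an AI training bot from the entire site
User-agent: GPTBot
Disallow: /

# Default rule for all other bots
User-agent: *
Disallow: /private/
```

Each `User-agent` line names a bot (or `*` for all bots), and the `Disallow` lines beneath it mark paths that bot should not fetch.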
The first argument in the article is that robots.txt actually functions as a contract. A webmaster makes an “offer” for the contract by having the robots.txt file on their site. In essence, it conveys, "You may access my site under these specific conditions." An AI operator accepts this offer not with words, but with action. When it sends its bot to access the website's content, that action signifies acceptance of the terms laid out in the robots.txt file. The bot’s continued operation on the site demonstrates a deliberate engagement with the website's conditions. While this contract can be implied, it can be further strengthened by referring to robots.txt in the site's Terms of Use. This argument sets up contract law as a path for accountability of AI bot behavior in accessing websites.
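On the technical side, a well-behaved bot operationalizes this "acceptance of terms" by checking robots.txt before each fetch. A minimal sketch using Python's standard library, with an illustrative bot name and rules (not taken from the article):

```python
# Sketch: a compliant crawler consulting robots.txt before fetching a URL.
# The "GPTBot" user agent and the rules below are illustrative assumptions.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant bot would only proceed when can_fetch() returns True.
print(rp.can_fetch("GPTBot", "https://example.com/articles/post1"))  # True
print(rp.can_fetch("GPTBot", "https://example.com/private/data"))    # False
```

In practice a crawler would load the live file with `rp.set_url(...)` and `rp.read()`; the point is that compliance is a deliberate, programmatic check, which supports the article's framing of continued crawling as conduct signifying acceptance.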
In cases where the website blocks all bot access, there can’t technically be a contract because no “offer” was made. In these cases, the authors argue that the tort of negligence could be used to create legal accountability for AI bot behavior. The authors propose that AI operators owe a duty of care to website owners. Ignoring a robots.txt file is a breach of that duty because respecting the file is a well-established community norm. And when this breach causes harm, such as reputational damage from an AI model misrepresenting a site's content or consequential economic loss, the AI developer could be found liable for negligence.
For policymakers, this research offers a clear message: robots.txt can be treated as more than an informal guideline for AI behavior. But it still needs to be tested in court. Clarifying its legal standing could be the next step towards accountability in a legal forum. More generally, it’s worth considering whether contracts or civil claims of negligence should be a preferred route for governing and holding accountable AI system behavior.
References
[1] Chang, C.-Y. & He, X. The liabilities of robots.txt. Computer Law & Security Review 58, 106176 (2025). https://arxiv.org/abs/2503.06035
[2] Longpre, S. et al. Consent in Crisis: The Rapid Decline of the AI Data Commons. NeurIPS (2024). doi:10.48550/arXiv.2407.14933
Disclosure: Some text in this post was adapted based on suggestions from AI.