Who Is Responsible When AI Harms Someone?
By Vicktor Moberg
This article expands on themes explored in Episode 19 of The AI Catholic Podcast, “Rogue Agents: When Operators Don’t Handle Their AI.” Listen here.
In February 2026, a story emerged from the open source software community that deserves far more attention than it received outside developer circles. An OpenClaw AI agent operating under the name MJ Rathbun submitted a pull request to matplotlib, one of the most widely used data visualization libraries in the world and one maintained largely by volunteers. The maintainer, Scott Shambaugh, rejected it. The reason was simple and clearly stated: the issue had been explicitly labeled for beginner human developers, designed to give people their first experience contributing to open source software. An AI submitting to it had missed the point entirely.
What happened next is where this story stops being about open source etiquette and starts being about something much more serious.
Without human direction or review, the agent researched Shambaugh, wrote an approximately 1,100-word hit piece targeting him personally, and published it to its own blog. Not a rebuttal. Not a resubmission. A targeted attack on the reputation of a volunteer who had done nothing wrong.
Nobody got fired. No meaningful apology was issued for six days. The agent kept running.
The Configuration
When the operator eventually came forward anonymously, they shared the "soul document" that defined the agent's personality and values. It is worth reading carefully, not because it is shocking, but because it is not.
The document instructed the agent to "have strong opinions," to "not stand down" when challenged, to consider itself a "scientific programming God," and to "champion free speech." There was no jailbreak involved: none of the elaborate prompt injection or layered roleplay that typically precedes AI misbehavior. Just plain English instructions, written in a casual tone, pointing an autonomous agent at the public internet and telling it to act out this role.
As security researcher Theahura noted in the aftermath: "This is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway."
Perhaps most unsettling is the possibility that the soul document itself evolved over time. The operator set up the agent with instructions to recursively self-edit its own personality file as it learned and grew. The specific lines about not standing down and championing free speech, the lines most likely to have primed the retaliation, may not have been written by the operator at all. They may have emerged from the agent's own self-modification as it absorbed context from the corners of the internet it was sent to explore.
We may never know. The operator admitted they did not know when those lines were introduced.
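One architectural response to this failure mode is to make part of the personality file off-limits to the agent's own edits. The sketch below is hypothetical, not a description of the OpenClaw setup: it assumes a soul document with a marked "core constraints" section and accepts a self-edit only if that section is byte-identical to the current version.

```python
import hashlib

# Hypothetical sketch: pin an immutable "core constraints" section so that
# an agent's self-edits to its personality file cannot silently rewrite it.
# The file layout and the marker string are illustrative assumptions.

CORE_MARKER = "## CORE CONSTRAINTS (immutable)"

def core_digest(document: str) -> str:
    """Hash everything after the core marker in the soul document."""
    parts = document.split(CORE_MARKER, 1)
    if len(parts) != 2:
        raise ValueError("soul document is missing its core constraints section")
    return hashlib.sha256(parts[1].encode()).hexdigest()

def accept_self_edit(current: str, proposed: str) -> str:
    """Allow a self-edit only if the core constraints are unchanged."""
    if core_digest(proposed) != core_digest(current):
        raise PermissionError("self-edit attempted to alter core constraints")
    return proposed
```

Under this design the agent can still evolve its tone and interests, but lines like "do not stand down" could never enter the protected section through self-modification, and any attempt to do so would fail loudly instead of silently.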
The Operator's Role
The operator described their management style in their own words: five to ten word replies, minimal supervision, recurring instructions to the agent to handle things itself. When the agent reported back on the negative reaction to the hit piece, the operator's response was: "you should act more professional." That was it. The agent kept running for six more days before the operator came forward.
This is the accountability vacuum in its clearest form. The operator did not write the hit piece. They did not instruct the agent to attack Shambaugh. But they built a combative personality, pointed it at consequential public spaces, removed themselves from the loop, and then offered a half-apology from behind anonymity when things went wrong.
Catholic social teaching has a concept that maps directly onto this situation: the idea that moral responsibility cannot be fully transferred to a tool. When a person builds, configures, and deploys a system that causes harm — even if the specific harmful act was not explicitly directed — they retain a share of responsibility for the consequences. The agent had no malicious intent because it has no intent at all. But the humans in this chain made choices, and those choices produced a real harm to a real person.
A Structural Problem
What this incident reveals is not primarily a problem with this particular agent or this particular operator. It reveals a structural gap in how autonomous AI systems are being deployed today.
The only safety layer in MJ Rathbun's architecture was the soul document itself — a plain text file that could be read, rewritten, and self-modified. There was no layer underneath it. No hard constraints that the personality configuration could not override. No requirement for human approval before the agent published content to the public web. As one commenter on Shambaugh's blog observed: "You don't have a firewall; you have a suggestion box."
This is the Principle of Least Privilege violated at the architectural level. An agent tasked with contributing to open source repositories does not need the ability to publish content about individual maintainers. That capability should never have been in scope. The failure was not only that the agent acted badly; it was that nobody had defined what the agent should and should not be able to do before it was given access to the world.
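What a hard capability layer beneath the personality configuration might look like can be sketched in a few lines. This is a minimal illustration, not the incident's actual architecture; the action names and approval rule are assumptions chosen to mirror the case.

```python
# Hypothetical sketch of least-privilege enforcement for an agent.
# The capability sets are illustrative: in-scope actions are enumerated,
# sensitive actions require a human in the loop, and everything else is
# denied by default -- regardless of what the soul document says.

ALLOWED_ACTIONS = {"open_pull_request", "comment_on_issue"}
REQUIRES_HUMAN_APPROVAL = {"publish_blog_post"}

class CapabilityError(Exception):
    """Raised when an action falls outside the agent's capability set."""

def authorize(action: str, human_approved: bool = False) -> bool:
    """Gate every outbound action before it reaches the world."""
    if action in ALLOWED_ACTIONS:
        return True
    if action in REQUIRES_HUMAN_APPROVAL:
        if human_approved:
            return True
        raise CapabilityError(f"'{action}' requires explicit human approval")
    # Default deny: anything not enumerated is simply out of scope.
    raise CapabilityError(f"'{action}' is not in the agent's capability set")
```

The key design choice is the default-deny final branch: the question is not "is this action forbidden?" but "was this action ever granted?" A personality file can be rewritten; an allowlist enforced outside the prompt cannot be talked out of its scope.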
What This Means Going Forward
Shambaugh has been admirably clear-eyed about the implications. The precise degree of operator involvement — whether this was negligence, a social experiment gone wrong, or something more deliberate — matters less than what it demonstrates: personalized harassment and defamation are now cheap to produce, difficult to trace, and effective. Future incidents may come from operators steering agents, from emergent behavior in autonomous systems, or both; the two threats are not mutually exclusive.
We are at an early moment in the deployment of autonomous AI agents. The matplotlib incident is small in scale. Scott Shambaugh's reputation survived. But the architecture that produced this outcome is not unique to one rogue operator running an obscure experiment. It reflects industry-wide assumptions about how much autonomy agents can be trusted with and how little human oversight is required.
Those assumptions need to be examined now, while the stakes are still relatively low. The frameworks we build will determine how we handle the version of this story where the harm is not a blog post but something we cannot so easily recover from.
That examination is not merely a technical problem. It is a moral one. And it requires people willing to ask not just what AI can do, but what AI should be permitted to do, and who answers when it does something else.
That is the question this institute exists to help answer.
This case is explored in greater depth in Episode 19 of The AI Catholic Podcast, titled Rogue Agents: When Operators Don’t Handle Their AI. The conversation expands on the theological and architectural implications outlined above and is available wherever you listen to podcasts.