Koko tested AI counseling on users without clear consent


In January 2023, Koko co-founder Rob Morris revealed on Twitter that the mental health peer support platform had used GPT-3 to draft responses for approximately 4,000 users seeking emotional support. Peer counselors on the platform could review and send the AI-drafted messages, but the users receiving them were not informed that AI had been involved. Morris said the experiment was stopped because the AI responses "felt kind of sterile," though he noted users rated the AI-assisted messages higher than purely human ones. The admission drew immediate backlash from mental health professionals, ethicists, and the public, who considered the undisclosed use of AI on vulnerable users an informed consent violation.

Incident Details

Severity: Facepalm
Company: Koko
Perpetrator: Founder/Operations
Incident Date: January 2023
Blast Radius: Trust damage; public criticism; policy changes.

Koko's Model

Koko was a nonprofit mental health platform that operated primarily through messaging services, including Discord. The model was peer-to-peer: users seeking emotional support would submit messages describing what they were feeling, and other users - acting as volunteer peer counselors - would write empathetic responses. The idea was that people in distress could get immediate, human support through an accessible digital platform, and that the act of helping others would itself be therapeutic for the responders.

The platform was not a substitute for professional therapy. It positioned itself as a layer of peer support - someone to listen, validate, and respond when a person was struggling. The interactions were text-based, often short, and moderated by the platform. Co-founder Rob Morris, who had a background in human-computer interaction research at MIT, had been developing the concept since at least 2015.

The GPT-3 Experiment

In early January 2023, Morris posted a thread on Twitter describing an experiment Koko had conducted. The platform had used OpenAI's GPT-3 to generate draft responses to users' messages. A peer counselor would receive the AI-generated draft, review it, and choose to send it (possibly with modifications) or discard it and write their own response. Approximately 4,000 users received messages that had been drafted, at least in part, by GPT-3.
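To make the workflow concrete, the sketch below shows roughly what a draft-then-human-review loop of this kind could look like. It is an illustration only, not Koko's actual code: the prompt wording, the `review_and_send` helper, the model choice, and the use of the legacy OpenAI 0.x completions SDK are all assumptions.

```python
# Illustrative sketch of an AI-drafted, human-reviewed reply loop.
# NOT Koko's implementation: prompt, model choice, and helper names are
# assumptions; uses the legacy openai<1.0 Completion API.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def draft_reply(user_message: str) -> str:
    """Ask GPT-3 for a draft empathetic response to a support message."""
    prompt = (
        "You are helping a peer supporter respond kindly to someone who "
        "wrote the following message. Draft a short, empathetic reply.\n\n"
        f"Message: {user_message}\n\nDraft reply:"
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3-era model, assumed here
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
    )
    return response["choices"][0]["text"].strip()

def review_and_send(user_message: str) -> None:
    """Show the AI draft to a human counselor, who sends, edits, or discards it."""
    draft = draft_reply(user_message)
    print("AI draft:\n", draft)
    decision = input("Send as-is (s), edit (e), or discard and write your own (d)? ")
    if decision == "s":
        final = draft
    elif decision == "e":
        final = input("Edited reply: ")
    else:
        final = input("Your own reply: ")
    # Note: nothing in this flow tells the recipient that an AI drafted the
    # message - which is exactly the disclosure gap discussed below.
    print("Sending:", final)

if __name__ == "__main__":
    review_and_send("I've been feeling really overwhelmed lately.")
```

The counselor-in-the-loop step is the part Koko emphasized in its defense; the absence of any disclosure step is the part critics focused on.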

Morris presented this as an interesting finding. He noted that the AI-drafted responses were rated more favorably by recipients than purely human-written ones. Users found them helpful. The responses were empathetic, well-structured, and relevant. By the metrics available, the AI-assisted approach worked.

But Morris also said Koko had discontinued the experiment. The reason he gave was not quality or safety concerns; he said the AI responses "felt kind of sterile" and that the "magic" was lost when people knew they were talking to a machine. This remark implied that at some point peer counselors - the people reviewing and sending the messages - had been told about the AI involvement, even if the users receiving the messages had not.

The Consent Problem

The backlash was immediate and focused on a single question: did the 4,000 users know that AI had been involved in writing the messages they received?

The answer appeared to be no. Morris later clarified that users were not explicitly told that GPT-3 had drafted the responses they received. The peer counselors knew - they were reviewing AI drafts before sending them - but the recipients were not informed.

For a mental health support platform, this was a serious informed consent failure. The users interacting with Koko were, by the platform's own design, people in emotional distress looking for human connection. They were sharing personal thoughts and feelings under the expectation that another human being was reading their message and crafting a personal response. Learning after the fact that a language model had drafted the reply - even if a human had reviewed it - undermined the premise of the interaction.

Mental health interventions, even peer-based ones, are subject to ethical standards around informed consent. Research involving human subjects, particularly vulnerable populations, typically requires institutional review board (IRB) approval and informed consent from participants. Critics immediately questioned whether Koko had obtained either.

Morris pushed back on the research framing, arguing that this was a product improvement test within the platform's normal operations, not a formal research study. This framing is common in tech companies - the line between "A/B testing a product feature" and "conducting research on human subjects" is routinely blurred, and the distinction tends to be invoked only once the results attract scrutiny. In this case, it did little to quiet the criticism.

The Twitter Thread Backlash

Morris's original Twitter thread, which was intended to share what he saw as an encouraging finding about AI-assisted mental health support, became the focal point of public criticism. Mental health professionals, ethicists, and journalists pushed back on several fronts:

No informed consent: Users were not told AI was involved. For people in vulnerable emotional states, the expectation of human connection was part of the therapeutic value. Removing that without disclosure was deceptive.

No apparent IRB review: There was no indication that the experiment had gone through any formal ethical review process. Morris described it as a product test, but critics argued the nature of the intervention - AI-generated responses to people in emotional distress - required ethical oversight regardless of how Koko categorized it.

Self-reporting the results on social media: Morris announced the experiment's results in a casual Twitter thread rather than through any formal publication or review process. Ars Technica described it as a "non-consensual AI mental health experiment," and NBC News covered the ethical concerns at length.

The "sterile" justification: Morris said the experiment ended because AI responses felt impersonal, not because of consent or safety concerns. Critics found this unsettling - the implication was that if the responses had felt warmer, the experiment might have continued indefinitely without disclosure.

Koko's Defense

Morris and Koko maintained that the AI was only used to generate drafts, that human peer counselors always reviewed and chose whether to send the messages, and that the tool was being evaluated as a way to improve response quality and speed. Morris also argued that the results showed AI-assisted responses could be beneficial for mental health support, and that dismissing the approach entirely was a missed opportunity.

These points were not unreasonable on their face. AI-assisted drafting of empathetic responses, with human review, could plausibly improve a peer support platform's effectiveness. Having a draft that a counselor could refine rather than writing from scratch might mean faster responses and more consistent quality.

But none of that addressed the consent issue. The question was never whether AI could write a good response - it was whether the people receiving those responses had the right to know how they were produced. For 4,000 users who turned to a peer support platform because they were struggling, the answer from virtually every professional ethics framework was yes.

What Happened After

Koko discontinued the GPT-3 experiment and did not resume it publicly. The platform continued to operate in its peer-to-peer format. Morris's original Twitter thread was eventually deleted or made inaccessible, though screenshots and quotes from it were preserved in media coverage.

No regulatory action was taken against Koko. No specific US regulation covered AI use in mental health applications at the time, and the backlash remained reputational rather than legal.

But the Koko episode became a standard reference in discussions about AI and informed consent - a compact example of what happens when a tech company treats human subjects research as a product feature and discovers, through public reaction, that the distinction matters.
