The Reverse Turing Test
The Wall Street Journal recently ran a piece about writers going to unusual lengths to prove they had not used AI. They leave in typos. They strip out em dashes. They rough up their prose to make it sound less polished, less symmetrical, more recognizably human. Sounding too clean, too organized, too evenly cadenced has become a professional liability. So writers now perform humanity: for editors, for clients, for algorithmic detectors that may or may not be reading.
It is, in effect, a reverse Turing test.
The original Turing test asked whether a machine could pass as a human. The new test asks whether a human can prove they are not a machine. And the proof is aesthetic: messy is human, smooth is suspect. What is being monitored is no longer the quality of the writing. It is the suspected presence of AI itself.
There is something faintly absurd about all of this, and something more serious underneath. Behind the comedy of professionals deliberately introducing imperfections, there is a real institutional confusion. We have begun policing the wrong variable.
The Wrong Question
"Was AI used?" has become the default question — in journalism, in academia, in hiring, in client relationships. But it is a strange question if you think about it for more than a moment.
Consider what we have always tolerated, even celebrated, in professional work. Ghostwriters write books for executives and politicians, and we accept this. Editors substantially rewrite drafts, and the byline stays with the author. Research assistants gather sources, run preliminary analyses, draft sections. Outsourcing firms produce slide decks, market analyses, code, and creative work for organizations that present these as their own. Law firms have associates write briefs that partners sign. Architecture studios have teams that produce drawings the principal puts their name on.
None of these has been treated as a scandal. None has been considered a violation of authorship. We have understood, implicitly, that creative and professional production has always been distributed.
The most vivid examples come from the creative industries. Studio Ghibli's films are made by hundreds of people — animators, in-betweeners, background painters, colorists, sound designers — yet we speak of "a Miyazaki film" without hesitation. ONE PIECE is drawn with the help of a large team of assistants who handle backgrounds, screen tones, and finishing, yet it remains Eiichiro Oda's work. Detective Conan runs through a similarly large production pipeline. Chibi Maruko-chan did, too. None of these works rests on the myth that one person hand-produced every frame and panel.
What sustains attribution in these cases is not manual production. It is the presence of a coherent judgment owner: someone whose decisions about what the work is, what it says, and what it commits to run consistently through every element produced by others.
The question, in other words, has never really been who held the pen. The question has been who owned the judgment.
We are now asking the pen question about AI, and missing what we actually needed to ask.
Authorship Is Not Manual Production
This distinction matters because the entire infrastructure of authorship in modern life — copyright, professional liability, scholarly attribution, journalistic accountability, corporate responsibility — rests on the idea that someone, somewhere, owns the final shape of an output.
A work becomes attributable not because every element was manually produced by one person, but because someone owns the final judgment of what the work is. That ownership is what makes it possible to say: this is what I am putting into the world; this is what I stand behind; this is what I will answer for.
When a Ghibli film is praised or criticized, Hayao Miyazaki receives the praise or the criticism — not because he personally drew every cel, but because he answered, throughout production, for what the film would be. When an Oda manga lands a turning point in its narrative, the credit and the scrutiny attach to him, because the judgments that shaped that moment — which panel, which beat, which expression, which line — are his, even if the ink on the page was applied by an assistant's hand.
This is the model we have always used. We have simply not articulated it well, because for most of history the alternatives were limited. You could write something yourself, or you could have a human helper. Now there is a third option, and it is destabilizing not because it is unprecedented but because we have never had to make our implicit theory of authorship explicit.
The question AI forces on us is not new. It is simply newly visible: where exactly does authorship live, if not in the hand that makes the marks?
The answer, as it has always been, is: in the judgment that owns the result.
Human-in-the-Loop as Ritual
The most common response to this problem in AI governance is Human-in-the-Loop — the design principle that a human should remain meaningfully present in any automated decision process. As a slogan, this is sound. As an implementation, it is often vacant.
What does Human-in-the-Loop look like in practice? Frequently, it looks like an approval screen. The system has prepared a recommendation. The human clicks approve, or rejects, or — most commonly — clicks approve while glancing at the summary. Done. Human present. Loop closed. Audit trail satisfied.
The problem is that human presence is not the same as human judgment. A checkbox is not a decision. A click is not deliberation. The presence of a human in the workflow does not, by itself, mean that anyone has exercised meaningful authority over the outcome.
Consider an everyday analog: email mis-send prevention software. Many enterprise email systems now intercept outgoing messages with confirmation screens. Please confirm the recipients. Please confirm the To, Cc, and Bcc fields. Please confirm any attachments. Do you really want to send this? For the first week, users actually look. By the second week, the click has become reflexive. By the second month, the confirmation screen has become part of the act of sending, indistinguishable from it.
This is not user negligence. It is the predictable outcome of repeated low-friction confirmations. The human brain rapidly automates any action that has produced no consequence ninety-nine times in a row. So on the hundredth occasion, when there really is a wrong attachment or a wrong recipient, the click has long since stopped being a decision.
The log, of course, still says: confirmed.
Approval persists as an action. Judgment quietly disappears as a process.
This is the failure mode of Human-in-the-Loop when it is treated as a checkpoint rather than as an authority structure. The interface satisfies the requirement that a human was present. It does not satisfy the more important requirement that a human was responsible.
Why This Matters for AI Agents
The Human-in-the-Loop problem has been with us for years in narrow contexts — content moderation queues, fraud screening, automated decisioning. But it is about to scale, because the next generation of AI systems is agentic.
AI agents are increasingly drafting outbound emails, preparing proposals, summarizing legal documents, scoring candidates, pre-screening applications, routing tickets, building approval packages, generating customer responses, even initiating outbound actions on their own. In each of these cases, an organization can wire in a human approval step. And in each of these cases, that approval step will tend, over time and at scale, toward the email-confirmation pattern: present, logged, and substantively empty.
The danger here is not that AI is doing the work. It is that organizations may end up with the appearance of human oversight while authority over outcomes becomes diffuse, ambiguous, and ultimately unowned. The audit trail will be impeccable. Every action will have a human signature attached. And nobody — not really — will have decided anything.
This is the institutional shape of accountability erosion: it does not announce itself. It looks identical to the previous system, until something goes wrong and a regulator, a journalist, or a court asks who, exactly, made the call. At that point, the answer "the system did, and a human approved" turns out to mean very little.
The Business Proposal Example
To make this concrete, consider a sales proposal.
It is now technically straightforward to have AI generate an entire client proposal from a brief — executive summary, problem framing, proposed solution, pricing, timeline, risk register. The output looks credible. The format is clean. The arguments are coherent.
Bring that proposal to a client meeting, however, and a different test arrives almost immediately. The client asks: Why this approach and not the alternative? Why is this risk acceptable and that one not? Why this priority order? What exactly are you committing to deliver?
A proposal that cannot answer these questions in the seller's own voice is not really a proposal. It is a document shaped like one. The value of a proposal does not live in its prose, its layout, or its visual polish. The value lives in the judgments behind it — what to recommend, what to refuse, what to take on, what to put at stake.
This is why "having AI write the whole thing" tends to be hollow even when the output looks excellent. The hollowness is not aesthetic; it is structural. Nobody owns the choices the document represents.
But the inverse is also true, and worth saying clearly. There is nothing hollow about using AI well in the same task. Giving AI rough notes and asking it to structure them. Asking it to build a pros-and-cons table. Asking it to surface missing arguments, sharpen comparison axes, refine language, identify weak transitions. These uses leave the judgment where it belongs — with the human who can answer, in their own words, why this proposal, why this priority, why this commitment.
The boundary is not between AI and no-AI. It is between outputs whose judgments someone can defend and outputs whose judgments nobody can locate.
Government and the Institutionalization of the Problem
This is no longer just a cultural debate among writers and editors. It is becoming an institutional question, and policymakers have begun to notice.
In Japan, the Ministry of Internal Affairs and Communications (MIC) and the Ministry of Economy, Trade and Industry (METI) jointly publish the AI Business Guidelines, currently at Version 1.2. The guidelines address the risks posed by increasingly autonomous AI agents — including the potential for malfunction, privacy infringement, and unintended consequences — and call on developers and deploying organizations to build mechanisms that require human judgment in the operation of such systems.
The direction is right. Requiring that a human remain in the judgment chain is the correct instinct. But the wording matters, and so does what organizations do with it. "Requiring human judgment" can be interpreted as "requiring a human to be present in the workflow," which is the checkbox interpretation and leads back to the ritual problem above. Or it can be interpreted as "requiring that authority over the outcome be explicitly designed and assigned to a specific human role." Only the second interpretation actually solves the problem the guidelines are trying to solve.
Policy can specify the requirement. It cannot, by itself, specify the design. That work has to happen inside organizations.
Why Governance, DX, Automation, and AI Ethics Are Not Enough
The frameworks we currently rely on are necessary, but none of them is sufficient for this design work.
Governance is necessary, but it operates at the level of policy, oversight, and risk classification. It tells organizations what they must control, not how to design the authority structures through which control is actually exercised.
Digital transformation focuses on process redesign and capability building. It is essential for modernization, but it tends to optimize for throughput and integration rather than for the question of who owns what decisions in the new flow.
Automation focuses on efficiency. Its native question is what can be removed from human handling, not what must remain with humans and why.
AI ethics has matured rapidly and produced important principles — fairness, transparency, accountability, harm avoidance. But principles, however well-articulated, are not authority structures. They tell you what to care about. They do not tell you who decides.
None of these frameworks is wrong. They simply do not, individually or together, fully address the design of judgment authority itself. That design requires its own discipline.
That discipline is Decision Design, and it is not about improving decisions alone; it is about designing the authority structure within which decisions become institutionally legitimate.
This reframing matters. The interesting question is not whether a particular decision was good or bad — that judgment is always available after the fact. The interesting question is whether the decision was made within an authority structure that could legitimately produce it. A correct decision made by an authority that should not have made it is still an institutional failure. An imperfect decision made within properly designed authority is a routine and recoverable event. Legitimacy is what allows organizations to learn from their decisions rather than merely defend or regret them.
Decision Boundaries
The core construct of Decision Design is the Decision Boundary.
Decision Boundaries are not operational thresholds; they are institutional demarcations of legitimate authority.
The distinction is easy to miss and important to hold. An operational threshold is a parameter: above this value, escalate; below this value, auto-approve. A Decision Boundary is a statement about who has the right to decide what.
Designing Decision Boundaries means answering a series of questions for each meaningful decision a system might produce:
What can AI decide on its own, and under what conditions? What can AI recommend, with the human retaining decisional authority? What must a human decide, with AI providing only structuring or support? When must escalation to a higher authority occur, and what triggers it? When must the process stop entirely and require deliberation outside the workflow? What kinds of decisions, by their nature, cannot be delegated to systems at all — because the act of deciding them is itself the work of the role?
These are not technical questions, although they have technical implications. They are institutional questions about authority — about which roles, in which contexts, hold the legitimate right to bind the organization to particular outcomes.
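One of those technical implications can be sketched in a few lines of Python. The following is a minimal, hypothetical encoding, not a standard and not a prescribed implementation of Decision Design: every name in it (Authority, DecisionBoundary, authority_holder, StopAndDeliberate, the roles, and the 20 percent discount threshold) is an illustrative assumption. It mirrors the distinction drawn earlier: the operational threshold only determines whether escalation triggers, while the boundary names the role that holds the right to decide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Authority(Enum):
    """Standing answer to 'who may decide this?' (hypothetical encoding)."""
    AI_DECIDES = "AI decides on its own, within stated conditions"
    AI_RECOMMENDS = "AI recommends; a human retains decisional authority"
    HUMAN_DECIDES = "a human decides; AI only structures or supports"
    NON_DELEGABLE = "deciding is itself the work of the role"


class StopAndDeliberate(Exception):
    """Raised when a decision must leave the workflow for deliberation elsewhere."""


@dataclass(frozen=True)
class DecisionBoundary:
    decision_type: str
    authority: Authority
    owner_role: str                        # answers for the outcome, even when AI decides
    escalation_role: Optional[str] = None  # where authority moves when escalation triggers


def authority_holder(boundary: DecisionBoundary, *, escalate: bool = False,
                     stop: bool = False) -> str:
    """Return the role that holds decisional authority for one decision.

    Operational thresholds only supply the `escalate` and `stop` flags;
    the boundary itself states who has the right to decide.
    """
    if stop:
        raise StopAndDeliberate(
            f"{boundary.decision_type}: halt and deliberate outside the workflow")
    if escalate:
        if boundary.escalation_role is None:
            raise ValueError(
                f"{boundary.decision_type}: no role is designated for escalation")
        return boundary.escalation_role
    return boundary.owner_role


# Usage with hypothetical roles and a hypothetical 20% discount threshold.
pricing = DecisionBoundary("proposal_pricing", Authority.HUMAN_DECIDES,
                           owner_role="account_owner", escalation_role="sales_director")
discount_pct = 25
print(authority_holder(pricing, escalate=discount_pct > 20))  # prints sales_director
```

The design choice worth noticing is that owner_role is required even for AI_DECIDES: in this sketch, delegating a decision to a system never leaves it without an accountable role.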
A well-designed Decision Boundary does several things at once. It makes the locus of authority visible, both inside the organization and to outsiders who might later ask. It clarifies what kinds of errors are recoverable through routine processes and which require deeper intervention. It tells employees, especially those at the human-AI interface, what they are actually responsible for — and, equally important, what they are not. And it allows the organization to articulate, when challenged, why a given output reflects a legitimate exercise of authority rather than an accident of workflow.
The Decision Boundary is, in this sense, the place where authorship lives in an AI-saturated organization. It is the institutional analog of the judgment owner who allows a manga to remain attributable to its author even when many hands contribute to the page.
Decision Logs
Boundaries define authority in principle. Logs preserve it in practice.
Decision Logs do not merely record outputs; they preserve accountability continuity across distributed judgment processes.
A system log records that an event occurred. A Decision Log records how authority was exercised when that event was produced. The difference is not technical — it is what the log treats as worth preserving.
A meaningful Decision Log captures: who held the decisional authority for this output; what the AI system contributed, in terms of recommendation, ranking, or generated content; what portion of that contribution the human accepted as given; what the human overrode, and on what basis; what the final decision was; what reasoning supported it; whether any boundary condition was triggered; whether escalation occurred, and if so, to whom and with what outcome.
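The fields listed above map naturally onto a structured record. What follows is a minimal sketch assuming Python 3.9+ dataclasses; the class name, the field names, and the two validation rules are hypothetical illustrations, not an established schema or logging API.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionLogEntry:
    decision_id: str
    authority_holder: str          # who held decisional authority for this output
    ai_contribution: str           # recommendation, ranking, or generated content
    accepted_as_given: list[str]   # parts of the AI contribution accepted unchanged
    overrides: dict[str, str]      # what the human overrode -> the basis for doing so
    final_decision: str
    reasoning: str                 # why the final decision was made
    boundary_triggered: Optional[str] = None  # boundary condition that fired, if any
    escalated_to: Optional[str] = None        # role escalated to, if escalation occurred
    escalation_outcome: Optional[str] = None
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        # Hypothetical rule: without stated reasoning, this is a system event, not a decision.
        if not self.reasoning.strip():
            raise ValueError("a Decision Log entry needs the reasoning behind the decision")
        # Hypothetical rule: an escalation outcome requires someone to have been escalated to.
        if self.escalation_outcome is not None and self.escalated_to is None:
            raise ValueError("escalation_outcome recorded without escalated_to")

    def to_json(self) -> str:
        """Serialize the entry for an append-only store."""
        return json.dumps(asdict(self))
```

Keeping each override together with its stated basis is what separates this record from a system log: it preserves what was decided and why, not merely what was processed.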
The function of the Decision Log is not surveillance. It is institutional memory. When an organization is later asked — by a regulator, a customer, a court, or simply its own future leadership — how a particular decision came to be, the Decision Log is what makes a real answer possible. Without it, the organization can only point to system logs that show what was processed, never to records that show what was decided.
Decision Logs also, crucially, provide the empirical basis for revising Decision Boundaries. Override patterns are particularly valuable: every time a human reverses an AI recommendation, the organization learns something specific about where the current boundary may be misplaced. Treated systematically, these overrides become the primary data through which authority structures evolve. The boundary is not designed once and frozen. It is maintained, like any institutional structure, through evidence and revision.
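Treating overrides systematically could start with a simple aggregation over Decision Log entries, reduced here to (decision_type, overridden) pairs. This is a hypothetical sketch: the minimum sample size and both rate thresholds are illustrative assumptions, and the output is a list of boundaries for humans to review, not an automatic adjustment. The low-rate flag is also an assumption, drawn from the confirmation-screen pattern described earlier: a recommendation that is never overridden may be excellent, or may simply no longer be examined.

```python
from collections import Counter
from typing import Iterable, Tuple


def boundaries_to_review(
    records: Iterable[Tuple[str, bool]],
    min_decisions: int = 20,   # hypothetical: skip decision types with too little evidence
    high_rate: float = 0.2,    # hypothetical: above this, the boundary may grant AI too much
    low_rate: float = 0.0,     # hypothetical: at or below this, approval may be reflexive
) -> dict[str, Tuple[float, str]]:
    """Flag decision types whose override rate suggests the boundary needs review."""
    totals: Counter = Counter()
    overrides: Counter = Counter()
    for decision_type, overridden in records:
        totals[decision_type] += 1
        if overridden:
            overrides[decision_type] += 1

    flagged: dict[str, Tuple[float, str]] = {}
    for decision_type, n in totals.items():
        if n < min_decisions:
            continue
        rate = overrides[decision_type] / n
        if rate > high_rate:
            flagged[decision_type] = (rate, "frequent overrides: boundary may be misplaced")
        elif rate <= low_rate:
            flagged[decision_type] = (rate, "no overrides: approvals may have become reflexive")
    return flagged
```

Each flag is an input to a revision discussion. Who is entitled to revise the boundary is, in turn, a Decision Boundary question of its own.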
The Closing Image
Return, for a moment, to the writer leaving in typos.
Em dashes do not prove humanity. Typos do not prove authorship. Casual phrasing does not prove that someone took responsibility for what was said. The aesthetic markers of human production are decoupled from the institutional fact of human judgment, and the more closely we look at them, the more clearly the decoupling appears.
The reverse Turing test is not really a test of anything that matters. It is a substitute for a test we do not yet know how to administer — a test of whether someone, somewhere in the production chain, owned the judgment that produced this output. That test would not care about em dashes. It would care about whether the person whose name is attached to the work can explain, in their own words, what they took responsibility for when they released it.
This is the question the next decade of AI governance is going to have to learn to ask.
What did you take responsibility for when you released this output?
If the answer is articulate — I chose this framing, I rejected these alternatives, I accepted these risks, I committed to this position — then the use or non-use of AI in the production process is, in the deep sense, irrelevant. The judgment was owned. The work is attributable. The accountability is real.
If the answer is hesitant or absent, then no amount of manual production rescues the output. A document written entirely by hand, but whose choices nobody can defend, has no more institutional standing than one generated entirely by a model.
We do not need to make humans look less like machines. We need to design systems in which humans can meaningfully own judgment.
That is the work in front of organizations now — not the detection of AI, but the architecture of authority around it. Governance, digital transformation, automation, and AI ethics each contribute something necessary. None of them, alone, completes the picture. The discipline that begins to address this gap is Decision Design: the deliberate construction of Decision Boundaries that locate authority, and Decision Logs that preserve it, across processes in which judgment is increasingly distributed among humans and the systems they build.
The em dashes can stay. The typos can go. What matters is whether, behind the prose, someone is willing and able to say: this is mine, and here is why.