
When Nobody Actually Decided: Judgment Authority in AI-Augmented Organizations

AI governance is no longer just a problem of managing tools. As AI agents reshape workflows, organizations are beginning to lose clarity over who actually exercises judgment authority. This article explores why Human-in-the-Loop structures often collapse into procedural legitimacy, why governance and automation frameworks leave a structural gap, and why AI-augmented organizations must begin designing authority itself through Decision Design, Decision Boundaries, and Decision Logs.


There is a particular kind of meeting that has become more common over the past two years. Something has gone wrong — a contract clause that should not have been agreed to, a customer response that escalated badly, a subsidy approved that should have been refused — and the people in the room are trying to reconstruct who made the decision.

The trail does not lead anywhere clean. A junior analyst forwarded a summary. An AI agent drafted the response. A manager clicked approve. A workflow tool routed the file. A second reviewer signed off based on what the first had marked as cleared. By the time the path is traced, the actual moment of judgment cannot be located on it.

Nobody actually decided. The decision happened anyway.

This is the operational reality that the current vocabulary of AI governance is failing to describe. It is not a story about rogue models or runaway agents. It is a story about the slow erosion of judgment ownership inside organizations that thought they were merely adopting new tools.

The vocabulary of management is reaching its limit

Most enterprise discussions of AI agents are still framed within a vocabulary inherited from IT asset management. Inventory, visibility, policy, control. Map the agents in use. Catalog the prompts. Restrict the platforms. Establish a governance committee.

These activities are necessary, and they are not wrong. They are simply addressing a different layer of the problem than the one most organizations are actually facing.

The dominant operational pattern right now is straightforward to observe. A platform like Copilot Studio lowers the technical floor of agent creation. Sales, operations, customer service, and procurement teams begin building their own no-code AI agents to solve specific bottlenecks in their own work. Within months, an enterprise has dozens, sometimes hundreds, of agents in production. IT and security functions discover them after the fact, often through incident response rather than through any inventory exercise.

This pattern has acquired two names, used somewhat interchangeably: agent sprawl, when the emphasis is on volume and lack of coordination, and shadow AI, when the emphasis is on visibility and authorization. The latter framing positions the issue as a control problem, structurally analogous to shadow IT a decade ago. The implied remedy is the same: bring it into the light, register it, govern it.

The remedy is incomplete because the diagnosis is incomplete. Shadow IT was, fundamentally, an asset visibility problem. Shadow AI is something different. It is not unauthorized tools acting on instructions. It is unauthorized judgment surfaces — places in the organization where decisions are being shaped or made without any explicit allocation of authority to make them.

You can inventory those surfaces. You cannot, by inventory alone, determine who is institutionally accountable for what happens on them.

What "managing AI" leaves out

Consider what the word "manage" actually does when applied to a conventional IT system. It refers to a stable set of operations: provisioning, configuring, patching, monitoring, retiring. The system itself does not make choices. It executes specifications. Management is the discipline of keeping those specifications coherent and the execution reliable.

Apply the same word to an AI agent and the meaning quietly shifts. The agent does not merely execute. It reads context, narrows options, and produces outputs that other people — and sometimes other agents — treat as the basis for action. Whether or not we call this "judgment" in the philosophical sense, it occupies the position that judgment used to occupy in the workflow. A draft contract clause is offered. A customer reply is composed. A risk score is assigned. A flag is raised or not raised.

The downstream human in the loop is no longer the originator of these moves. They are responding to them. Their cognitive entry point into the situation has already been shaped before they arrived.

This is not a critique of the tools. It is an observation about where, structurally, the judgment is now happening. And it is the reason the verb "manage" begins to feel underweight. You can manage what an agent is permitted to access, what it is allowed to output, what audit data it produces. You cannot manage, in any straightforward sense, the fact that institutional decisions are being assembled across a distributed surface of human and non-human actors, none of whom occupies the traditional position of decision-maker.

What needs to be designed at that point is not the agent. It is the authority structure around it.

The Human-in-the-Loop problem nobody quite names

The phrase Human-in-the-Loop has become the default reassurance offered in every AI deployment conversation. The model proposes. The human disposes. Risk, the implication runs, is contained by the human checkpoint.

In practice, the architecture is less reassuring than the phrase suggests.

Consider what happens when an AI agent generates first-pass content review decisions at scale — flagging documents, scoring submissions, ranking applications. The human reviewer downstream is not given an empty queue. They are given the agent's outputs, often with confidence scores, often with summary explanations, often pre-sorted by the agent's own sense of priority. Their task is reframed from "decide" to "verify or override."

The cognitive economics of that reframing are well understood by anyone who has run such an operation. Override rates fall sharply over time. Not because the agent is improving — it may or may not be — but because the cost of disagreeing is asymmetric. To accept the agent's recommendation, the reviewer has to do nothing additional. To reject it, they have to produce a rationale, often within a system that treats disagreement as friction.

After several months, the pattern stabilizes into what some operations leaders quietly call approval theater. The human checkpoint exists. It is staffed. It produces an audit trail. It rarely changes outcomes.

There is a more difficult version of the same problem. In some workflows, the human downstream of the agent is not even meant to verify substantively. They are meant to verify procedurally — to confirm that the file is complete, that the agent's process ran, that no obvious flag was raised. The substantive judgment, if it ever existed in the system, has been folded into the agent's behavior. The human is providing legitimacy, not judgment.

This is what hollow Human-in-the-Loop structures look like in production. They are not the result of bad faith. They are the result of architectural choices that placed humans at the wrong points in the decision flow — late enough to inherit the agent's framing, early enough to bear formal responsibility for the outcome.

The point is not that human oversight is wrong. The point is that procedural human checkpoints, on their own, do not produce institutional judgment. They produce procedural legitimacy, which is a different thing from institutional legitimacy and is not a substitute for it. The difference matters when something goes wrong, because procedural legitimacy offers no cover to the person who pressed approve. They will be asked to account for a decision they did not, in any substantive sense, make.

A policy signal worth reading carefully

The institutional dimension of this problem is starting to surface in policy. The Japanese government has begun requiring human judgment mechanisms for autonomous AI agents, citing risks such as malfunction and privacy violation. The direction is reflected in the AI Guidelines for Business Ver1.2 jointly maintained by the Ministry of Internal Affairs and Communications and the Ministry of Economy, Trade and Industry, which articulate expectations for human involvement in AI systems that materially affect individuals or organizations.

This article is not a policy summary, and the guidelines themselves are still evolving. What is worth noticing is the shape of the regulatory instinct: governments are not asking companies to make AI agents safer in the abstract. They are asking companies to demonstrate that human judgment is structurally present in the system. The implicit recognition is that judgment can be structurally absent even when a human is nominally involved.

That recognition — once stated, even in policy form — is hard to put back. It means that AI governance, going forward, is not only about what the model does. It is about whether the authority to decide has been allocated in a way that survives operational reality.

This is the design problem that conventional governance vocabulary is not currently equipped to address.

Where governance, DX, automation, and AI ethics each stop

Each of the major enterprise discourses surrounding AI touches the authority problem and then leaves it.

Governance, in its classical form, is concerned with policies, controls, and the demonstration of compliance. It defines what may be done, by whom, under what conditions. It does not, on its own, specify where substantive judgment is supposed to sit. A well-governed system can still contain hollow decision points.

Digital transformation focuses on the redesign of processes and the integration of capabilities. It is good at identifying where work flows inefficiently and how technology can accelerate it. It is less attentive to the question of where, within a redesigned process, an organization is willing to commit institutional authority. Processes get faster. Authority gets distributed by default rather than by design.

Automation, considered as a discipline, takes the structure of work as given and seeks to remove human effort from its execution. The premise is that the underlying judgments have already been made and encoded. Where that premise holds — in well-bounded operational tasks — automation is legitimate. Where it does not hold — in judgments that depend on context that cannot be exhaustively encoded — automation does not eliminate judgment. It relocates judgment to whoever specified the rules, often without making that relocation visible.

AI ethics has produced an important vocabulary for thinking about fairness, transparency, harm, and the responsibilities of developers and deployers. Its contribution is real. Its operational gap is also real: ethical principles do not, by themselves, tell an organization where the line between machine judgment and human judgment should be drawn for a specific class of decision, nor who within the organization owns that line.

Each of these discourses is doing work. None of them is structured to answer the question that the AI-augmented organization is now confronting: when judgment is distributed across humans and agents, where is institutional authority located, and how is it preserved through time?

There is a layer of organizational design that has been quietly assumed for decades — implicit in org charts, decision rights matrices, delegation rules — and that is now being stressed by the introduction of AI systems that participate in judgment without occupying any formal position within those structures. That layer needs to be made explicit. It needs a name, a method, and a set of constructs.

Judgment architecture

What has been missing, in our experience working with organizations that have moved past initial AI adoption, is not another governance framework layered on top of existing ones. What has been missing is a way of treating judgment itself as something that can be designed.

Conventional organizational design takes positions and processes as primary. Judgment is treated as something that happens inside those positions — a quality of the people occupying them, supported by training and culture. This worked, more or less, when the inputs to those positions arrived in a form that required interpretation by a human mind.

The inputs no longer arrive in that form. They arrive pre-interpreted, pre-summarized, pre-ranked, pre-drafted. The judgment that used to happen inside the position has been partially displaced into the surface that delivers the inputs. If the position still bears the formal accountability, but no longer performs the substantive judgment, the design is incoherent.

Making it coherent again requires treating judgment as an architectural object — something with structure, boundaries, and traceable transitions. This is what we call Decision Design.

Decision Design

Decision Design is not about improving decisions alone; it is about designing the authority structure within which decisions become institutionally legitimate.

The distinction in that sentence is important and easy to miss. A great deal of recent work on decision support — dashboards, recommendation engines, AI copilots — is oriented toward making individual decisions better. Better information, faster synthesis, more options considered. Decision Design is concerned with a different question: what makes a decision count as the organization's decision, and how is that property preserved when the decision is assembled across multiple actors, human and machine?

It is useful to specify Decision Design along three dimensions: what it designs, what it is not, and what problem it addresses.

What Decision Design designs

Decision Design treats six related elements as design objects rather than incidental features of workflow:

Authority allocation. Which role, with which scope, holds substantive decision-making power over a given class of action: not in the org chart's abstract sense, but in the operational sense of who, when something is contested, can resolve it. Authority allocation that exists only in written rules and not in the running system is not allocation. It is residue.

Escalation pathways. Where, on what trigger, and under whose responsibility a decision is lifted out of its current layer. Without designed escalation, agents and operators alike will tend to absorb edge cases into their normal flow, on the principle that processing is easier than referring. The result is that the cases most needing human institutional attention are the ones least likely to receive it.

Override structures. Who, with what standing, can countermand a decision produced by an AI agent or a delegated process. Override is not a button. It is a defined institutional act, with conditions, with consequences, and with a record that includes the reasoning, not only the outcome.

Delegation logic. When an authority is described as "delegated" — to a system, to an agent, to a junior layer — the question is whether the delegation is institutionally coherent. Delegation requires a delegator who retains accountability, a delegate with defined scope, and a mechanism by which the delegator can know whether the delegation is being exercised within scope. Most "delegation to AI" in current practice satisfies the first condition nominally, the second condition partially, and the third condition not at all.

Accountability continuity. When a decision moves across stages — initial framing, intermediate processing, final approval — accountability needs to be continuous along that path. There should be no segment in which no one is responsible because the previous owner has handed off and the next owner has not yet engaged. Accountability gaps are the structural location of approval theater.

Governance decision boundaries. Which classes of decision are reserved for institutional bodies — governance committees, executive forums, formal review boards — and which are properly executed within line operations. The boundary itself is a design object. Treating it as self-evident is one of the more common failure modes of mature governance functions.

These six elements are not a checklist. They are the surface area on which Decision Design works. A specific organization will instantiate them differently depending on its sector, its regulatory environment, and the kinds of judgment it routinely makes.
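The three conditions named under delegation logic above (a delegator who retains accountability, a delegate with defined scope, and a way for the delegator to know whether scope is respected) can be written down as a simple check. What follows is a minimal Python sketch under stated assumptions, not a prescribed implementation: the `Delegation` record, its field names, the `coherence_gaps` function, and the example roles are all hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Delegation:
    """Hypothetical record of one delegation; all field names are illustrative."""
    delegator: Optional[str]       # named role that retains accountability
    delegate: str                  # role, agent, or system receiving authority
    scope: List[str]               # decision classes the delegate may handle
    scope_monitor: Optional[str]   # how the delegator learns whether scope is respected


def coherence_gaps(d: Delegation) -> List[str]:
    """Return which of the three delegation conditions are not met."""
    gaps = []
    if not d.delegator:
        gaps.append("no delegator retains accountability")
    if not d.scope:
        gaps.append("delegate has no defined scope")
    if not d.scope_monitor:
        gaps.append("delegator cannot tell whether scope is respected")
    return gaps


# Example: an agent with a nominal owner and a scope, but no monitoring mechanism.
typical = Delegation(
    delegator="Head of Procurement",
    delegate="vendor-triage-agent",
    scope=["vendor onboarding under standard terms"],
    scope_monitor=None,
)
print(coherence_gaps(typical))
# prints ['delegator cannot tell whether scope is respected']
```

A check like this does not make a delegation legitimate. It only makes visible which of the conditions the organization has left unspecified.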

What Decision Design is not

The concept is more useful when its boundaries are stated clearly. Decision Design is not workflow optimization. Workflow optimization improves the speed and reliability of a given decision process; Decision Design asks whether the process distributes authority in a way the organization can stand behind.

It is not AI adoption strategy. Adoption strategy concerns which AI capabilities to deploy, where, and with what investment. Decision Design concerns what happens to institutional judgment once those capabilities are inside the operation.

It is not generic governance. Governance, in the compliance and policy sense, sits on top of decision structures and articulates rules about them. Decision Design works at the layer beneath — specifying where decisions actually occur and who is institutionally present at those points.

It is not merely Human-in-the-Loop. HITL describes a topology in which a human is positioned somewhere in the decision sequence. Decision Design asks whether that human, at that position, is structurally capable of exercising judgment, or is structurally constrained to ratify.

It is not AI ethics branding. Ethical commitments are necessary upstream conditions for legitimate AI use. They do not, on their own, specify the institutional architecture through which those commitments operate.

It is not decision support tooling. Tools that improve the quality of individual decisions sit inside the design; they do not constitute it.

What problem Decision Design addresses

Decision Design is a response to a specific cluster of conditions that appear with increasing regularity in AI-augmented organizations:

Disappearing judgment ownership. Decisions are made — the organization acts on them — but no single named role can be identified as having exercised judgment over the substance.

Distributed authority without design. Authority over a class of decision is in practice spread across multiple actors, none of whom understands themselves to be the locus of decision, because the distribution was never explicitly intended.

Accountability fragmentation. When something goes wrong, the path from outcome back to a responsible actor branches and dissipates. Each participant can truthfully say their part of the action was within their understood remit.

Escalation ambiguity. Operators and agents are unsure when a case should be lifted out of routine handling, so they default to processing it routinely. Material decisions are made at layers not designed to bear them.

Approval without deliberation. Sign-offs are produced at volume and pace that preclude substantive review. The signatures are real; the deliberation behind them is not.

"Nobody actually decided" structures. Outcomes occur, audit trails exist, and yet the act of decision cannot be located anywhere in the sequence. This is the limit case the others approach.

Decision Design treats this cluster as a single design problem rather than as a series of separate operational issues. The unifying observation is that all of them describe a failure to design where institutional judgment lives.

Decision Boundaries

Within Decision Design, the most operationally important construct is the Decision Boundary.

Decision Boundaries are not operational thresholds; they are institutional demarcations of legitimate authority.

The distinction is essential and frequently lost in practice. A confidence score above which an AI agent proceeds autonomously is a threshold. A risk tier above which a workflow routes to manual review is a threshold. These are useful operational mechanisms. They are not Decision Boundaries.

A Decision Boundary is a line that says: on this side of the line, the organization has determined that judgment may legitimately be exercised by this actor, under this delegation, with this scope. On the other side of the line, judgment requires a different actor, a different delegation, a different scope. The line is not derived from the model's behavior. It is derived from the organization's view of where its own authority is appropriately located.

This means that a Decision Boundary cannot, in principle, be set by tuning a threshold. It is set by deciding what kind of judgment is being made and which institutional position is competent to make it. Once that institutional decision has been made, thresholds and routing rules implement it. The threshold is downstream of the boundary, not identical to it.
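The relationship between boundary and threshold can be illustrated in a few lines. The sketch below is an assumption-laden example, not a reference design: `DecisionBoundary`, `route`, the field names, the 0.9 default threshold, and the refund scenario are all hypothetical. The point it tries to show is ordering: the boundary is consulted first, and the confidence threshold only operates inside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionBoundary:
    """Institutional statement of who may decide a class of decision (hypothetical shape)."""
    decision_class: str
    delegated_actor: str         # may exercise judgment inside the boundary
    reserved_actor: str          # holds authority outside it
    delegated_scope: frozenset   # case types the organization has chosen to delegate


def route(boundary: DecisionBoundary, case_type: str, confidence: float,
          threshold: float = 0.9) -> str:
    """Return the actor who handles the case.

    A case outside the delegated scope goes to the reserved actor no matter
    how confident the agent is. The threshold applies only inside the scope.
    """
    if case_type not in boundary.delegated_scope:
        return boundary.reserved_actor
    if confidence < threshold:
        return boundary.reserved_actor
    return boundary.delegated_actor


refunds = DecisionBoundary(
    decision_class="refund approval",
    delegated_actor="refund-agent",
    reserved_actor="support supervisor",
    delegated_scope=frozenset({"standard refund"}),
)
print(route(refunds, "goodwill exception", 0.99))  # prints support supervisor
print(route(refunds, "standard refund", 0.95))     # prints refund-agent
```

Tuning `threshold` changes how often in-scope cases are referred. It cannot move a case type into or out of `delegated_scope`; that is a separate institutional decision.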

The Decision Boundary becomes legible through four relationships:

Delegation. What is delegated across the boundary, to whom, and within what scope. The boundary defines the perimeter of legitimate delegation.

Escalation. What conditions cause a case to cross the boundary upward — back from a delegated actor to the delegator, or from a line function to an institutional body.

Override. Who, on the other side of the boundary, retains the authority to revise or reverse what was done by the delegated actor.

Suspension. Under what conditions the delegation itself is paused — when, for example, an AI agent is taken offline pending review, or when a category of decision is temporarily routed entirely to human handling because of a structural concern.

Each of these is an institutional act, not an operational adjustment. Treating them as institutional acts is what makes the Decision Boundary visible as a governance construct rather than as a workflow setting.

In a properly designed system, the Decision Boundary is also a place where the organization can locate, when needed, the answer to the question: who is exercising authority here. The boundary is a coordinate in institutional space.

Decision Logs

The third construct is Decision Logs. The naming invites confusion with familiar artifacts — audit logs, activity logs, telemetry — so it is worth specifying the difference precisely.

Decision Logs do not merely record outputs; they preserve accountability continuity across distributed judgment processes.

Audit logs record what a system did. Activity logs record what users did. Telemetry records what was measured. All of these are concerned with outputs and events. They are valuable for forensic reconstruction, performance monitoring, and compliance demonstration.

Decision Logs are concerned with something different: the transitions of authority. A Decision Log records that, at a given moment, authority over a specific class of decision was held by a specific actor under a specific delegation, and that at a subsequent moment it transitioned — through escalation, override, suspension, or completion — to another state.

The unit of the Decision Log is not the action. It is the authority configuration. A single decision may produce dozens of audit entries describing what happened; it should produce a small number of Decision Log entries describing under whose authority each phase of it happened, and how that authority moved.

This is what makes Decision Logs accountability infrastructure rather than operational telemetry. When, six months after the fact, a question is raised about why a particular outcome occurred, the Decision Log allows the organization to reconstruct not what was clicked, but who was institutionally responsible at each point. It is the difference between a record of activity and a record of standing.

The practical implication is that Decision Logs cannot be derived purely from system events. They require that the institutional structure underneath the system — the Decision Boundaries — be made explicit, so that events can be mapped to authority transitions. A Decision Log without an explicit boundary structure is just another activity log labeled differently.
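To make the contrast with an activity log concrete, here is a minimal sketch of what a Decision Log entry might hold and how it could answer the question of who held authority at a given moment. It is one possible shape, offered under assumptions: the `AuthorityTransition` record, its fields, the `authority_holder` function, and the example case are hypothetical and not drawn from any existing system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class AuthorityTransition:
    """One Decision Log entry: a change in who holds authority (illustrative fields)."""
    decision_id: str
    at: datetime
    kind: str       # "delegation", "escalation", "override", "suspension", "completion"
    to_actor: str   # actor holding authority after this transition
    basis: str      # the delegation or boundary rule under which authority moved


def authority_holder(log: List[AuthorityTransition], decision_id: str,
                     when: datetime) -> Optional[str]:
    """Return who held authority over a decision at a given moment, if anyone."""
    holder = None
    for entry in sorted(log, key=lambda e: e.at):
        if entry.decision_id == decision_id and entry.at <= when:
            holder = entry.to_actor
    return holder


log = [
    AuthorityTransition("case-17", datetime(2025, 3, 1, 9, 0), "delegation",
                        "intake-agent", "formal review delegated by program office"),
    AuthorityTransition("case-17", datetime(2025, 3, 1, 11, 0), "escalation",
                        "program officer", "ambiguous eligibility"),
    AuthorityTransition("case-17", datetime(2025, 3, 4, 15, 0), "completion",
                        "program officer", "decision rendered"),
]

print(authority_holder(log, "case-17", datetime(2025, 3, 1, 10, 0)))  # prints intake-agent
print(authority_holder(log, "case-17", datetime(2025, 3, 2, 0, 0)))   # prints program officer
```

Note that the `basis` field only has meaning if the underlying delegations and boundaries have been named somewhere. Without them, the entries collapse back into timestamps and actor names.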

What this looks like in operation

Abstract constructs become more useful when grounded in concrete cases. Three are worth describing briefly, because they illustrate the structure rather than merely the vocabulary.

Subsidy review

Government subsidy review programs — and their private-sector analogues in grant-making, sponsorship, and large-scale rebate processing — share a structure that, when designed well, is itself a working example of Decision Boundaries.

The process is typically organized as three layers: formal review, content review, and final judgment. Formal review confirms that the submission is complete, eligible, and consistent with stated criteria. Content review assesses the substance — the feasibility of the proposal, the credibility of the applicant, the alignment with program intent. Final judgment, performed by a committee or designated authority, integrates the prior layers into a decision the institution will stand behind.

The boundaries between these layers are not arbitrary. The first boundary separates judgments that require no institutional discretion from judgments that do. The second separates evaluative judgment from decisional authority — recognizing that the institutional act of deciding is different in kind from the expert act of evaluating.

When AI agents are introduced into this structure, the strong temptation is to compress it. Submit, process, decide, in a single automated flow. Doing so is not, in itself, an efficiency gain. It is the erasure of two Decision Boundaries that were doing meaningful institutional work. The first boundary allowed the organization to delegate routine verification confidently, because the scope was bounded. The second boundary protected the deliberative character of final judgment from the framing effects of evaluative work that preceded it.

A coherent Decision Design preserves the three-layer structure even when AI is involved. Formal review may be largely delegated to an agent, with a small escalation channel for ambiguous cases. Content review may be agent-assisted, but the evaluator's authority is over the evaluation, not the decision; the Decision Boundary between evaluation and decision is preserved. Final judgment remains the institutional act, with the prior layers feeding into it as inputs rather than as conclusions.

The Decision Log, in this design, records authority transitions at each boundary: when formal review was completed and by whose delegation, when the case entered content review and under whose responsibility, when the evaluator's recommendation was passed to the decisional authority, and when the decisional authority rendered judgment. Each of these is a coordinate in institutional space, not a click in a workflow.
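One way to see whether a compressed flow has erased a boundary is to check a case's recorded path against the three layers. The sketch below assumes a hypothetical representation in which each case carries an ordered list of (layer, responsible actor) pairs; the layer identifiers, the `path_problems` function, and the actor names are illustrative only.

```python
from typing import List, Tuple

# The three layers described above, in their designed order (identifiers are illustrative).
LAYERS = ["formal_review", "content_review", "final_judgment"]


def path_problems(path: List[Tuple[str, str]]) -> List[str]:
    """Check a recorded path of (layer, responsible_actor) pairs.

    Reports layers that never had a responsible actor and layers that were
    recorded out of the designed order.
    """
    problems = []
    recorded = [layer for layer, actor in path if actor]
    for layer in LAYERS:
        if layer not in recorded:
            problems.append(f"no responsible actor recorded for {layer}")
    positions = [LAYERS.index(layer) for layer in recorded if layer in LAYERS]
    if positions != sorted(positions):
        problems.append("layers recorded out of order")
    return problems


# A compressed flow: the agent's output went straight to a decision.
compressed = [("formal_review", "intake-agent"), ("final_judgment", "intake-agent")]
print(path_problems(compressed))
# prints ['no responsible actor recorded for content_review']
```

A check of this kind only detects that a layer is missing from the record. Whether the actor named at each layer actually exercised the judgment belonging to it remains a design and oversight question.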

Contract review

Enterprise contract review is increasingly assisted by AI agents that mark up clauses, flag deviations from standard positions, and suggest revisions. The default deployment pattern places the agent at the front of the workflow, the contract owner in the middle as a reviewer, and legal at the end as a final sign-off.

The structural risk is that the contract owner's review is shaped almost entirely by what the agent flagged. Clauses the agent did not mark are unlikely to be examined. By the time legal sees the file, both the agent and the contract owner have implicitly defined the surface of the document that warrants attention. Legal inherits the framing.

A Decision Design view of the same workflow asks a different question: what authority is being exercised, by whom, and within what scope. The agent's authority, properly defined, is over clause-level deviation detection within a specified template. The contract owner's authority, properly defined, is over the business judgment of whether the contract serves the organization's interests in the specific transaction. Legal's authority is over the institutional acceptability of the contract under applicable law and policy.

These three authorities have different scopes. The Decision Boundary between them is not the order of routing; it is the substantive perimeter of each role's judgment. Designed properly, the contract owner is not constrained to review only what the agent flagged. The agent's flags are an input, not a frame. Legal's review is not a final formal check; it is an institutional check that may, in principle, send the contract back across boundaries it had already crossed.

The Decision Log, again, captures the authority transitions, not the document edits. The edits are a separate record.

Customer support escalation

In high-volume support operations, AI agents now handle first-tier interactions across many enterprises. The conventional governance concern is appropriateness of response — accuracy, tone, compliance with disclosed policies.

The structural concern is different: under what conditions does an interaction leave the agent's authority and enter human authority, and who is responsible for the case while it is in transit. A confidence score below a threshold does not, by itself, transfer authority. It produces a routing event. Transfer of authority is an institutional act that needs to be recorded as such.

When this is not designed, two failure modes appear. The agent retains effective authority over cases it has technically escalated, because the human operator picks up only what the agent has summarized. Or the human operator inherits accountability for a case shaped entirely by the agent's framing, without the standing to revisit that framing within the time pressures of live support.

Decision Design specifies, for this workflow, both the boundaries and the conditions of authority transition. It also specifies override: which supervisor, on what evidence, may revisit a closed interaction and revise its institutional standing. Override here is not a quality assurance pass. It is the exercise of higher authority over a case that has been previously concluded.
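The difference between a routing event and a transfer of authority can be sketched as two separate operations on a case. This is an illustrative example only: the `SupportCase` record, the function names, and the actor names are hypothetical, and the sketch assumes one particular design choice, namely that the sending side stays responsible while the case is in transit until a human explicitly accepts it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SupportCase:
    case_id: str
    authority: str                     # who currently holds authority over the case
    pending_for: Optional[str] = None  # set by routing; does not change authority
    history: List[str] = field(default_factory=list)


def route_to_human(case: SupportCase, queue_owner: str) -> None:
    """Routing event: the case is queued, but authority has not moved."""
    case.pending_for = queue_owner
    case.history.append(
        f"routed to {queue_owner}; authority remains with {case.authority}")


def accept_transfer(case: SupportCase, operator: str) -> None:
    """Authority transfer: recorded explicitly when the operator takes the case."""
    if case.pending_for is None:
        raise ValueError("case was never routed for transfer")
    case.history.append(
        f"authority transferred from {case.authority} to {operator}")
    case.authority = operator
    case.pending_for = None


case = SupportCase("c-1", authority="tier1-agent")
route_to_human(case, "tier2-queue")
print(case.authority)   # prints tier1-agent
accept_transfer(case, "operator-ana")
print(case.authority)   # prints operator-ana
```

Other designs are possible, for example assigning in-transit responsibility to a queue owner. What matters for the argument is that the choice is made explicitly and that the transfer leaves its own record.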

Reframing the governance conversation

The three constructs — Decision Design, Decision Boundaries, Decision Logs — are not a product. They are not a methodology proprietary to a particular vendor. They are a way of naming an institutional layer of work that has, for the most part, gone unnamed because organizations could afford to leave it implicit.

That affordance is closing. AI agents are not arriving as discrete tools to be governed. They are arriving as participants in judgment processes that organizations had previously assumed were performed by humans within institutional roles. The vocabulary of asset management, workflow optimization, and procedural oversight is insufficient not because it is wrong, but because it was designed for a layer above the one now under stress.

The reframing this calls for is not dramatic. It is precise. AI governance, properly understood, is not primarily about controlling tools. It is about preserving — and where necessary redesigning — the institutional authority structures through which organizations make decisions and stand behind them. The agents are inputs to that work. They are not the subject of it.

Organizations that recognize this will do a few specific things. They will name the Decision Boundaries that currently exist implicitly in their operations, before AI is introduced into the relevant flow. They will distinguish, in their AI deployments, between agents that operate within an existing authority — clearly delegated, clearly bounded — and agents whose deployment would, in effect, relocate authority without explicit institutional consent. They will treat the latter not as adoption decisions but as governance decisions. They will build Decision Logs as a parallel record to their operational logs, and they will treat that record as part of the accountability infrastructure of the firm.

This is not glamorous work. It does not produce demonstrable productivity gains on its own. It does, however, determine whether the organization remains an institution capable of decision in the proper sense, or becomes a system in which outcomes occur without anyone in particular having decided them.

The question is not whether AI will get better. It will. The question is whose judgment, on which authority, the organization is prepared to stand behind when the question is finally asked.

Most organizations do not yet have an adequate vocabulary for that question. The work of building one has, in our view, only just begun.

Japanese version is available on note.
