
The Government Wrote "Judgment" Into Policy. Its Content Is Blank.

Japan is beginning to write “judgment” into the formal vocabulary of AI governance. A reported update to the MIC/METI AI Business Operator Guidelines introduces the requirement for “a mechanism that makes human judgment mandatory” for AI agents and physical AI systems. But policy recognition is not design. What remains undefined is what valid judgment structurally contains: its unit, evidence requirements, responsibility boundaries, and reproducibility conditions. This article argues that the resulting gap—between policy intent (“judgment is required”) and operational reality (“judgment becomes a checkbox”)—is not an ethics problem but an architectural one. It introduces Decision Design as the missing layer: a discipline for designing the structure of judgment, and Decision Boundary as its core concept. The piece concludes with implementable artifacts—Decision Log and Decision Boundary Map—to convert “reviewed” from ritual into traceable, reproducible organizational judgment.

In the meeting minutes of a routine project review, a single line appears:

"AI output checked by team lead. No issues."

This line exists in every organization. It appears as a Slack reply — "LGTM." It surfaces as a toggled checkbox in an approval workflow. It is the final status field in a document management system: "Approved." In every case, the record claims that someone exercised judgment. And in every case, the record says nothing about what that judgment actually was.

What did the reviewer check? Against what criteria? On the basis of what information? If the same AI output were generated tomorrow under the same conditions, would the same person — or any person — reach the same conclusion?

These questions are rarely asked. And while they go unasked, organizations accumulate thousands of judgment claims with no structural foundation beneath them.

Set this line aside. We will return to it.


The Day "Judgment" Became a Policy Term

On February 15, 2026, Nikkei reported that the Japanese government was preparing to update its AI governance guidelines with a specific new requirement: "a mechanism that makes human judgment mandatory" for autonomous AI agents and physical AI systems — the class of AI that operates without waiting for step-by-step human instruction.

The update targets the AI Business Operator Guidelines, jointly maintained by Japan's Ministry of Internal Affairs and Communications (MIC) and the Ministry of Economy, Trade and Industry (METI). The current version, v1.1, was published in March 2025 and covers the responsibilities of AI developers, providers, and users across the system lifecycle. According to a MIC survey conducted in December 2024, awareness of the guidelines among businesses had reached 79 percent. The guidelines are known. The question is what follows from knowing them.

What makes the reported update significant is not a technical detail about AI agents or robotics. It is a linguistic and institutional shift. For the first time, the word "judgment" — not "oversight," not "review," not "approval" — is being written into the formal vocabulary of regulation.

This matters because judgment has historically been treated as a personal attribute. Organizations evaluate whether someone has good judgment. Training programs promise to develop judgment. Performance reviews cite it. But no institutional framework has attempted to define what judgment consists of structurally — what its unit is, what evidence it requires, where its boundaries lie, or how it can be reproduced. "Develop people with strong judgment" is a familiar phrase. "Design the structure of judgment" is not.

The Japanese government's move changes this. By requiring judgment as a systemic mechanism rather than delegating it as a personal competence, the policy marks a transition from aspiration to architecture. The government has declared that judgment must be embedded in systems, not merely expected of individuals.

The declaration itself is a meaningful step. But it is only a declaration. What remains absent — and urgently needed — is the design.


What the Policy Does Not Specify

The policy requires that judgment exist as a mechanism. It does not specify what that mechanism must contain.

This absence has four dimensions.

The unit of judgment is undefined. When a human is asked to "check" an AI output, what exactly are they checking? The factual accuracy of the content? Its fitness for a particular business use? The risk exposure it creates? Its alignment with organizational values? Each of these is a different kind of judgment, requiring different expertise and different information. Yet the act of judgment is treated as singular, as though checking a box could encompass all of them at once.

The evidence requirements are unspecified. An approval status records that someone approved. It does not record what information the approver reviewed, what criteria they applied, what alternative outcomes they considered, or what they chose not to examine. A judgment without a trace cannot be audited. A judgment that cannot be audited cannot be improved.

The responsibility boundary is unclear. When AI proposes and a human approves, who is accountable? The person who clicked "approve"? The developer who built the model? The manager who integrated the AI into the workflow? The organization that selected the vendor? Without an explicit boundary, responsibility either diffuses — no one owns it — or collapses onto whoever occupies the weakest institutional position.

Reproducibility is not addressed. If one team lead checks an AI output and finds no issues, would a different team lead, given the same output and the same context, reach the same conclusion? If judgment depends entirely on tacit knowledge and individual disposition, the organization has no stable judgment quality. It has a collection of individual opinions that happen to be recorded.

These four absences are not oversights in the policy. They reflect a deeper structural gap: the policy recognizes that judgment is necessary but does not define what makes judgment structurally valid. The gap between "judgment is required" and "judgment is designed" is where organizations will succeed or fail.


The Pathology: Judgment Becomes Procedural

The problem is not that judgment is absent from organizations. It is that judgment is present in form but absent in substance.

Most organizations that work with AI outputs have some confirmation process in place. Approval buttons are clicked. Logs record "confirmed." Checklists show every box marked. The procedural surface of judgment is fully intact.

But ask the person who clicked "approve" what, precisely, they approved, and the answer is often uncertain. The approval button has become detached from the cognitive act it is supposed to represent. It has been absorbed into operational routine — something to be done, not something to be thought through. Judgment is claimed but not performed. The ritual persists; the substance has evaporated.

This is not a failure of diligence. Telling people to "be more careful" does not address it, because the problem is not carelessness. The problem is structural. The confirmation act has no defined scope. No one has specified what should be checked, to what depth, against what standard, or with what consequence. The act of judgment has been required without being designed.

When an action is required but its structure is not defined, organizations default to the minimum gesture that satisfies the procedural requirement. The approval button becomes that gesture. Over time, the gesture becomes the norm, and the norm becomes invisible. This is how judgment hollows out — not through negligence, but through missing architecture.


This Is an Architectural Problem

It is tempting to frame the gap between policy intent and operational reality as an ethical problem — a matter of organizational culture, individual responsibility, or moral seriousness. But that framing is both inaccurate and unhelpful.

"Be more thoughtful about your judgments" is advice that cannot scale. It depends on individual motivation, which varies. It depends on workload, which fluctuates. It depends on the presence of consequences, which are inconsistently applied. Most importantly, it treats judgment as an act of will rather than a product of structure.

Systems produce behavior. A system that requires judgment but provides no structure for it will reliably produce ritualized compliance. This is not a prediction; it is an observable pattern in every domain where high-frequency, low-definition confirmation is required — from regulatory sign-offs to medical checklists to financial audits.

The corrective is not exhortation. It is design. The question is not "how do we get people to judge more carefully?" but "what structural conditions must be in place for judgment to function as judgment?"


Three Approaches, Three Gaps

Japan's move occurs within a global context where every major jurisdiction is grappling with AI governance — and each is leaving a different structural gap.

The European Union enacted the AI Act, which entered into force in August 2024. The Act establishes a risk-based classification system with four tiers, from minimal to unacceptable risk. Provisions banning certain AI practices took effect in February 2025, with the bulk of high-risk AI requirements scheduled for August 2026. The EU's approach is obligations-first: it specifies what must be done, documented, and reported. Its strength is regulatory clarity. Its structural gap is that compliance can be satisfied procedurally — the very pathology described above — unless the substance of judgment within those procedures is independently designed.

The United States, under Executive Order 14179 — "Removing Barriers to American Leadership in Artificial Intelligence," signed January 23, 2025, and published in the Federal Register on January 31, 2025 — revoked the prior administration's regulatory approach and oriented policy toward innovation promotion and deregulation. The order's framing prioritizes American competitiveness and infrastructure investment over prescriptive safety requirements. The structural gap here is different: by declining to mandate judgment mechanisms, the US posture leaves judgment entirely to market actors, with no shared vocabulary for what organizational judgment over AI should look like.

Japan's emerging approach sits between these poles. It neither prescribes detailed compliance duties (as the EU does) nor defers to market self-regulation (as the US currently does). Instead, it introduces a requirement — judgment as mechanism — without specifying the mechanism's internal architecture. This creates a distinctive gap: the policy acknowledges that judgment must be systemic, but the system itself remains unbuilt.

The three approaches are not right or wrong in the abstract. Each reflects a different political economy of AI governance. But each, in its own way, creates space for judgment to become procedural, symbolic, or absent. The EU risks compliance theater — organizations satisfying documentation requirements without performing substantive judgment. The US risks judgment absence — organizations deploying AI with no shared standard for what human oversight means in practice. Japan risks structural ambiguity — organizations acknowledging the need for judgment mechanisms while having no design vocabulary to build them.

What none of these approaches provides is a design discipline for judgment itself.


Decision Design: The Missing Layer

This is the gap that Decision Design addresses.

What Decision Design designs: the structure of judgment — its unit, its ownership, its evidence requirements, its reproducibility conditions, and its boundaries. Decision Design does not tell anyone what to decide. It defines the minimum structural conditions under which a decision qualifies as a decision.

What Decision Design is not: it is not a decision-support tool. Organizing data and presenting options improves the materials of judgment; it does not design the structure of judgment. It is not an AI governance framework. Risk assessments and ethical reviews address the external conditions around judgment; they do not specify the internal architecture of the judgment act itself. It is not organizational design. Allocating authority determines who may decide; it does not define what constitutes a structurally valid decision.

What problem Decision Design solves: the condition in which judgment is required but the structure of judgment has not been designed. This condition is the default state of most organizations deploying AI. Policy tells them to judge. No one tells them what judgment must contain.


Decision Boundary: The Core Concept

At the center of Decision Design is Decision Boundary — the explicit specification of where AI autonomy ends and human accountability begins.

The difficulty of judgment in the AI era is not only that decisions are complex. It is that the boundary between what AI handles and what humans must handle is almost never drawn. When a human receives an AI-generated report, they are told to "review" it. But review for what? Factual correctness? Logical consistency? Ethical appropriateness? Fitness for a specific business process? These are distinct types of review, requiring distinct competencies and distinct information. Yet the word "reviewed" compresses all of them into a single, undifferentiated claim.

This is a boundary problem. The boundary between AI responsibility and human responsibility has not been specified, so the human's role is simultaneously everything and nothing.

Human-in-the-Loop (HITL) is the most widely referenced concept in this space. It is also insufficient. HITL requires that a human be present in the process. It does not specify where in the process the human enters, what decision authority they hold, what scope of review they are responsible for, or what evidence they must produce. HITL mandates human presence. It does not mandate judgment structure.

Decision Boundary fills this gap. To design a boundary is to:

Specify the scope of human review at each AI touchpoint — not as a general instruction ("please review"), but as a defined perimeter (review for factual accuracy against source data; do not assess strategic fit).

Distinguish between judgments that can be delegated to AI and judgments that must be retained by humans, recognizing that this distinction is not permanent but context-dependent and must be re-evaluated as AI capabilities change.

Document the rationale for each boundary placement so that it can be examined, challenged, and revised by others in the organization.

Establish a cadence for reviewing whether boundaries remain appropriate as AI capabilities, organizational conditions, and risk environments evolve.
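A boundary designed this way can be written down as a small structured record. The following is a minimal sketch in Python, assuming one record per AI touchpoint; the class name, field names, and types are illustrative assumptions, not a prescribed schema.

from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionBoundary:
    # Illustrative sketch; the fields below are assumptions, not a standard.
    touchpoint: str     # where in the process the AI output meets a human
    review_scope: str   # defined perimeter, e.g. "factual accuracy against source data"
    out_of_scope: str   # explicitly excluded, e.g. "strategic fit"
    delegable: bool     # whether this judgment may currently be delegated to AI
    rationale: str      # why the boundary sits here; open to challenge and revision
    next_review: date   # scheduled re-evaluation as capabilities and risks change

The format matters less than the forced explicitness: each field corresponds to one of the four design acts above, so a blank field is immediately visible as an undesigned boundary.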

The boundary is not a line drawn once and left in place. It is a design element that requires maintenance, just as any architectural element does. An AI system that was appropriately trusted with routine factual verification six months ago may now require a different boundary placement if its training data has changed, if the domain has shifted, or if the consequences of error have increased.

When the responsibility boundary is left ambiguous, judgment degrades. Decision-makers become defensive, retreating into formal compliance — checking the box — or they avoid judgment altogether by deferring to the AI output without examination. Either way, judgment ceases to function. The boundary is not a bureaucratic exercise. It is the structural precondition for judgment to operate.


Implementation: The Decision Log

Decision Design becomes operational through artifacts. The first is the Decision Log — a structured record that defines and traces each judgment act.

A Decision Log entry contains the following minimum fields:

Decision ID:         [unique identifier]
Decision Object:     [what is being judged — e.g., "accuracy of AI-generated 
                      compliance summary," not "AI output"]
Classification:      [judgment layer]
                       L1: Factual verification
                       L2: Fitness-for-purpose
                       L3: Risk assessment
                       L4: Value alignment
Decision Owner:      [role, not individual name]
Inputs:              [information reviewed — AI output, reference data, 
                      prior decisions, external standards]
Criteria:            [basis for acceptance/rejection; if no criteria exist, 
                      record that fact]
Outcome:             [approved / returned / held / conditionally approved]
Rationale:           [brief statement of reasoning]
Boundary Tag:        [delegable / human-required / escalate / boundary-review]

The Decision Log is not a post-hoc audit trail. It is a pre-judgment instrument. Before the judgment is made, the decision owner confirms what they are judging, at what layer, against what criteria, and within what boundary. The log does not constrain the judgment. It ensures the judgment has structure.

The four-layer classification deserves particular attention. Most organizations treat all AI-related confirmations as a single act. The classification separates them:

A factual verification judgment (L1) asks whether the AI output is accurate against known data.
A fitness judgment (L2) asks whether the output meets the requirements of the specific business process.
A risk judgment (L3) asks whether using the output introduces acceptable or unacceptable exposure.
A value judgment (L4) asks whether the output aligns with organizational principles, strategic direction, or ethical commitments.

A team lead who "checks" an AI output may be performing an L1 judgment, an L3 judgment, or none of them with any clarity. The classification forces the question: which judgment are you actually making?


Implementation: The Decision Boundary Map

The second artifact is the Decision Boundary Map — a process-level view of where judgments occur, what type they are, and where boundaries are placed.

Building a Decision Boundary Map follows a defined sequence:

  1. Enumerate every point in the business process where AI is involved — where it generates, recommends, filters, summarizes, or acts. This step alone is often revealing; many organizations discover AI touchpoints they had not formally recognized.

  2. At each point, identify whether a human judgment is required, and if so, what kind. Some touchpoints may require no human judgment at all — the boundary permits full delegation. Others may require L1 verification only. Others may demand L3 or L4 judgment that cannot be delegated under any circumstances.

  3. Assign a classification layer and a boundary placement to each judgment point.

  4. Examine the resulting map for structural patterns: judgment concentration (too many decisions assigned to one role, creating bottlenecks and fatigue), judgment absence (points where AI acts without any human judgment, creating unmonitored exposure), boundary misalignment (boundaries placed at ceremonial rather than substantive points — the approval button that no one treats as a real decision), and missing escalation paths (judgment points where the decision owner has no defined route for cases that exceed their authority or expertise).

  5. Define the review cadence — how frequently the map is revisited as AI capabilities, process structures, or risk environments change.
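Three of the four patterns in step 4 lend themselves to a mechanical first pass over the map; boundary misalignment still requires human inspection, since ceremony cannot be detected from structure alone. The following is a minimal sketch, assuming each judgment point is recorded as a dictionary; the keys and the concentration threshold are illustrative assumptions.

from collections import Counter

def find_structural_patterns(points, concentration_threshold=5):
    """Flag judgment concentration, judgment absence, and missing
    escalation paths in a Decision Boundary Map.

    Each point is assumed to be a dict with keys:
      "touchpoint" (str), "owner" (role name, or None if no human judges),
      "delegable" (bool: the boundary explicitly permits full delegation),
      "escalation" (bool: a defined escalation route exists).
    """
    owners = Counter(p["owner"] for p in points if p["owner"] is not None)
    return {
        # Too many judgments assigned to one role.
        "concentration": [o for o, n in owners.items()
                          if n >= concentration_threshold],
        # AI acts with no human judgment and no deliberate delegation.
        "absence": [p["touchpoint"] for p in points
                    if p["owner"] is None and not p["delegable"]],
        # A decision owner exists but has no defined route upward.
        "missing_escalation": [p["touchpoint"] for p in points
                               if p["owner"] is not None and not p["escalation"]],
    }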

The map does not prescribe answers. It makes the judgment architecture visible. And visibility is the precondition for design. Organizations that have never mapped their judgment points will find, almost without exception, that the distribution of judgment across their processes is accidental rather than intentional — a product of historical workflow design rather than deliberate decision architecture.


Return to the Line

Consider again the meeting minutes:

"AI output checked by team lead. No issues."

Under the current state, this line is structurally empty. It contains no Decision ID. It does not specify the decision object — what aspect of the AI output was checked. It carries no classification — whether the check was factual verification, fitness assessment, risk evaluation, or value alignment. It names no criteria. It records no rationale. It places no boundary.

The line claims that judgment occurred. It provides no structural evidence that it did.

Under Decision Design, the same moment would produce something different:

Decision ID:     PRJ-2026-0215-04
Decision Object: Factual accuracy of AI-generated quarterly summary
Classification:  L1 (Factual verification)
Decision Owner:  Project team lead
Inputs:          AI output v3.2, source dataset (Q4 actuals), 
                 prior period summary
Criteria:        All figures consistent with source; 
                 no unsupported claims
Outcome:         Approved
Rationale:       Figures verified against source dataset; 
                 narrative claims cross-referenced
Boundary Tag:    Human-required

This is not more bureaucracy. It is the minimum structure required for the word "checked" to mean something. Without it, "checked" is a procedural gesture. With it, "checked" is a traceable, classifiable, reproducible judgment act.
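For teams that capture the log in code, the same entry becomes a single structured record. The instantiation below reuses the illustrative DecisionLogEntry, Layer, and BoundaryTag classes sketched earlier, with the same caveat: the form is an example, not a mandated schema.

entry = DecisionLogEntry(
    decision_id="PRJ-2026-0215-04",
    decision_object="Factual accuracy of AI-generated quarterly summary",
    classification=Layer.L1,
    decision_owner="Project team lead",
    inputs=["AI output v3.2", "source dataset (Q4 actuals)",
            "prior period summary"],
    criteria=["All figures consistent with source", "no unsupported claims"],
    outcome="approved",
    rationale="Figures verified against source dataset; "
              "narrative claims cross-referenced",
    boundary_tag=BoundaryTag.HUMAN_REQUIRED,
)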

The government has written "judgment" into the language of policy. That matters. But policy opens a door; it does not furnish the room. The room — the internal architecture of judgment — must be designed.

That design work has not yet begun in most organizations. The tools exist. The concepts are available. What remains is the decision to treat judgment not as a human quality to be hoped for, but as an organizational structure to be built.


References

  1. Ministry of Economy, Trade and Industry (METI) / Ministry of Internal Affairs and Communications (MIC). "AI Business Operator Guidelines (Version 1.1)." March 28, 2025. https://www.meti.go.jp/shingikai/mono_info_service/ai_shakai_jisso/pdf/20250328_1.pdf

  2. Ministry of Internal Affairs and Communications (MIC). "Survey on Dissemination of AI Business Operator Guidelines and AI Governance Initiatives." December 18, 2024. Presented at the 28th AI Governance Study Group (n=80). https://www.soumu.go.jp/main_content/000983033.pdf

  3. European Union. "Regulation (EU) 2024/1689 — Artificial Intelligence Act." Official Journal of the European Union. Entered into force August 1, 2024. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L_202401689

  4. Executive Office of the President. "Executive Order 14179 — Removing Barriers to American Leadership in Artificial Intelligence." Federal Register, 90 FR 8741. Signed January 23, 2025; published January 31, 2025. https://www.federalregister.gov/documents/2025/01/31/2025-02172/removing-barriers-to-american-leadership-in-artificial-intelligence

  5. Nikkei. Reporting on Japanese government AI guideline update requiring judgment mechanisms for AI agents and physical AI. February 15, 2026. (Paywalled source; referenced as reported.)


Insynergy Inc. | Decision Design / Decision Boundary
