
Can Japan's Government Turn AI Into Delivery Power?

When governments deploy AI, the question is not merely adoption—it is accountability. As Japan’s Digital Agency and Tokyo Metropolitan Government scale their Government AI platform “Gennai,” a deeper design challenge emerges: who decides, and where does responsibility reside? From formal screening to substantive review to final human decision, public-sector AI reveals a structural tension between speed and judgment. This article explores how “delivery power” reshapes governance—and why the true frontier of AI implementation lies in designing the Decision Boundary (organizational governance). Beyond experimentation, 2026 marks the shift from AI pilots to measurable outcomes. The real question is no longer whether AI works, but how the Human Judgment Decision Boundary and the Governance Decision Boundary must be architected to sustain trust.

When delivery power changes, society changes — but who decides what gets delivered?

In February 2026, at a joint career meetup hosted by Japan's Digital Agency and GovTech Tokyo, Tokyo Vice-Governor Manabu Miyasaka made a deceptively simple observation. Amazon didn't write books. It made books arrive faster. Uber didn't invent the car ride. Netflix didn't invent film. In each case, the product stayed the same. What changed was the power to deliver — and when that power multiplied tenfold, entire industries were remade.

What made this observation heavy was the context. Miyasaka was not speaking about startups or markets. He was speaking about government.

Imagine applying for a grant in the morning and receiving support by evening. Imagine government proactively pushing notifications — "There's a support program for you" — instead of waiting for citizens to navigate opaque bureaucracies. That, Miyasaka argued, is the real promise of digital transformation in the public sector. Yet the reality remains: thick paper applications, in-person counters, reviews that take six months. Politicians design policy. What's missing is the power to deliver that policy to citizens at speed.

Here is the tension that deserves more attention. As delivery accelerates, so must the speed of judgment. And as the speed of judgment increases, the question of who is accountable becomes dangerously blurry. Japan's national government and Tokyo Metropolitan Government are both racing to deploy AI at scale — but are they designing the boundaries of judgment with the same urgency?


Gennai: A Government AI Platform Built In-House

Shingo Yamaguchi, the official overseeing AI implementation strategy at the Digital Agency, reported on the rollout of "Gennai" (Government AI platform) — an AI infrastructure accessible through the government employee portal.

Gennai currently hosts roughly 230 applications, ranging from general-purpose tools like chat and translation to specialized apps that use retrieval-augmented generation (RAG) against administrative documents. Launched within the Digital Agency in May 2025, Gennai is now budgeted for deployment to over 100,000 government employees by May 2026.

The notable detail is that Gennai is built internally. Approximately 25 engineers, most recruited from the private sector, work alongside product managers and designers. When a new model is released, it is integrated almost immediately. This pace is fundamentally alien to Japan's traditional government procurement culture, where system updates typically require multi-year cycles.

In late 2025, Prime Minister Takaichi issued seven directives on AI, the first of which mandated the expansion of Gennai to over 100,000 government employees. The directive was blunt: senior officials must use it, raise their fluency, and apply AI to their work creatively. As Yamaguchi noted, having top-down commitment at that level is, frankly, fortunate — it lets the team move at speed without the usual bureaucratic friction.


Tokyo's AI Strategy: The Traffic-Light Model

On the Tokyo side, Masataka Tsuji of the Digital Services Bureau laid out the metropolitan government's AI strategy, formally adopted in July 2025 following an AI strategy council that included University of Tokyo professor Yutaka Matsuo.

The strategy's core architecture is a traffic-light classification model. Tokyo's operations are divided into three domains: services where citizens interact directly with AI, services where employees use AI to process citizen-facing work, and purely internal employee workflows. Each domain is then assessed across five levels of AI maturity and color-coded. Green means aggressive adoption. Yellow means proceed with risk awareness. Red means wait.

Internal employee workflows are entirely green — full speed ahead.
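The matrix behind this model can be sketched as a simple lookup: three operational domains, five maturity levels, one color per cell. The color assignments below are illustrative, not Tokyo's actual published matrix (the article only confirms that internal workflows are entirely green), and the domain and level names are shorthand rather than official terminology.

```python
from enum import Enum

class Color(Enum):
    GREEN = "aggressive adoption"
    YELLOW = "proceed with risk awareness"
    RED = "wait"

# Hypothetical encoding of the traffic-light matrix: each domain is
# assessed at five AI-maturity levels (1-5) and assigned a color.
# Only the internal-workflow row is confirmed by the strategy itself.
CLASSIFICATION = {
    "citizen_facing_ai": {1: Color.GREEN, 2: Color.YELLOW, 3: Color.YELLOW,
                          4: Color.RED, 5: Color.RED},
    "employee_mediated": {1: Color.GREEN, 2: Color.GREEN, 3: Color.YELLOW,
                          4: Color.YELLOW, 5: Color.RED},
    "internal_workflow": {level: Color.GREEN for level in range(1, 6)},
}

def adoption_policy(domain: str, maturity_level: int) -> Color:
    """Look up whether AI adoption is permitted for a domain at a given level."""
    return CLASSIFICATION[domain][maturity_level]
```

The point of encoding the matrix explicitly, rather than leaving it to case-by-case discretion, is that every adoption decision can then be traced back to a deliberate governance choice.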

Tsuji emphasized a point that matters more than it might appear: "AI adoption must not become the objective itself. It has to remain a means." In fiscal year 2026, AI-related projects across Tokyo's government increased roughly 80% year-over-year, exceeding 200 initiatives, with infrastructure and urban planning accounting for about a quarter.

This sober stance — AI as instrument, not aspiration — separates the Tokyo strategy from private-sector AI announcements, where adoption itself often serves as investor-relations material. For government, AI has no value unless it makes policy delivery faster, fairer, and more accountable.


Agriculture Survey Analytics: Redefining the Human Role

Concrete results are already emerging. Yamaguchi described a collaboration with the Ministry of Agriculture, Forestry and Fisheries (MAFF). In the wake of the rice shortage, MAFF conducted a nationwide survey of rice farmers — 30 items, 8,000 responses. Under normal conditions, a single analyst would spend two to three months cross-tabulating the data in Excel.

MAFF requested support from the Digital Agency. The division of labor was clean. MAFF would focus on forming hypotheses: for example, whether young rice farmers in mountainous regions plan to increase or decrease planting area next season. The Digital Agency would take those hypotheses, feed them with survey data into AI, and return validated reports.

The implication, as Yamaguchi put it, is that human work itself changes. Humans concentrate on hypothesis formation and stakeholder communication. Data analysis is offloaded to AI. Spreading this use case would trigger recognition across the bureaucracy: there is a better way to work.


Grant Screening: A Three-Layer Design

The second example came from Tokyo. Grant and subsidy administration is a core function of a metropolitan government, and each application requires painstaking manual review. Tokyo's industrial labor division has begun piloting AI-assisted screening through its generative AI platform.

The structure Tsuji described is worth examining closely. Formal screening — checking whether applications meet eligibility requirements — can be handled by AI with high reliability. But substantive review — evaluating business plan feasibility, regional economic impact, applicant capacity — raises hard questions about where AI authority ends. And final judgment, under the Tokyo AI strategy, must remain with a human decision-maker.

Formal screening. Substantive review. Final human decision. This three-layer structure is not simply an operational workflow. It is an act of boundary design — determining where AI stops and human accountability begins.
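One way to see why this is boundary design rather than mere workflow is to encode it in control flow: the AI layer may reject on formal grounds, but approval can only ever come from a named human. The sketch below is hypothetical — the field names, the feasibility score, and the reviewer interface are illustrative, not Tokyo's actual system.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Application:
    applicant_id: str
    meets_eligibility: bool      # hypothetical pre-extracted formal check
    ai_feasibility_score: float  # advisory substantive score, 0.0 to 1.0

@dataclass
class ScreeningResult:
    applicant_id: str
    layer_reached: str   # which layer the case stopped at
    decided_by: str      # "ai" or the human decision-maker's name
    approved: Optional[bool]

def screen(app: Application,
           human_decide: Callable[[Application], bool],
           reviewer_name: str) -> ScreeningResult:
    # Layer 1 — formal screening: eligibility checks delegated to AI.
    # The AI layer can terminate a case here, but only negatively.
    if not app.meets_eligibility:
        return ScreeningResult(app.applicant_id, "formal", "ai", False)
    # Layer 2 — substantive review: the AI score travels with the case
    # as advisory input; it is never used to auto-approve.
    # Layer 3 — final decision: always attributed to a named human.
    approved = human_decide(app)
    return ScreeningResult(app.applicant_id, "final", reviewer_name, approved)
```

The asymmetry is the design decision: automation can stop a case, but it cannot grant one, so accountability for every approval has a name attached.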

Yamaguchi echoed this from the national side: "Formal screening can be delegated to AI. But edge cases will arise. We need to build a world where, in those gray zones, government officials can use AI as a sparring partner while retaining the authority to judge."

Japan's government is also moving to require AI developers to build mechanisms mandating human judgment when autonomous AI agents are deployed. This regulatory impulse and the internal operational design are the same conversation. When government designs AI judgment boundaries within its own operations, it extracts the basis for regulation from its own practice.


2025 Was Experimentation. 2026 Demands Results.

Digital Agency CPO Sota Mizushima set the frame plainly: "2025 was a year of trial and error. We were keeping pace with the private sector in AI experimentation. But whether real outcomes were being produced — that remained unclear. It was a year of touching things and seeing what happens."

Now it is 2026. "We tried it" is no longer a sufficient answer. Gennai's rollout to 100,000-plus employees. Tokyo's 200-plus AI projects. The formalization of grant screening. As the numbers accumulate, the question shifts. It is no longer "how many hours did AI save?" It becomes: "How did you draw the line between AI judgment and human judgment?"

Yamaguchi was candid about the pressure. External advisors have been pointed: "You're not introducing AI so bureaucrats can take it easy, right?" The expectation is that efficiency gains must lead to creative, higher-value work — not simply fewer hours.

Gennai, the traffic-light model, and the three-layer grant screening structure all point in that direction. But there is one question that still lacks a name.


What's Missing Is Not AI, but the Boundary of Judgment

Formal screening is delegated to AI. Substantive review is the frontier. Final judgment remains human. Government is already running this three-layer structure operationally. Yet no one has clearly articulated who drew those boundary lines, or on what principles.

What needs to be designed is not the AI model. Not the prompt. What needs to be designed is the boundary between AI and human judgment — the Decision Boundary (organizational governance) that determines who decides, at which point, with what authority, and under what conditions.

Consider the grant screening workflow. The formal-to-substantive-to-final structure is an instance of boundary design, whether or not it was designed with that vocabulary. Each transition between layers represents a Human Judgment Decision Boundary — the point where automated processing yields to human evaluation, and where human evaluation must be substantive rather than ceremonial.

And at the organizational level, the traffic-light model itself is a Governance Decision Boundary — a structural determination of which domains permit aggressive AI use and which demand restraint. Green, yellow, and red are not merely risk labels. They are governance decisions about where the organization trusts AI to act and where it does not.

The three-layer model only works if the interfaces between layers are explicitly defined: what information passes from the AI layer to the human review layer, in what format, with what confidence signals. Without that specificity, the layers collapse. The human reviewer rubber-stamps AI output. The final decision-maker ratifies without scrutiny. The boundary exists on paper but not in practice.

This is the structural risk. When AI output becomes the anchor, human reviewers believe they are exercising judgment while functionally deferring to the machine. The grant applicant whose plan was scored by AI and approved by a human who never questioned the score has been judged by AI alone — regardless of what the org chart says.


Design Principles for Deploying AI in Government While Preserving Accountability

Drawing from Japan's emerging practice, several principles can be distilled for any organization — public or private — deploying AI into decision-sensitive workflows.

Define not just what AI does, but what AI must not do. The boundary between automated and human judgment must be explicit, documented, and reviewed. Negative scope — the cases where AI processing must stop — is more important than positive scope.

Pre-define edge cases and catalog them. Gray zones should be anticipated, not discovered in production. For each workflow, identify cases where judgment could go either way and assign each to the appropriate layer in advance. Update the catalog regularly.

Design escalation paths with information standards. When AI cannot resolve a case, the handoff to a human reviewer must include structured information: the AI's output, its confidence score, the specific flags that triggered escalation, and references to similar past decisions. Without this, escalation becomes noise.
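A minimal sketch of such a handoff, assuming illustrative field names and a hypothetical confidence threshold (this is not a published government schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EscalationPacket:
    """Structured handoff from the AI layer to a human reviewer."""
    case_id: str
    ai_output: str              # the AI's draft judgment
    confidence: float           # model confidence, 0.0 to 1.0
    escalation_flags: list      # the specific rules that triggered escalation
    similar_past_cases: list    # references to comparable prior decisions

def build_escalation(case_id: str, ai_output: str, confidence: float,
                     flags: list, precedents: list,
                     threshold: float = 0.8) -> Optional[EscalationPacket]:
    """Escalate only when confidence falls below the threshold or a flag fired;
    otherwise the case stays within the AI layer's boundary."""
    if confidence >= threshold and not flags:
        return None
    return EscalationPacket(case_id, ai_output, confidence, flags, precedents)
```

The key property is that an escalation is never a bare "please look at this": it always carries the output, the confidence, and the reason it crossed the boundary.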

Prevent approval from becoming ritual. If the final approval interface is a single button, human judgment is a formality. Require the approver to select a rationale — "I agree with AI output," "I modified AI output," or "I overrode AI output" — and explain why. Display AI confidence levels. Show comparable past decisions. Flag approvals that took suspiciously little time.
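One way to make this structurally non-ritual is to reject approvals that lack a rationale and to flag suspiciously fast ones at the point of recording. A hypothetical sketch, with an assumed minimum review time:

```python
from enum import Enum

class Rationale(Enum):
    AGREE = "I agree with AI output"
    MODIFIED = "I modified AI output"
    OVERRODE = "I overrode AI output"

def record_approval(case_id: str, rationale: Rationale, explanation: str,
                    seconds_spent: float, min_review_seconds: float = 30.0) -> dict:
    """Record a human approval. A free-text explanation is mandatory, and
    approvals faster than the assumed minimum review time are flagged for audit."""
    if not explanation.strip():
        raise ValueError("explanation is required: approval without rationale is ritual")
    return {
        "case_id": case_id,
        "rationale": rationale.value,
        "explanation": explanation,
        "flagged_fast": seconds_spent < min_review_seconds,
    }
```

The enum forces the approver to state their relationship to the AI output; the timing flag does not block the decision, it only makes haste visible downstream.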

Log the relationship between AI output and human decisions. Track how often human reviewers accept AI recommendations unchanged. If the acceptance rate is near 100%, the human layer may be hollow. If it is near 0%, the AI model may be miscalibrated. Either signal demands investigation. Visibility itself creates pressure for substantive judgment.
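The check itself is simple enough to sketch directly; the thresholds below are illustrative assumptions, not calibrated values:

```python
def acceptance_rate(decisions):
    """decisions: list of (ai_recommendation, human_decision) pairs.
    Returns the fraction where the human accepted the AI unchanged."""
    if not decisions:
        return None
    accepted = sum(1 for ai, human in decisions if ai == human)
    return accepted / len(decisions)

def boundary_health(rate, hollow_threshold=0.98, miscalibrated_threshold=0.02):
    # Near-100% acceptance: the human layer may be rubber-stamping.
    if rate >= hollow_threshold:
        return "investigate: human layer may be hollow"
    # Near-0% acceptance: the AI layer may be miscalibrated.
    if rate <= miscalibrated_threshold:
        return "investigate: AI may be miscalibrated"
    return "ok"
```

Neither extreme is proof of failure on its own, which is why both return an investigation signal rather than a verdict.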

Assign signature responsibility at each layer. For every AI-assisted decision, a named individual must be accountable — not the team, not the department, not "the process." Accountability that cannot be attributed to a person dissolves under pressure.

Review boundary design periodically, not just AI performance. Model accuracy improves with fine-tuning. But boundary design degrades silently when operational habits shift. Schedule reviews of where boundaries are drawn, not just how well AI performs within them.


Forward

Japan's national government and Tokyo Metropolitan Government are advancing AI deployment with unusual speed. Gennai is reaching 100,000 users. Tokyo's 200-plus projects are underway. Grant screening automation has entered pilot.

But the deeper challenge is now visible. The question is not whether government can adopt AI. It can, and it is. The question is whether government can design the boundaries of judgment — explicitly, intentionally, and with the same rigor it applies to the technology itself.

When delivery power changes, society changes. That much is true. But between the policy and its delivery, there is always a judgment. If the boundary of that judgment is left undesigned, the correctness of what gets delivered becomes something no one can guarantee.

Drawing that boundary — the Decision Boundary — deliberately and structurally is the quietest and most essential condition for government AI to earn trust.


Ryoji Morii is Representative Director of Insynergy Inc., a Japan-based consulting firm specializing in Decision Design — the practice of structuring judgment boundaries between humans and AI systems in enterprise and public-sector organizations.



Japanese version is available on note.
