Physical AI and the Judgment Layer: Why Japan's Robotics Strategy Is an Institutional Challenge, Not a Modeling One

At Humanoids Summit Tokyo, METI’s Toshikazu Okuya outlined Japan’s emerging Physical AI strategy—one built on industrial robotics, manufacturing expertise, Data Refinery, and Robotics Foundation Models. But beyond models and datasets lies a deeper challenge: institutional judgment. As autonomous systems move from recommendation to action, organizations must determine who decides, who intervenes, and who remains accountable. This article explores why Physical AI is ultimately not only a technological challenge, but an institutional one—and why Decision Design may become a critical governance framework for the next generation of autonomous systems.

Japan's national robotics strategy, articulated at Humanoids Summit Tokyo, points toward a question that model capability alone cannot answer: who decides?

Introduction

There is a moment in most discussions of artificial intelligence where the conversation quietly changes shape. It begins as a question about capability — how good is the model, how large is the dataset, how fast is the hardware — and ends, almost without anyone noticing, as a question about responsibility. Who is accountable when the system is wrong? Who has the authority to stop it? That second question rarely gets the attention the first one does, partly because it is harder, and partly because it does not improve with a larger model.

The keynote titled "AI Robotics Policy in Japan," delivered by Toshikazu Okuya of Japan's Ministry of Economy, Trade and Industry (METI) at Humanoids Summit Tokyo, was on its surface an account of the first kind of question. It described, with unusual candor for a government official, how Japan has assembled its artificial intelligence and robotics policy over the past several years: the scramble for computing power, the decision to build a domestic foundation model, and the deliberate national turn toward what is increasingly called Physical AI — artificial intelligence that does not merely generate language but perceives, manipulates, and operates within the physical world.

Yet heard as a coherent whole, the strategy points to a different conclusion than its technical narrative suggests. Japan is not simply trying to build better robots or better models. It is trying to reindustrialize around systems that increasingly act on their own. And the binding constraint on that effort is not model capability. It is institutional judgment: the question of who decides, under what conditions, and with what accountability, once an autonomous system is loose inside a real production environment.

This essay traces that argument in three movements. First, what METI is actually attempting to build, and why. Second, why Japan's competitive position in Physical AI rests on something Silicon Valley cannot easily replicate. And third, the structural problem that the entire strategy will eventually run into — a problem of authority rather than intelligence — and the kind of framework organizations will need to solve it. This challenge is not merely technological. It is ultimately a question of how institutions design judgment itself.

Main Analysis: What METI Is Building, and Why

To understand Japan's Physical AI strategy, it helps to start where Okuya started, which was not with robots but with a shock.

When ChatGPT arrived at the end of 2022, the immediate effect on Japanese policymakers was not philosophical but logistical. The first obstacle was not how to regulate generative AI or how to think about its ethics. It was that the computing power required to develop large models — the sheer volume of accessible GPUs — was concentrated in the hands of a small number of foreign developers. As Okuya described it, the true starting point of Japan's AI policy was therefore not a doctrine but a supply problem: securing access to computation.

The response was concrete. METI moved to make GPU capacity available to Japanese firms, working through domestic infrastructure providers such as Sakura Internet, KDDI, and SoftBank, while also encouraging access to the cloud platforms of Microsoft and Google so that Japanese startups could train models without being permanently disadvantaged on hardware. In 2023, the ministry launched a national program to channel computing resources toward promising startups, including frontier-oriented companies in the mold of Preferred Networks. The logic was straightforward. Before a country can have an AI industry, it needs the means of production, and in this case the means of production was compute.

That immediately raised a more strategic question, one that Okuya said provoked significant internal debate over the past year: should Japan continue to build on rented foundations, or should it construct its own foundation model? A foundation model is a large, general-purpose system trained on broad data that can then be adapted to many downstream tasks; owning one is expensive and risky, but renting one indefinitely is a different kind of risk. The conclusion, reached only after weighing the considerable cost, was that Japan should build its own. The ministry is now preparing to launch a major project to develop a domestic multimodal foundation model — multimodal meaning a system that learns from more than text, incorporating images, sensor readings, and other forms of data.

So far, this is a recognizable arc. Many governments have made versions of the same journey, from compute anxiety to foundation-model ambition. Where Japan's strategy becomes distinctive is in where it points the resulting capability. Rather than competing primarily in the domain of conversational and knowledge work, where the leading laboratories are already entrenched, METI has chosen to concentrate on Physical AI.

The reasoning is industrial. Japan possesses a deep and durable manufacturing base, including some of the world's strongest Industrial Robotics suppliers — Fanuc, Yaskawa, and Kawasaki Heavy Industries among them. The concern Okuya articulated is that if this manufacturing strength cannot be fused with modern AI, the advantage could erode as the industrial structure is reshaped by automation. The opportunity, conversely, is that if the fusion succeeds, Japan could lead in precisely the domain where its existing capabilities matter most. To that end, the strategy includes the development of a Robotics Foundation Model — a foundation model trained not on text scraped from the open internet but on the dynamics of physical motion and machine behavior, intended to serve as a general substrate for robotic systems. Okuya noted that at least one consortium had begun building robotic-motion datasets in 2025, with an early robotics model potentially released within the following year, and that recent policy had expanded to support not only models but the underlying hardware stack for robotics as well.

Why Japan's Advantage Is Not Silicon Valley's

The most analytically interesting part of the keynote was the explanation of why this strategy cannot be executed the way text-based AI was executed — and why that difference is, paradoxically, Japan's strongest card.

Text and image models improve largely by ingesting enormous quantities of openly available data. Physical AI cannot work this way. The data that matters — the readings, signals, and recorded behaviors of machines, production lines, and test environments — is specific, proprietary, and scarce. It does not exist on the open internet, and even when it can be collected, it is not immediately usable. A stream of vibration measurements from a motor is not knowledge. It becomes knowledge only when someone can say what is normal, what is anomalous, what signals genuine degradation and what is mere noise.

This is where Okuya placed his strongest emphasis, on a concept he treated as the decisive bottleneck: Data Refinery. Data Refinery refers to the process of transforming raw physical data into structured, annotated, machine-learnable form — the work of labeling, interpreting, and giving meaning to signals so that a model can learn from them. Its defining characteristic is that it cannot be done well without deep domain experience. To annotate physical data correctly requires Manufacturing Experience and what might be called Veteran Skill: the accumulated, often tacit knowledge held by experienced engineers and technicians who can read a machine's behavior the way a clinician reads a patient. Without that experience embedded in the refining process, even the largest and best-resourced AI developer cannot construct usable datasets for the physical domain.
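To make the refining step concrete, here is a minimal sketch in Python of what attaching expert judgment to raw signals might look like. Everything in it is hypothetical: the `RawReading` and `RefinedExample` types, the `veteran_rule` function, and the 4.5 cut-off are invented for illustration and are not drawn from the keynote. The point it tries to show is that the same number can earn different labels depending on context that only an experienced person supplies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RawReading:
    machine_id: str
    vibration_mm_s: float   # vibration velocity; the unit is an assumption
    during_startup: bool    # context that the sensor stream alone does not explain


@dataclass(frozen=True)
class RefinedExample:
    reading: RawReading
    label: str       # "normal", "noise", or "degradation"
    annotator: str   # who supplied the judgment behind the label


def veteran_rule(reading: RawReading) -> str:
    """A technician's rule of thumb, written down. The 4.5 cut-off is invented."""
    if reading.vibration_mm_s < 4.5:
        return "normal"
    # The same high reading means different things in different contexts.
    return "noise" if reading.during_startup else "degradation"


def refine(readings: List[RawReading],
           rule: Callable[[RawReading], str],
           annotator: str) -> List[RefinedExample]:
    """Attach an expert label to every raw reading."""
    return [RefinedExample(r, rule(r), annotator) for r in readings]


readings = [
    RawReading("motor-7", 1.9, during_startup=False),
    RawReading("motor-7", 6.2, during_startup=True),
    RawReading("motor-7", 6.2, during_startup=False),
]
dataset = refine(readings, veteran_rule, annotator="senior_technician")
for example in dataset:
    print(example.reading.vibration_mm_s, example.label)
```

The second and third readings carry the same value, yet one is labeled noise and the other degradation. A real refinery would rest on far richer tacit knowledge than a single rule, which is the scarcity the passage describes.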

Okuya offered a telling illustration. Japanese firms have built sophisticated models capable of identifying where a fault or repair point lies in complex equipment, and the diagnostic capability of such a model approaches that of a seasoned expert. The point was not that the model is impressive in isolation. The point was the dependency. The model's competence is a downstream product of Data Refinery, which is itself a downstream product of human Veteran Skill. The chain runs from experienced people, through the refinery process, to the model — and the scarce, defensible input is at the top of that chain, not the bottom. This is why several Japanese firms and consortia have begun cooperating to build shared physical datasets, often partnering with startups that specialize in the craft of data design.

The strategic implication is significant and easy to miss. In the conversational AI race, advantage accrues to whoever can marshal the most compute and the broadest data. In Physical AI, advantage accrues to whoever holds the deepest reservoir of physical-world experience and can refine it into data. That is a contest in which an aging industrial economy with decades of accumulated Manufacturing Experience is unusually well positioned — provided it can transfer that experience into systems before the experts who hold it retire. Japan's broader Industrial Robotics strategy, Okuya noted, is being assembled not by METI alone but across multiple ministries, and is driven in no small part by a severe and worsening labor shortage in sectors such as manufacturing and construction. Robots, in this framing, are not a futuristic indulgence. They are a response to demographic reality.

The Hidden Challenge of Physical AI

Everything above concerns how to build Physical AI. It is a strategy of compute, models, data, and experience, and on those terms it is coherent and well-reasoned. But there is a second question that the technical narrative leaves almost entirely open, and it is the question that surfaces the moment one imagines these systems actually running.

A conversational model produces text. When it is wrong, the cost is usually an incorrect answer that a human can catch, ignore, or correct. Physical AI is categorically different. It does not only generate; it acts. It moves equipment, manipulates objects, and intervenes in physical systems whose failure modes include damaged machinery, halted production, and physical danger to people. The consequences of a mistaken judgment do not stay on a screen.

This difference becomes sharper as systems become more autonomous. The trajectory described in METI's strategy is one in which AI moves from detecting conditions, to recommending actions, to eventually executing them with limited human involvement — the pattern often described under the heading of Agentic AI, meaning systems that can pursue goals and take actions across multiple steps with reduced human prompting at each one. As that autonomy increases, a cluster of questions becomes unavoidable. Who decides whether to act on the system's recommendation? Who has the authority to intervene while it is operating? Who can override it, and under what conditions? And when something goes wrong, who remains accountable?

These are not questions about model accuracy. A more capable model does not answer them; in some respects, a more capable model makes them more urgent, because the more reliable a system appears, the stronger the temptation to defer to it without a defined structure for doing so. They are questions about AI Governance in its most practical sense — not the high-level principles, but the operating architecture of who is permitted to decide what.

Japanese policy has, to its credit, begun to name this concern, though it has not yet fully resolved it. The government has increasingly emphasized that autonomous AI systems should incorporate mechanisms requiring meaningful human judgment, in order to mitigate risks such as malfunction, privacy violations, and unintended consequences. This principle is reflected in the AI Guidelines for Business Ver. 1.2, jointly published by Japan's Ministry of Internal Affairs and Communications (MIC) and the Ministry of Economy, Trade and Industry (METI). The guideline explicitly emphasizes Human Oversight, accountability, risk management, and appropriate governance structures for increasingly autonomous systems. Human Oversight, in this context, is not a slogan but a design requirement: the expectation that a person retains a defined and effective role in the loop of consequential decisions.

The guideline is correct in its emphasis, but emphasis is not architecture. To say that human judgment must remain meaningful is to state a goal. It does not specify which decisions require it, at what threshold a system must defer to a person, who that person is, or how the resulting accountability is preserved over time. The gap between the principle of Human Oversight and its concrete implementation is exactly the space where Physical AI strategies will succeed or fail operationally. And it is a gap that neither model development nor conventional governance language fills. This is the hidden challenge of Physical AI: the engineering of who decides has not kept pace with the engineering of what the system can do.

This concern is no longer theoretical. Because the AI Guidelines for Business Ver. 1.2 already assume that increasingly autonomous systems require meaningful human involvement in consequential decisions, the policy question is shifting from whether humans remain involved to how that involvement should be structured.

Decision Design and Physical AI

The missing element has a name. It is Decision Design — a framework that treats the act of judgment itself as a deliberate object of design, rather than something left to emerge informally from workflows, interfaces, and individual discretion.

Decision Design is a judgment architecture framework proposed by Ryoji Morii. It emerged from a simple observation: organizations increasingly know how to design systems, but far fewer know how to design judgment. As AI systems become more autonomous, judgment becomes a design problem.

Decision Design begins from a premise that sounds obvious but is rarely operationalized: in any environment where artificial intelligence and human beings share responsibility for consequential decisions, the allocation of judgment should be designed explicitly. Concretely, Decision Design is concerned with four things. The first is judgment itself — which decisions are being made, on what basis, and against what criteria. The second is authority — who, human or machine, holds the right to make a given decision, and how that right is divided when it is shared. The third is accountability — who bears responsibility for the outcome of a decision, recognizing that when authority and accountability diverge, organizations malfunction. The fourth is escalation — how a decision moves from machine to human, or from one level of human authority to another, when conditions require it.
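The four concerns can be pictured as fields on a single record. The Python sketch below is one hypothetical way to write that down; the `DecisionSpec` type, the `find_divergence` helper, and the role names are assumptions made for illustration, not part of the framework as published. It flags for review any decision whose authority and accountability sit with different parties, and it deliberately leaves out how delegated machine authority should be treated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DecisionSpec:
    name: str          # judgment: which decision is being made
    criteria: str      # judgment: the basis and criteria it is made on
    authority: str     # who holds the right to make the decision
    accountable: str   # who bears responsibility for the outcome
    escalates_to: str  # where the decision moves when conditions require it


def find_divergence(specs: List[DecisionSpec]) -> List[str]:
    """Return the names of decisions whose authority and accountability differ."""
    return [s.name for s in specs if s.authority != s.accountable]


specs = [
    DecisionSpec(
        name="halt_production_line",
        criteria="equipment risk weighed against lost output",
        authority="line_supervisor",
        accountable="line_supervisor",
        escalates_to="plant_manager",
    ),
    DecisionSpec(
        name="release_emergency_stop",
        criteria="hazard confirmed cleared",
        authority="shift_operator",
        accountable="plant_manager",
        escalates_to="plant_manager",
    ),
]
print(find_divergence(specs))
```

Here the second decision is flagged because the person who may act is not the person who answers for the result, which is the divergence the paragraph warns about.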

It is equally important to be precise about what Decision Design is not, because it is easily confused with adjacent practices that organizations already understand. Decision Design is not workflow design; it does not describe the sequence of operational steps. It is not user interface design; it is not concerned with the usability of a screen. It is not model optimization; it does not make an AI system more accurate. And it is not decision-support software; it is not a tool that surfaces information to help a person decide. Each of those disciplines is legitimate and valuable, but none of them specifies where legitimate authority lies. A workflow can be flawless, an interface elegant, a model precise, and the question of who is genuinely entitled to make the final call can still be entirely undefined.

This distinction matters because of the specific problem Decision Design exists to address. As AI systems become increasingly autonomous, organizations need a framework for determining who may decide, under what conditions, with what authority, and with what accountability. Without such a framework, autonomy does not eliminate the decision; it merely obscures who made it. After an incident, the absence of designed authority produces a familiar and corrosive pattern — the system appeared to decide, a person appeared to approve, and no one can say with confidence where responsibility actually rested.

Here the framework's central claim must be stated without softening. Decision Design is not about improving decisions alone; it is about designing the authority structure within which decisions become institutionally legitimate. The emphasis on legitimacy is deliberate. A decision can be correct and still be illegitimate, in the sense that the entity making it had no defined right to do so; and a decision can be wrong yet legitimate, in the sense that it was made by the properly authorized party under the agreed conditions. Institutions run on the second kind, not the first. This is why Decision Design treats Institutional Accountability — the durable assignment of responsibility within an organization, independent of any individual — as the property it is ultimately trying to protect.

The operational instrument of Decision Design is the Decision Boundary. A Decision Boundary is the defined line between what an AI system is permitted to decide and what a human being must decide, expressed not as a vague preference but as an enforceable specification. In practice, a system of Decision Boundaries is built from at least three distinct kinds of line. The first is the authority boundary, which defines the upper limit of what the AI may decide on its own — recommendation only, execution within limits, or independent action. The second is the escalation boundary, which defines the conditions under which the system must stop deciding and hand the judgment to a person, such as when its confidence falls below a set level or when the potential impact exceeds a defined scope. The third is the override boundary, which defines the conditions under which a human may countermand the system's decision, and crucially, who holds that power and how the act is recorded.
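An enforceable specification of the three kinds of line might look something like the following Python sketch. The `DecisionBoundary` type, its field names, the numeric values, and the role names are all hypothetical, chosen only to show that each boundary can be stated as data and checked rather than left as a preference.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class AuthorityLevel(IntEnum):
    RECOMMEND_ONLY = 1
    EXECUTE_WITHIN_LIMITS = 2
    INDEPENDENT_ACTION = 3


@dataclass(frozen=True)
class DecisionBoundary:
    ai_authority: AuthorityLevel    # authority boundary: the most the AI may do alone
    min_confidence: float           # escalation boundary: confidence floor
    max_affected_units: int         # escalation boundary: impact ceiling
    escalate_to: str                # who receives the judgment on escalation
    override_roles: FrozenSet[str]  # override boundary: who may countermand

    def must_escalate(self, confidence: float, affected_units: int) -> bool:
        """True when the system must stop deciding and hand off to a person."""
        return (confidence < self.min_confidence
                or affected_units > self.max_affected_units)

    def may_override(self, role: str) -> bool:
        """True when the given role holds the power to countermand the system."""
        return role in self.override_roles


boundary = DecisionBoundary(
    ai_authority=AuthorityLevel.EXECUTE_WITHIN_LIMITS,
    min_confidence=0.85,
    max_affected_units=1,
    escalate_to="maintenance_supervisor",
    override_roles=frozenset({"maintenance_supervisor", "plant_manager"}),
)
print(boundary.must_escalate(confidence=0.70, affected_units=1))
print(boundary.may_override("shift_operator"))
```

The sketch omits the recording of overrides, which the override boundary also requires.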

Decision Boundaries are not operational thresholds; they are institutional demarcations of legitimate authority. The difference is more than semantic. An operational threshold is a number tuned for performance, such as a confidence level set to balance false positives against false negatives. A Decision Boundary, by contrast, encodes a question of legitimacy: it states who is entitled to act once that number is crossed. Two organizations might choose the same numerical threshold for entirely different reasons, and might assign the resulting authority to entirely different parties. The boundary is where engineering and institutional design meet, and treating it as merely a tuning parameter is how organizations end up with technically functional systems that no one is accountable for. A Decision Boundary does not attempt to influence behavior, and unlike a workflow gate it is not primarily concerned with process control. The question is not what choice should be encouraged but who is entitled to decide. Its purpose is to establish legitimate authority before a decision occurs and to preserve accountability after it.
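A small Python sketch can illustrate the gap between the number and the entitlement. The `BoundaryPolicy` type, the two plants, and the role names are hypothetical. Both plants use the same threshold, yet the party entitled to decide once it is crossed differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryPolicy:
    threshold: float   # operational threshold, tuned for performance
    below: str         # party entitled to decide while the score stays under it
    at_or_above: str   # party entitled to decide once it is crossed

    def entitled_party(self, anomaly_score: float) -> str:
        """Return who holds authority for a given anomaly score."""
        if anomaly_score >= self.threshold:
            return self.at_or_above
        return self.below


# Same number, different institutional choice about who may act.
plant_a = BoundaryPolicy(0.90, below="maintenance_ai", at_or_above="line_supervisor")
plant_b = BoundaryPolicy(0.90, below="maintenance_ai", at_or_above="safety_officer")

score = 0.93
print(plant_a.entitled_party(score))
print(plant_b.entitled_party(score))
```

Tuning `threshold` changes when the handoff happens. Only the two role fields say to whom, and that part is an institutional decision rather than an engineering one.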

A second instrument completes the structure: the Decision Log. A Decision Log is a durable record of consequential decisions (what was decided, by which party, under which conditions, and on what basis), maintained so that authority and responsibility can be reconstructed after the fact. Its purpose is not operational telemetry. Decision Logs do not merely record outputs; they preserve accountability continuity across distributed judgment processes. In an environment where a single outcome may result from a chain of partial judgments (a model's detection, an escalation to a supervisor, a manager's authorization, an automated execution), the Decision Log is what allows an organization to answer, months later, the question that matters most: who was responsible, and were they entitled to be. Without it, accountability decays as the decision passes through hands, until no one can say where it actually lay.
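As a rough illustration, a Decision Log can be sketched in Python as an append-only list of immutable records. The `DecisionRecord` and `DecisionLog` names, the `case_id` field, and the sample entries are assumptions made for this example. An in-memory list stands in for the durable storage a real log would need.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class DecisionRecord:
    case_id: str      # ties partial judgments to one outcome
    decided: str      # what was decided
    party: str        # by which party
    conditions: str   # under which conditions
    basis: str        # on what basis
    entitled: bool    # whether the party held authority for this decision
    at: datetime      # when it was recorded (UTC)


class DecisionLog:
    """Append-only record of consequential decisions (in memory for this sketch)."""

    def __init__(self) -> None:
        self._records: List[DecisionRecord] = []

    def record(self, case_id: str, decided: str, party: str,
               conditions: str, basis: str, entitled: bool) -> DecisionRecord:
        rec = DecisionRecord(case_id, decided, party, conditions, basis,
                             entitled, datetime.now(timezone.utc))
        self._records.append(rec)
        return rec

    def chain(self, case_id: str) -> Tuple[DecisionRecord, ...]:
        """Reconstruct, in order, every judgment that led to one outcome."""
        return tuple(r for r in self._records if r.case_id == case_id)


log = DecisionLog()
log.record("case-001", "anomaly detected", "maintenance_ai",
           "bearing vibration rising", "sensor pattern match", True)
log.record("case-001", "escalated", "maintenance_ai",
           "confidence below floor", "escalation boundary", True)
log.record("case-001", "line stop authorized", "line_supervisor",
           "risk judged to outweigh lost output", "authority boundary", True)
for rec in log.chain("case-001"):
    print(rec.party, "->", rec.decided, "| entitled:", rec.entitled)
```

The `chain` query addresses the question the paragraph poses: who acted at each step, and whether each was entitled to.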

A Concrete Implementation: Industrial Maintenance AI

Abstractions about authority are easy to assent to and hard to apply, so it is worth grounding the framework in a specific case. Consider an Industrial Maintenance AI deployed on a manufacturing line — precisely the kind of system described earlier, one whose diagnostic capability was built by refining the Veteran Skill of experienced technicians into trainable data through a Data Refinery process.

The domain of AI authority can be drawn generously, because this is where the system genuinely excels. The AI detects early signs of anomaly from sensor data, predicts the progression of wear, estimates the remaining service life of components, and proposes likely causes by matching current readings against historical failure patterns. It performs this faster, more consistently, and more tirelessly than any human monitor. Within these bounds, the system may act on its own — for instance, by adjusting monitoring frequency or flagging components for inspection — because the consequences of error are recoverable and the authority is appropriately limited.

The domain of human authority begins precisely where the consequences stop being recoverable. The decision to halt a production line is not a technical one; it is a judgment that balances the cost of lost output, missed deadlines, and customer commitments against the risk of equipment failure or harm. That trade-off lies outside the system's competence, not because the model is weak but because the decision integrates considerations the model was never authorized to weigh. The authority boundary therefore places the line-stop decision with a human supervisor: the system may recommend stopping, but the authority to stop, in normal operation, remains with a person.

The escalation conditions make the handoff explicit rather than implicit. When the system's confidence in an anomaly prediction falls below a defined threshold, or when the predicted impact extends beyond a single piece of equipment to multiple downstream processes, the system does not proceed on its own. It escalates the judgment to the responsible maintenance authority. The correct behavior of the system, in these cases, is to decline to decide — a design choice that runs against the instinct to maximize automation, but that is essential to keeping authority where it belongs.

The override conditions cover both the rare exception in which the AI is granted limited executive power and the human power to countermand it. If the system detects a hazard that exceeds a hard safety limit, such as an over-pressure event or an incipient fire, it may trigger an automatic emergency stop without waiting for human approval, because the cost of delay exceeds the cost of a false stop. This is a deliberately narrow grant of authority, and it is paired with a defined override path: the conditions under which a human may release or countermand that automatic stop, and the requirement that the act be recorded.

Finally, the accountability structure binds the whole arrangement together through the Decision Log. Every consequential decision in this system — the AI's anomaly detection, its escalation, the supervisor's authorization, the automatic emergency stop, the human override — is recorded with its conditions and its acting party. The result is that responsibility does not dissolve into the system. Months after an incident, the organization can reconstruct not only what happened but who was entitled to decide at each step, and whether the boundaries held. What began as "we installed an AI" reveals itself, on inspection, to be a dense set of designed Decision Boundaries: how far the system may go, where it must hand off, when a person must intervene, and how authority is preserved in the record. The Data Refinery transferred expert judgment into the model. Decision Design did something different and equally necessary — it placed authority and accountability back into the organization.
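The arrangement described in this section can be condensed into one routing function. The Python sketch below is hypothetical throughout: the limits, the role names, the `Assessment` fields, and the decision labels are invented to mirror the case, not taken from any deployed system. The order of the checks encodes the boundaries: the hard safety limit first, then the escalation conditions, then the line-stop recommendation, and only then the actions the AI may take alone.

```python
from dataclasses import dataclass
from typing import List, Tuple

PRESSURE_HARD_LIMIT_KPA = 900.0                        # hypothetical hard safety limit
CONFIDENCE_FLOOR = 0.80                                # hypothetical escalation threshold
RELEASE_ROLES = {"line_supervisor", "plant_manager"}   # hypothetical override holders

LogEntry = Tuple[str, str]  # (decision, acting party)


@dataclass(frozen=True)
class Assessment:
    pressure_kpa: float
    confidence: float
    affected_processes: int
    recommends_line_stop: bool


def route(a: Assessment, log: List[LogEntry]) -> str:
    """Decide what the AI may do with one assessment, and record it."""
    if a.pressure_kpa > PRESSURE_HARD_LIMIT_KPA:
        decision = "emergency_stop"                     # narrow grant of executive power
    elif a.confidence < CONFIDENCE_FLOOR or a.affected_processes > 1:
        decision = "escalate_to_maintenance_authority"  # decline to decide
    elif a.recommends_line_stop:
        decision = "recommend_line_stop_to_supervisor"  # authority stays with a person
    else:
        decision = "flag_for_inspection"                # recoverable, within AI authority
    log.append((decision, "maintenance_ai"))
    return decision


def release_emergency_stop(role: str, log: List[LogEntry]) -> bool:
    """A human countermands the automatic stop; the attempt is always recorded."""
    allowed = role in RELEASE_ROLES
    log.append(("release_emergency_stop" if allowed else "release_refused", role))
    return allowed


log: List[LogEntry] = []
print(route(Assessment(950.0, 0.99, 1, False), log))
print(route(Assessment(400.0, 0.60, 1, False), log))
print(route(Assessment(400.0, 0.95, 1, True), log))
print(release_emergency_stop("line_supervisor", log))
```

Nothing in `route` stops the line in normal operation. The most it does is recommend or escalate, which is the authority boundary expressed in code.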

Conclusion

It would be a mistake to read METI's program, as articulated in Toshikazu Okuya's keynote on AI Robotics Policy in Japan at Humanoids Summit Tokyo, as merely a robotics strategy. The procurement of computing power, the construction of a domestic foundation model, the Data Refinery process, the Robotics Foundation Model, and the broader Industrial Robotics push are real and substantial. But taken together, they describe an entrance rather than a destination. They are the opening moves of an attempt to reindustrialize an advanced economy around systems that increasingly act on their own.

In that environment, the organizations that hold a durable advantage will not simply be those with the most capable models. The long-term challenge is not building smarter systems; it is building organizations capable of governing increasingly autonomous ones. That capability is institutional, not technical. It depends on whether an organization can state, clearly and enforceably, who may decide, where the line falls between machine and human judgment, when a system must defer, who may override it, and how accountability is preserved as decisions move through the structure. Japan's Physical AI strategy is, in the end, an institutional strategy in the guise of a technological one — and the institutions that learn to design judgment will be the ones that can deploy autonomy safely enough to deploy it boldly.

Physical AI will ultimately be constrained less by what machines can do than by what institutions are prepared to authorize.

The challenge, therefore, is not simply to build systems that can act. It is to build institutions that know when systems may act, when they must defer, and who remains accountable when they do. Physical AI may become one of the defining industrial transformations of the coming decade. The organizations that succeed will not merely possess better models. They will possess better judgment architectures.

Decision Design is a judgment architecture framework proposed by Ryoji Morii, founder of Insynergy Inc., for structuring authority, accountability, and decision boundaries in AI-augmented organizations.