
What Robots Are Really Learning From Us

A Japan Times report on Indian workers recording first-person videos for robot training reveals a deeper reality about Physical AI. Robots are not merely learning how humans move; they are learning how humans decide. Through egocentric data, human demonstration, and large-scale data annotation, human judgment is increasingly being transformed into machine-learnable form. As Physical AI systems become more autonomous, the challenge shifts from extracting judgment to allocating it. The future of AI may depend less on model capability than on how organizations structure authority, accountability, and decision boundaries.

The race in Physical AI is a contest over human judgment, not models: how to extract it, and where it should sit.

Introduction

Inside a studio in Tamil Nadu, a young woman folds a towel in a furnished apartment that doubles as a film set. She folds it again, then again, each time in a slightly different position on the bed. A camera fixed to her head records every motion from her own line of sight. Over a single day she produces about ninety short clips, each around four minutes long. In another room, colleagues arrange pencil sharpeners, water bottles, and crayons into patterns while depth-sensor cameras capture the scene. In a kitchen in Chennai, another worker straps a smartphone to her forehead and films herself slicing mangoes, earning a little over two dollars an hour for footage that global technology firms prize.

The account comes from the Japan Times article "The Indian workers training AI robots to take their jobs," based on AFP reporting from the field. At first read, it tells a story about labor: low-wage workers in India, recorded for hours, training the machines that may replace them. That reading holds. India's own government think tank, NITI Aayog, has warned that the debate over artificial intelligence and work fixates on white-collar displacement while ignoring the country's roughly 490 million informal workers. The labor question is real.

The labor frame hides a larger shift. The footage raises a sharper question than who shot it or for how little: what do the robots learn from it? Most people guess wrong, and the gap points to a problem that AI policy professionals, robotics executives, and enterprise architects have not yet named.

Main Analysis

Start with definitions. Physical AI refers to artificial intelligence that perceives and acts in the physical world through a body (a robot arm, a mobile platform, or a humanoid form) rather than producing text, images, or predictions on a screen. A language model generates output; a Physical AI system moves objects, applies force, and changes the state of its surroundings. Humanoid robots are the most visible form of this ambition: machines built in near-human proportions so they can work in spaces designed for people, from kitchens to warehouse aisles. The investment bank Morgan Stanley projects that more than a billion such robots could be in use by 2050, mostly in industrial and commercial settings. Whether or not that figure holds, it signals where capital expects the frontier to move.

A humanoid robot is only as capable as the behavior it has learned, and no one can write that behavior by hand. You cannot specify in code every motion required to fold an unfamiliar towel or slice a mango of unknown ripeness. The physical world varies too much. So developers turn to demonstration. Human demonstration teaches a machine a task by having a person perform it while sensors record the performance, so a model can learn to reproduce the behavior. The footage filmed in Tamil Nadu is human demonstration at industrial scale.

The point of view is the crucial detail. The cameras sit on the workers' heads, not on tripods across the room. This produces egocentric data: first-person video that records the world as the actor sees it, capturing not only what the hands do but where the eyes go and how the body orients toward the task. A third-person recording of someone folding a towel shows the outcome of attention. A first-person recording shows attention itself. Developers across the industry have chosen egocentric data because they believe it carries information external footage cannot.

Other instruments sit alongside the cameras. Motion capture records the precise trajectory of a body or its parts over time, translating physical movement into structured numbers a model can ingest. Depth-sensor cameras add three-dimensional structure; motion-sensor bands track limbs the camera cannot see. Together they feed spatial AI: systems that build and reason about a three-dimensional model of physical space, tracking where objects are, how they relate, and how a body can move among them. A robot that replays a fixed trajectory fails the moment the towel shifts an inch. A system grounded in spatial AI adjusts, because it understands the scene instead of memorizing a path.
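The contrast between replaying a path and adjusting to the scene can be made concrete with a small sketch. Everything in it is an illustrative assumption: the function names, the 2-D coordinates, and the idea of shifting waypoints by the object's observed displacement are hypothetical, and a real spatial AI stack would estimate full 3-D pose from sensor data rather than a single offset.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical demonstration: waypoints recorded while the towel sat at (10.0, 5.0).
RECORDED_TOWEL_POS: Point = (10.0, 5.0)
RECORDED_PATH: List[Point] = [(10.0, 5.0), (12.0, 5.0), (12.0, 7.0)]


def replay_fixed(path: List[Point]) -> List[Point]:
    """Replay the recorded waypoints exactly, ignoring the current scene."""
    return list(path)


def replay_relative(
    path: List[Point], recorded_pos: Point, observed_pos: Point
) -> List[Point]:
    """Shift every waypoint by the towel's displacement since recording."""
    dx = observed_pos[0] - recorded_pos[0]
    dy = observed_pos[1] - recorded_pos[1]
    return [(x + dx, y + dy) for x, y in path]


# The towel has moved one unit to the right since the demonstration.
observed: Point = (11.0, 5.0)
fixed = replay_fixed(RECORDED_PATH)
adjusted = replay_relative(RECORDED_PATH, RECORDED_TOWEL_POS, observed)

# The fixed replay still reaches for the old position; the adjusted one follows the towel.
print("fixed first waypoint:   ", fixed[0])
print("adjusted first waypoint:", adjusted[0])
```

The fixed replay starts at the towel's old position and misses; the relative replay starts where the towel now is. The sketch covers only translation, which is the simplest case of the adjustment described above.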

Then comes the slow, labor-intensive work that turns raw footage into something a model can learn from. Data collection gathers this human demonstration footage at scale. Data annotation labels that footage so a model knows what each element represents: which pixels are the towel, which is the hand, where the grasp begins. India has positioned itself, in the words of one digital-labor researcher quoted in the reporting, as a global middleman for the creation, processing, and annotation of this kind of data. Cost is not the only advantage. India can also mobilize large numbers of people to perform ordinary tasks, on camera, again and again, until the dataset grows dense enough to train a model. This category of work, AI training through recorded human behavior, has become an export industry of its own.

So far, most executives can follow the story. Robots need to learn how to move; humans show them how; the footage gets collected, annotated, and fed into models. The account is accurate. It also omits the most important thing.

What Physical AI Is Really Learning

Watch the footage again and ask a different question. When a worker folds a towel ninety times, what does the camera actually record?

The intuitive answer is the action: the grasp, the lift, the crease, the fold. We assume the robot learns to fold, to perform the outcome. But consider why developers insist on first-person video and eye-aligned cameras instead of a clean overhead shot of the finished motion. They want the process that produces the outcome, not the outcome itself.

A person folding a towel does many things at once, most of them without noticing. She looks at the edge to check the alignment. When a corner is off, she brings her hand back and corrects it. Before she begins, she picks where to start. When the fabric slips, she pauses, assesses, and chooses whether to continue or start over. None of this shows up in a description of the task. All of it shows up in egocentric footage. The recording captures a sequence of small operations: where to look, what to attend to, when to verify, when to adjust, when to stop. Those are forms of judgment, not motor skills.

This reframing drives the rest of the argument. Robots trained on this data do not, at the deepest level, learn outcomes. They learn processes, and the processes carry micro-decisions the demonstrator herself could not put into words. Human demonstration earns its value by capturing how a competent human allocates attention and makes corrective choices under uncertainty, not by showing how an object moves through space. The worker slicing a mango does not transmit a cutting motion. She transmits a stream of judgments about where the fruit is, how firm it is, where the pit lies, and when the cut is going wrong.

To put it plainly: Physical AI learns how humans decide, not merely how they act.

That distinction would stay merely interesting if these systems only produced information. They do more. A language model that misjudges a sentence produces a flawed paragraph. A Physical AI system that misjudges a situation applies force in the world. It can drop, crush, cut, or collide. This separates the AI most enterprises have deployed from the AI now training in Indian studios: the new kind acts, and its actions carry physical consequences no edit can undo.

As these systems gain autonomy, a set of questions moves from the edge to the center. Who decides what the system should do when the situation is ambiguous? Who steps in when it starts to act wrong? Who can override it, and at what point in the action? When something goes wrong, who stays accountable? These are not engineering questions in the usual sense. They concern the placement of judgment and authority, and they apply with equal force to AI agents (software systems that pursue goals and take actions with limited human direction) and to robots that move through kitchens.

Regulators have begun to sense this. The Japanese government has pressed the position that autonomous AI systems should build in mechanisms requiring meaningful human judgment, to reduce risks such as malfunction, privacy violations, and unintended consequences. The AI Guidelines for Business Ver. 1.2, published jointly by Japan's Ministry of Internal Affairs and Communications (MIC) and the Ministry of Economy, Trade and Industry (METI), reflect this position. The guidelines stress human oversight, meaning that humans keep a substantive role in supervising, correcting, and where necessary halting an AI system rather than rubber-stamping its outputs. They pair it with accountability, risk management, and governance suited to systems that gain autonomy over time.

Notice the convergence. A major industrial economy has written into policy the same concern the studio footage raises from the other side. The data pipelines in India extract human judgment and embed it into machines. The guidelines in Tokyo insist that human judgment stay present once those machines act. Both circle the same unsolved problem. Capturing human judgment is well underway. No one has yet designed where that judgment should sit once captured.

From Human Judgment to Decision Design

Here the argument turns. For most of a decade, the hard problem in applied AI has been extraction: getting enough high-quality human behavior into a form a model can learn from. The studios in Tamil Nadu show that extraction, though still expensive, now works in principle. You can capture human judgment at scale. The frontier has moved.

The challenge is no longer extracting judgment from people. It is deciding where judgment should reside after extraction. Once a robot has absorbed thousands of hours of human decision-making, an organization must decide which of those decisions the machine may make on its own, which stay with a person, how the two hand off, and who answers for the result. That is a design problem, and right now it has no owner. It falls between the robotics team, the legal team, the operations team, and the executives who signed the procurement contract. Decision Design exists to address it.

Decision Design is the discipline of structuring, in advance and explicitly, how judgment is distributed between humans and AI systems: what each may decide, under what conditions, with what ability to escalate or be overridden, and with what accountability attached. It treats the allocation of judgment as something to engineer on purpose, rather than letting it emerge by default from whatever the system happens to do. Decision Design does not aim to improve decisions alone; it designs the authority structure within which decisions become institutionally legitimate.

Legitimacy is the operative word. A decision can be correct and still illegitimate, if no one authorized the system to make it. A decision can be wrong and still accountable, if the structure made clear who owned it. Decision Design concerns legitimacy and accountability, not accuracy alone, which is what separates it from the disciplines people confuse it with.

Decision Design is not governance alone. Governance sets rules, controls, and oversight structures, but it does not say, for a given decision in a given moment, who holds the authority to make it. Decision Design is not digital transformation alone. DX modernizes processes and tooling, yet a digitized process can still leave authority unaddressed. Decision Design is not automation alone. Automation removes human effort from a task; it does not decide which judgments should stay human. Decision Design is not AI ethics alone. Ethics describes what a good outcome would be, but it does not assign the authority to decide, or the accountability for deciding. Each field matters. None answers the question that matters most: who decides?

As AI systems recommend, execute, and act on their own, organizations need a framework that sets who may decide, under what conditions, with what authority, and with what accountability. When AI only recommends, the boundary is clear: a human decides last. As the system moves from recommending to executing to acting alone, that boundary dissolves unless someone draws it on purpose. Left undrawn, the boundary has no owner, and the machine ends up exercising authority that no one assigned to it. An organization finds out, usually after an incident, that no human ever held the decision that failed.

Judgment Architecture and the Decision Boundary

Decision Design produces a judgment architecture: the explicit structure that specifies, across an organization or a system, where each category of judgment resides, who holds the authority to exercise it, how judgment moves between humans and machines, and who is accountable at each point. Its central element is the decision boundary.

A decision boundary is the defined line that separates what an AI system may decide from what a human must decide. Treating that line as a technical setting, a confidence threshold or a parameter, gets it wrong. A decision boundary is an institutional demarcation of legitimate authority. A confidence score tells you how sure the model is. A decision boundary tells you who may act on that confidence, and where the right to decide passes from machine to human.

A judgment architecture defines three kinds of decision boundary.

The Authority Boundary specifies which decisions the AI system may make on its own and which a human keeps. It is the primary partition: everything inside it belongs to the machine's legitimate authority; everything outside it belongs to a person.

The Escalation Boundary specifies when the system must stop acting on its own and refer a decision up to a human. It makes human oversight real rather than nominal. When the system meets a situation it cannot reliably assess (an object it does not recognize, a confidence level below a defined floor, a state it was not trained for), the escalation boundary forces it to hand the decision to a person rather than guess. Without a well-designed escalation boundary, "human oversight" is a slogan; with one, it works.

The Human Override Boundary specifies how, and at what point, a human may countermand a decision the system has already begun to execute. Some actions must always let a person reverse them; yet if a human can halt anything at any time for any reason, the system's autonomy means nothing. The override boundary resolves that tension by defining, in advance, the scope and timing of human intervention.

These three boundaries become necessary the moment human judgment enters a Physical AI system. While the machine only advised, a human decided by default and no boundary was needed. Once the machine has absorbed enough human judgment to act on its own, the boundaries are the only thing keeping authority tied to an identifiable human owner.
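One way to picture the three boundaries is as an explicit, inspectable structure rather than a behavior that emerges from the model. The sketch below is a minimal illustration under stated assumptions: the class name, field names, decision-type strings, and the 0.90 floor are all hypothetical, and Decision Design as described here does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Route(Enum):
    MACHINE_DECIDES = "machine_decides"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    HUMAN_DECIDES = "human_decides"


@dataclass(frozen=True)
class JudgmentArchitecture:
    # Authority Boundary: decision types the machine may make on its own.
    machine_authority: FrozenSet[str]
    # Escalation Boundary: below this confidence the machine must refer upward.
    confidence_floor: float
    # Human Override Boundary (scope): decision types a human may countermand.
    overridable: FrozenSet[str]
    # Identifiable human role that owns everything outside machine authority.
    human_owner: str

    def route(self, decision_type: str, confidence: float, recognized: bool) -> Route:
        # Authority is checked first: confidence never confers the right to decide.
        if decision_type not in self.machine_authority:
            return Route.HUMAN_DECIDES
        # Inside its authority, the machine still hands off what it cannot assess.
        if not recognized or confidence < self.confidence_floor:
            return Route.ESCALATE_TO_HUMAN
        return Route.MACHINE_DECIDES

    def may_override(self, decision_type: str) -> bool:
        return decision_type in self.overridable


# Hypothetical configuration for illustration only.
arch = JudgmentArchitecture(
    machine_authority=frozenset({"adjust_grip", "adjust_force"}),
    confidence_floor=0.90,
    overridable=frozenset({"adjust_grip", "adjust_force"}),
    human_owner="shift_supervisor",
)

print(arch.route("adjust_grip", confidence=0.97, recognized=True))
print(arch.route("adjust_grip", confidence=0.60, recognized=True))
print(arch.route("approve_target", confidence=0.99, recognized=True))
```

The third call is the instructive one: a very confident model is still routed to a human, because the decision type lies outside the Authority Boundary. That ordering of checks is the difference between a confidence threshold and a decision boundary.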

A Concrete Case: The Mango-Cutting Robot

Consider a robot trained on the footage described above: a humanoid system that has learned, from thousands of hours of egocentric video, to slice a mango. The task is real; it is what the worker in Chennai was recorded doing. Suppose this robot now works in a commercial kitchen. Its judgment architecture would look like this.

Within the Authority Boundary, the robot may make the moment-to-moment decisions that make up the physical act: locating the fruit, judging the firmness of the skin, adjusting the angle and force of the blade, working around the pit, and separating the flesh. These are the judgments distilled from human demonstration (where to look, when to adjust), and they sit inside the machine's legitimate authority. Outside that boundary, the human keeps a different class of judgment: whether this particular mango should be cut at all. Is it spoiled? Is there a foreign object in it? Is it fit to serve? Acting on an object differs from judging whether the object is an appropriate target, and the two belong to different parties.

The Escalation Boundary governs the in-between cases. When the robot cannot reliably classify what sits in front of it (an unfamiliar variety, a confidence reading below the floor, an unexpected resistance to the blade), it must not default to cutting. It must stop and escalate to a human. Designing this boundary well separates a safe system from a dangerous one, because a system that never escalates just keeps going.

Suppose the object presented to the robot is not a mango but a human fist. As physical targets the two resemble each other more than we would like: both roughly rounded, both yielding under pressure, both covered by a layer a blade could read as "skin." A robot that has learned only the action, the cutting motion, has no internal reason to treat them differently. A mango differs from a fist by a judgment, not a motor skill: one is an acceptable target, the other is not. Give a system the action without the judgment, and a fist will draw the action it knows. The Authority Boundary that defines acceptable targets, the Escalation Boundary that halts on the unrecognized, and the Human Override Boundary that allows immediate intervention exist to prevent that outcome. The boundaries are not bureaucratic overhead. They are the difference between a tool and a hazard.

The Human Override Boundary defines the human's power to stop the robot after it has begun. A worker who notices a problem must be able to halt the blade, and the system must specify how late that intervention can occur, before the blade enters or after, balancing safety against the efficiency that justified the automation in the first place.
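The timing question can be written down as an explicit rule instead of being left to whatever the controller happens to allow. The following sketch is hypothetical throughout: the phase names, the `OverrideBoundary` class, and the two example policies are assumptions made for illustration, not a description of any deployed robot.

```python
from dataclasses import dataclass
from enum import IntEnum


class CutPhase(IntEnum):
    """Hypothetical phases of one cutting action, in execution order."""
    APPROACH = 1
    CONTACT = 2
    BLADE_ENTERED = 3
    COMPLETE = 4


@dataclass(frozen=True)
class OverrideBoundary:
    # Latest phase at which a human halt is still honoured (a design choice).
    latest_haltable_phase: CutPhase

    def can_halt(self, current: CutPhase) -> bool:
        # IntEnum members compare by execution order.
        return current <= self.latest_haltable_phase


# Two candidate policies an organization might weigh against each other.
before_entry_only = OverrideBoundary(CutPhase.CONTACT)
through_the_cut = OverrideBoundary(CutPhase.BLADE_ENTERED)

for policy_name, policy in [
    ("before_entry_only", before_entry_only),
    ("through_the_cut", through_the_cut),
]:
    print(policy_name, {phase.name: policy.can_halt(phase) for phase in CutPhase})
```

Which policy is right depends on the safety and efficiency trade-off described above; the point of the sketch is only that the choice becomes a named, reviewable value with an owner.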

Finally, the architecture must fix accountability. If a served dish causes harm, someone must be able to trace responsibility: did the fault lie in the robot's in-scope action, or in the human's out-of-scope judgment that the mango was fit to cut? Institutional accountability, the principle that responsibility for a decision traces to an identifiable human or organizational role rather than dissolving into the system that executed it, is what a judgment architecture protects. Such systems also need a record. A Decision Log is the durable record of which decisions were made, by whom or by which system, under what authority, and on what basis, kept so that someone can reconstruct responsibility after the fact. A Decision Log does more than store outputs; it preserves accountability across distributed judgment. When a decision passes from a human supervisor to an AI system and back, the log holds the chain of responsibility together across those handoffs, so authority stays traceable as judgment moves between parties.
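A Decision Log of the kind described here could be as simple as an append-only list of entries that record who decided, under what authority, and on what basis. The sketch below is a minimal illustration; the field names, role names, and example entries are hypothetical, and a production log would also need tamper resistance and durable storage, which are out of scope.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class DecisionEntry:
    decision: str                  # what was decided
    decided_by: str                # human role or system identifier
    authority: str                 # boundary or mandate it was made under
    basis: str                     # evidence or reason recorded at the time
    handed_off_from: Optional[str] = None  # previous holder, if judgment moved
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class DecisionLog:
    def __init__(self) -> None:
        self._entries: List[DecisionEntry] = []

    def record(self, entry: DecisionEntry) -> None:
        # Entries are only ever appended, never edited or removed.
        self._entries.append(entry)

    def chain(self) -> List[str]:
        """Return the sequence of decision holders, in order."""
        return [e.decided_by for e in self._entries]

    def owner_of(self, decision: str) -> Optional[str]:
        """Return who most recently made the named decision, if anyone."""
        for e in reversed(self._entries):
            if e.decision == decision:
                return e.decided_by
        return None


log = DecisionLog()
log.record(DecisionEntry("approve_target", "kitchen_supervisor",
                         "human authority (outside machine scope)", "visual inspection"))
log.record(DecisionEntry("execute_cut", "robot_01", "Authority Boundary",
                         "target recognized, confidence above floor",
                         handed_off_from="kitchen_supervisor"))
log.record(DecisionEntry("halt_cut", "kitchen_supervisor", "Human Override Boundary",
                         "foreign object noticed", handed_off_from="robot_01"))

print(log.chain())
print(log.owner_of("approve_target"))
```

Reading the chain back shows the handoffs from supervisor to robot and back again, which is the traceability the log exists to preserve.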

This is where Decision Design operates, and it is why a purely conceptual treatment falls short. The architecture must answer, for a real machine in a real kitchen, who decides what, when the machine must stop, when a human may step in, and who answers if it fails.

The Deepest Layer

Return to the studios in India, and to the layers of what gets built there. The value looks like it lives in the annotation, the careful labeling that turns raw video into training data, or in the human demonstration itself, or in the data refinery that cleans, structures, and enriches the footage into high-value training sets. It lives below all three.

The deepest layer is judgment. Beneath the labels, the footage, and the refined datasets, these pipelines extract the human capacity to decide under uncertainty and render it machine-learnable: where to look, what to verify, when to stop, what counts as an acceptable target and what does not. Physical AI learns how humans decide, not only how they act.

Once judgment has been extracted, the next question is institutional, not technical. It is a question of allocation: how much of that judgment a machine may exercise on its own, how much a human must keep, and how authority and accountability sit across the two. Decision Design exists to answer it, and the current discourse, fixed by turns on labor, on models, and on ethics in the abstract, has failed to pose it clearly.

Conclusion

The workers in India appear to teach robots how to perform tasks. At a deeper level, they help convert human judgment into machine-learnable form. The towel-folder and the mango-slicer do not transmit motions; they transmit decisions, captured in first-person video and rendered into data a machine can absorb. Human judgment is migrating into systems that will soon act on it in the physical world.

This migration reframes the competition and regulation around Physical AI. The contest is not about which company trains the most capable humanoid. It is about which organizations can take extracted human judgment and place it on purpose (assigning authority, defining boundaries, preserving accountability) rather than letting it settle wherever the technology happens to put it. Japan's AI Guidelines for Business Ver. 1.2 gesture at this from the regulatory side; the data studios of Tamil Nadu drive it from the industrial side. Between them sits a design problem that neither regulation nor data collection resolves on its own.

The next challenge is no longer how to capture judgment. It is how to allocate it. When a humanoid robot enters a workplace, the decisive question will not be what it learned to do. It will be who still holds the authority to decide, who can stop it, and who answers when it is wrong. Organizations that treat that question as an afterthought will find, as the machine takes on authority no one assigned, that they have automated not only the work but the responsibility, without ever deciding to.

Decision Design is a judgment architecture framework proposed by Ryoji Morii, founder of Insynergy Inc., for structuring authority, accountability, and decision boundaries in AI-augmented organizations.

Japanese version is available on note.
