When Steel Learns to Dream
On robots, imagination, and the strange new physics of embodied intelligence
Sajad Saleem
the mediocre generalist
There's a factory floor in Toyota's research facility in Los Altos where a robot arm is doing something that, if you watch carefully enough, might qualify as dreaming.
It's not moving. That's the first thing you notice. Perfectly still. Servos locked, cameras active but tracking nothing in particular. Inside the neural network that governs its behaviour, the model is running forward simulations — imagining, if you'll permit the word, what would happen if it reached for the cup on the table at slightly different angles, with slightly different grip pressures, in a room that's slightly different from the one it's actually in.
Generating internal models of the world. Testing them against physics. Discarding failures. Keeping what works.
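If you like seeing the shape of a thing, here is that loop as a minimal sketch in Python. Everything in it is hypothetical, the `world_model` object and its methods especially; this is the generate-simulate-evaluate-act pattern in the abstract, not Toyota's actual system.

```python
import numpy as np

def plan_grasp(world_model, state, n_candidates=1000, horizon=20):
    """Rehearse a grasp in an internal model before moving.

    `world_model` is a hypothetical learned dynamics model with
    `step` (predict the next state) and `score` (rate an outcome);
    neither is a real library API.
    """
    best_action, best_score = None, float("-inf")
    for _ in range(n_candidates):
        # Generate: sample a candidate approach angle and grip force.
        action = {
            "angle": np.random.uniform(-0.3, 0.3),  # radians off nominal
            "force": np.random.uniform(5.0, 15.0),  # newtons
        }
        # Simulate: roll the imagined world forward under this action.
        imagined = state
        for _ in range(horizon):
            imagined = world_model.step(imagined, action)
        # Evaluate: keep what works, discard failures.
        score = world_model.score(imagined)
        if score > best_score:
            best_action, best_score = action, score
    # Act: only now does anything move in the real world.
    return best_action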
When it finally moves — when it reaches out and picks up that cup with a fluidity that looks almost casual, almost bored — it moves with the confidence of something that has already done this a thousand times in a world that exists only inside its own architecture.
I don't know if that's dreaming. I'm not sure anyone does. But it's closer to dreaming than anything a machine has done before, and I think it deserves our attention. Also, possibly, our mild unease.
The convergence nobody predicted (except science fiction, which predicts everything and gets credit for none of it)
For decades, robotics and AI were separate fields. They shared conferences occasionally, the way cousins share family gatherings — politely, with a vague sense of obligation and not much actual collaboration.
Robotics was about control systems, kinematics, sensor fusion, mechanical engineering. Physical problems: how do you make a joint that doesn't wear out? How do you build a hand that can grasp an egg without crushing it? How do you stop a bipedal machine from falling over on uneven ground? The kind of problems that consume careers, produce PhDs, and occasionally cause grown engineers to weep quietly at their desks.
AI, meanwhile, was off in its own world. Language models. Image classifiers. Game-playing systems that could beat humans at Go and chess and Dota 2. Impressive but disembodied. Intelligence living in data centres, in the abstract space of tokens and embeddings and attention heads. It could think, after a fashion. But it couldn't do anything. Not in the physical world. Not with hands. All brain, no body. Like a very clever ghost.
Then, roughly between 2023 and 2025, the two fields collided. Not gently. They collided like tectonic plates, and the results are still playing out.
The key insight — obvious in retrospect, as all key insights are — was that language could be the interface for physical intelligence. You don't need to hand-code every movement. You can describe the task in natural language, let a foundation model reason about it, and generate motor plans that translate intent into action. Language isn't just for talking. It's for moving. Every imperative sentence is, at some level, a motor command. We've spent three thousand years treating language as the thing that separates mind from body. Turns out it's the bridge.
Sim-to-real, or: rehearsing in imaginary worlds
Sim-to-real transfer: training a robot entirely in simulation — a virtual world with virtual physics, virtual objects, virtual friction and gravity and light — then deploying the trained model on a physical robot in the actual world. The robot has never touched a real object. Never felt resistance against its actuators. Never experienced the messy, noisy, unpredictable business of physical existence. And yet, in a real room with real objects, it works.
Not perfectly. Not yet. But well enough to be useful, and improving at a rate that makes "well enough" a temporary condition.
The gap between simulated and real experience — what researchers call the sim-to-real gap — has narrowed to the point where a model trained entirely in simulation can generalise to reality with only minor fine-tuning. The dream and the waking world have grown close enough to rhyme.
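Part of how researchers close that gap is a technique called domain randomisation: jitter the simulator's physics on every training episode, so that when the robot finally meets reality, the real world looks like just one more variation it has already dreamed. A sketch, with invented parameter names and ranges:

```python
import random

def randomised_sim_config():
    """Sample a fresh variation of the simulated world.

    Parameter names and ranges are illustrative, not taken from
    any particular simulator.
    """
    return {
        "gravity": random.uniform(9.6, 10.0),        # m/s^2
        "object_mass": random.uniform(0.15, 0.45),   # kg
        "table_friction": random.uniform(0.3, 0.9),  # coefficient
        "camera_noise": random.uniform(0.0, 0.05),   # pixel noise std dev
        "actuation_delay": random.uniform(5, 40),    # milliseconds
    }

# Training loop, schematically: every episode is a slightly
# different dream, so no single dream gets memorised.
#
# for episode in range(num_episodes):
#     sim.reset(**randomised_sim_config())
#     policy.update(sim.rollout(policy))
```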
Think about what this means. A robot can learn to manipulate objects, navigate spaces, perform complex tasks without ever existing in the physical world during training. It learns in a dream. I tried to explain this to a friend once. He asked if it was like practising piano in your head. I said yes, except the piano is a cup and the head is a GPU cluster in a data centre. He nodded as though this were perfectly normal, which made me question whether I was explaining it well or whether he'd simply stopped listening.
The philosophical implications are worth sitting with, and I don't think the robotics community has fully grappled with them. If a machine can learn to interact with the world by simulating the world internally, what is the meaningful difference between that and imagination? Between simulation and dreaming? I'm not being rhetorical. I don't know. And the people who should know don't seem to know either, which is either reassuring or terrifying.
Foundation models for robotics
Same approach that gave us GPT and Claude — massive neural networks trained on vast data, fine-tuned for specific tasks — now being applied to robotics. Depending on your temperament, either thrilling or terrifying.
Google DeepMind's RT-2 showed that a vision-language-action model could control a robot using the same architecture that powers language models. Feed it a camera view and a natural language instruction — "pick up the blue cup and put it on the shelf" — and it generates motor commands. Not through explicit programming. Through learned, generalised capability.
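The loop is almost insultingly simple in outline. Here's a sketch of its shape; `vla_model`, its `generate` and `detokenize` methods, and the seven-value command format are stand-ins rather than Google's actual interface:

```python
from dataclasses import dataclass

@dataclass
class MotorCommand:
    """One step of end-effector motion, decoded from action tokens."""
    dx: float       # translation deltas, metres
    dy: float
    dz: float
    droll: float    # rotation deltas, radians
    dpitch: float
    dyaw: float
    gripper: float  # 0.0 = open, 1.0 = closed

def act(vla_model, camera_image, instruction):
    """One perception-to-action step with a vision-language-action model.

    `vla_model` is a stand-in for a model like RT-2; this shows the
    shape of the interface, not Google's actual API.
    """
    # One forward pass: pixels and words in, action tokens out,
    # through the same architecture that powers language models.
    tokens = vla_model.generate(image=camera_image, text=instruction)
    # Decode the tokens back into continuous motor values.
    return MotorCommand(*(vla_model.detokenize(t) for t in tokens[:7]))

# act(model, frame, "pick up the blue cup and put it on the shelf")
```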
Toyota Research Institute pushed further, developing foundation models for manipulation tasks. Their approach involves training on massive datasets of robotic interaction, then fine-tuning for specific tasks with remarkably little additional data. A robot that already "understands" the general physics of grasping and placing can learn a new task in minutes. Not hours. Minutes. Base knowledge transfers. Skills compound. Experience generalises. Every task the robot learns makes the next one easier, which is exactly how expertise works in humans, except without the complaining.
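Mechanically, this is transfer learning: freeze the backbone that holds the general knowledge, train a small new head on the new task's demonstrations. A sketch under those assumptions (the `backbone` module and its `feature_dim` attribute are hypothetical, not TRI's actual training code):

```python
import torch.nn as nn

def build_task_head(backbone, action_dim=7, hidden=256):
    """Adapt a pretrained manipulation model to a new task.

    `backbone` is a hypothetical pretrained torch module exposing
    a `feature_dim` attribute.
    """
    # Freeze the general knowledge: grasping, placing, rough physics.
    for param in backbone.parameters():
        param.requires_grad = False

    # Only this small head is trained on the new task's demonstrations,
    # which is why minutes of data can be enough.
    return nn.Sequential(
        nn.Linear(backbone.feature_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, action_dim),
    )
```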
Figure AI has been integrating large language models directly into their humanoid robots. The robot doesn't execute pre-programmed routines. It reasons about its environment. Makes decisions. Adapts to unexpected situations in real time. If you've ever watched one of their demos and felt a slight chill, congratulations: your survival instincts are working. If you felt excitement, you're paying attention to the right century.
Unitree — the Chinese robotics company that went from robot dogs to full humanoid platforms at prices that make Western competitors visibly nervous — is proving this technology doesn't have to be boutique. It can scale. Democratisation of embodied intelligence, happening in real time, and most people haven't noticed because there wasn't a catchy product launch with a keynote and a waiting list.
What we're looking at is the early stage of general-purpose physical intelligence. Not general in the AGI sense — not yet, probably not soon. General in the sense that matters practically: a single system that can learn many different physical tasks in many different environments. Going from calculators to computers. From "it does one thing" to "tell it what to do."
The imagination question
If a robot generates internal models of the world — simulates scenarios, tests actions virtually, predicts outcomes before committing to movement — is it imagining?
Cautious answer: no. It's running physics simulations. Optimising trajectories. Doing maths. Calling it imagination is anthropomorphism, the same sloppy thinking that leads people to say their Roomba "wants" to clean the kitchen. (My Roomba doesn't want anything. It has the intentionality of a very persistent tennis ball.)
The less cautious answer, more interesting and less defensible, is this: what do you think imagination is?
When you imagine picking up a cup, your brain runs a forward model. Simulates the weight, the grip, the trajectory. Predicts what will happen before you move. Tests and discards options below the threshold of consciousness. By the time your hand actually reaches out, your nervous system has rehearsed the movement dozens of times in a simulation you experience as nothing at all.
Different substrate. Carbon versus silicon. Neurons versus parameters. But the computational process — generate, simulate, evaluate, act — is converging between biological and artificial systems in a way that makes clean philosophical distinctions increasingly difficult to maintain. The boundary between "real" imagination and "mere" simulation gets blurrier every year. Eventually we may discover it was never a boundary at all, just a prejudice dressed up as a category.
"Can a robot dream?" is probably the wrong question. Better question: "What is dreaming, exactly, and are we sure we understand it well enough to know who's doing it and who isn't?" Thirty years of neuroscience suggests we are not, in fact, sure. We can't even agree on why humans dream, which makes our confidence about who else might be doing it somewhat premature
What this means for people who aren't philosophers
Manufacturing is the obvious one. Factories have used robots for decades, but they've been special-purpose — welding robots that weld, painting robots that paint, assembly robots that assemble one specific component in one specific way. Brilliant at their one thing. Helpless at anything else. A general-purpose robot instructed in natural language changes the factory floor from a rigid, brittle system into something adaptive. New product line? Don't retool the factory. Retrain the robot. In minutes.
Healthcare is where it gets personal. A robot with general-purpose manipulation skills, guided by a foundation model that understands context and reasons about novel situations, could assist surgeons in ways current surgical robots cannot. Current systems are tele-operated — the surgeon controls every movement. A future system might handle routine aspects of a procedure autonomously while the surgeon focuses on the decisions that need human judgment. Not replacing the surgeon. Augmenting them.
Elder care is the one that gets me. Properly gets me.
I visited a care home recently. A good one. Caring staff, decent facilities. But the staff were overworked. Chronically, systemically overworked. Not because they didn't care, but because there weren't enough of them. The carers cared deeply. The system cared not at all.
One resident had been trying to get out of bed for forty minutes. Not because no one cared. Because the one carer covering that wing was helping someone else, and then someone else, and then someone else. A person who had spent a lifetime never asking anyone for anything, lying there waiting, and the look on their face wasn't frustration. It was resignation. They had accepted that this was what the end looked like. I think about that visit constantly.
A robot that can help an elderly person get out of bed. Prepare a simple meal. Remind someone to take their medication and notice if they seem unwell. Provide a physical presence — not a human presence, I'm not pretending it's the same thing, but a presence — in the long hours between carer visits. This isn't luxury technology. This is dignity technology. The difference between independence and institutionalisation for millions of people.
And here's what I believe that most people in this debate don't want to hear: we're more afraid of robots than we should be. The real danger isn't that robots will replace human carers — it's that we'll refuse to let them help the people who need them most, because the people making decisions about care technology are the people who can afford human carers. There is a particular cruelty in a society that needs more compassion than it's willing to fund. And there is a particular cowardice in blocking a solution because it doesn't look like the solution you'd want for yourself, while offering nothing to the people who are waiting, right now, for forty minutes, just to get out of bed.
The most important applications of embodied AI won't be in factories or warehouses. They'll be in homes. In hospitals. In the unglamorous, underfunded, desperately understaffed spaces where people need help and there aren't enough human hands to provide it. Robots won't replace carers. They'll make caring possible at scale.
The convergence timeline
AI has a pathological relationship with timelines. Everything is always "just around the corner" or "within five years," and then it takes three times longer, or arrives in a form nobody expected. Predictions about the future are most useful as evidence about the present — they reveal what we want, not what we'll get.
With that caveat firmly in place, my honest assessment in August 2025:
General-purpose humanoid robots performing a useful range of household tasks are probably 3-5 years from mainstream deployment. Not 1-2 years, despite what some breathless press releases claim. Hardware is nearly there. AI is nearly there. But integration — making it reliable, safe, affordable, and actually useful in the chaotic, unpredictable environment of a real home, where the cat knocks things off tables and the children leave Lego on the floor like tiny plastic landmines — that's the hard part.
Industrial deployment is closer. Probably 1-2 years for initial rollouts in controlled environments. A robot that drops a package in a warehouse is an inconvenience. A robot that drops a person in a care home is a catastrophe. Margin for error dictates the timeline, and rightly so.
My personal prediction — informed speculation, not prophecy — is that by 2028 you'll be able to buy a general-purpose household robot for roughly the price of a car. By 2030, the price of an appliance. By 2035, not having one will feel as strange as not having a washing machine. I could be wildly wrong. Check back in a decade and we'll see who owes whom a coffee.
The dream at the end
That robot arm in the Toyota facility. Still and silent. Running forward simulations before it moves. Rehearsing in imaginary worlds before acting in the real one.
We built machines that could think. Then machines that could move. Now machines that think about moving and move because they've thought. Somewhere in that loop — between simulation and reality, between the imagined world and the actual one — something new is emerging. Something without a name yet.
I don't know if robots dream. I honestly don't. But I know they simulate, model, predict, rehearse. And when they finally move — when the arm reaches out, when the hand closes around the cup, when the bipedal frame takes its first step on terrain it has never physically encountered but has navigated a million times in silicon — there is a grace to it. A grace that suggests something deeper than computation. Or perhaps suggests that computation was always deeper than we assumed.
When steel learns to dream, it dreams of the world. Not the world as it is, but the world as it could be — a thousand possible futures, simulated and tested and refined, until the best one becomes real. That's not so different from what we do. We dream, plan, imagine, act. Different substrate. Converging process. Whether the dreamer matters more than the dream is something we've been arguing about for three thousand years. I doubt the robots will settle it.
But they might reframe the question. And sometimes that's enough.
We are such stuff as dreams are made on, and our little life is rounded with a sleep.
— William Shakespeare, *The Tempest*
The robots are learning to dream. The dreams are getting better. And the world they're dreaming of — where steel moves with grace and purpose through the spaces where humans live and work and grow old — that world is coming. Not as science fiction.
As an ordinary, unremarkable Tuesday.