It was early January 2016, and I had just joined Google X, Alphabet’s secret innovation lab. My job: help figure out what to do with the employees and technology left over from nine robot companies that Google had acquired. People were confused. Andy “the father of Android” Rubin, who had previously been in charge, had suddenly left. Larry Page and Sergey Brin kept trying to offer guidance and direction during occasional flybys in their “spare time.” Astro Teller, the head of Google X, had agreed a few months earlier to bring all the robot people into the lab, affectionately referred to as the moonshot factory.
I signed up because Astro had convinced me that Google X—or simply X, as we would come to call it—would be different from other corporate innovation labs. The founders were committed to thinking exceptionally big, and they had the so-called “patient capital” to make things happen. After a career of starting and selling several tech companies, this felt right to me. X seemed like the kind of thing that Google ought to be doing. I knew from firsthand experience how hard it was to build a company that, in Steve Jobs’ famous words, could put a dent in the universe, and I believed that Google was the right place to make certain big bets. AI-powered robots, the ones that will live and work alongside us one day, were one such audacious bet.
Eight and a half years later—and 18 months after Google decided to discontinue its largest bet in robotics and AI—it seems as if a new robotics startup pops up every week. I am more convinced than ever that the robots need to come. Yet I have concerns about whether Silicon Valley, with its focus on “minimum viable products” and VCs’ general aversion to investing in hardware, will be patient enough to win the global race to give AI a robot body. And much of the money that is being invested is focusing on the wrong things. Here is why.
The Meaning of “Moonshot”
Google X—the home of Everyday Robots, as our moonshot came to be known—was born in 2010 from a grand idea that Google could tackle some of the world’s hardest problems. X was deliberately located in its own building a few miles away from the main campus, to foster its own culture and allow people to think far outside the proverbial box. Much effort was put into encouraging X-ers to take big risks, to rapidly experiment, and even to celebrate failure as an indication that we had set the bar exceptionally high. When I arrived, the lab had already hatched Waymo, Google Glass, and other science-fiction-sounding projects like flying energy windmills and stratospheric balloons that would provide internet access to the underserved.
What set X projects apart from Silicon Valley startups was how big and long-term X-ers were encouraged to think. In fact, to be anointed a moonshot, X had a “formula”: The project needed to demonstrate, first, that it was addressing a problem that affects hundreds of millions, or even billions, of people. Second, there had to be a breakthrough technology that gave us line of sight to a new way to solve the problem. Finally, there needed to be a radical business or product solution that probably sounded like it was just on the right side of crazy.
The AI Body Problem
It’s hard to imagine a person better suited to running X than Astro Teller, whose chosen title was literally Captain of Moonshots. You would never see Astro in the Google X building, a giant, three-story converted department store, without his signature rollerblades. Top that with a ponytail, always a friendly smile, and, of course, the name Astro, and you might think you’d entered an episode of HBO’s Silicon Valley.
When Astro and I first sat down to discuss what we might do with the robot companies that Google had acquired, we agreed something should be done. But what? Most useful robots to date were large, dumb, and dangerous, confined to factories and warehouses where they often needed to be heavily supervised or put in cages to protect people from them. How were we going to build robots that would be helpful and safe in everyday settings? It would require a new approach. The huge problem we were addressing was a massive global human shift—aging populations, shrinking workforces, labor shortages. Our breakthrough technology was—we knew, even in 2016—going to be artificial intelligence. The radical solution: fully autonomous robots that would help us with an ever-growing list of tasks in our everyday lives.
We were, in other words, going to give AI a body in the physical world, and if there was one place where something of this scale could be concocted, I was convinced it would be X. It was going to take a long time, a lot of patience, a willingness to try crazy ideas and fail at many of them. It would require significant technical breakthroughs in AI and robot technology and very likely cost billions of dollars. (Yes, billions.) There was a deep conviction on the team that, if you looked just a bit beyond the horizon, a convergence of AI and robotics was inevitable. We felt that much of what had only existed in science fiction to date was about to become reality.
It’s Your Mother
Every week or so, I’d talk to my mother on the phone. Her opening question was always the same: “When are the robots coming?” She wouldn’t even say hello. She just wanted to know when one of our robots would come help her. I would respond, “It’ll be a while, Mom,” whereupon she’d say, “They better come soon!”
Living in Oslo, Norway, my mom had good public health care; caregivers showed up at her apartment three times daily to help with a range of tasks and chores, mostly related to her advanced Parkinson’s disease. While these caregivers enabled her to live alone in her own home, my mother hoped that robots could support her with the myriad of small things that had now become insurmountable and embarrassing barriers, or sometimes simply offer her an arm to lean against.
It’s Really Hard
“You do know that robotics is a systems problem, right?” Jeff asked me with a probing look. Every team seems to have a “Jeff”; Jeff Bingham was ours. He was a skinny, earnest guy with a PhD in bioengineering who grew up on a farm and had a reputation for being a knowledge hub with deep insights about … kinda everything. To this day, if you ask me about robots, one of the first things I’ll tell you is that, well, it’s a systems problem.
One of the important things Jeff was trying to reinforce was that a robot is a very complex system and only as good as its weakest link. If the vision subsystem has a hard time perceiving what’s in front of it in direct sunlight, then the robot may suddenly go blind and stop working if a ray of sun comes through a window. If the navigation subsystem doesn’t understand stairs, then the robot may tumble down them and hurt itself (and possibly innocent bystanders). And so on. Building a robot that can live and work alongside us is hard. Like, really hard.
For decades people have been trying to program various forms of robots to perform even simple tasks, like grasping a cup on a table or opening a door, and these programs have always ended up becoming extremely brittle, failing at the slightest change in conditions or variations in the environment. Why? Because of the lack of predictability in the real world (like that ray of sunlight). And we haven’t even gotten to the hard stuff yet, like moving through the messy and cluttered spaces where we live and work.
Once you start thinking carefully about all this, you realize that unless you lock everything down, really tight, with all objects being in fixed, predefined locations, and the lighting being just right and never changing, simply picking up, say, a green apple and placing it in a glass bowl on a kitchen table becomes an all but impossibly difficult problem to solve. This is why factory robots are in cages. Everything from the lighting to the placement of the things they work on can be predictable, and they don’t have to worry about bonking a person on the head.
How to Learn Learning Robots
But all you need, apparently, is 17 machine-learning people. Or so Larry Page told me—one of his classic, difficult-to-comprehend insights. I tried arguing that there was no way we could possibly build the hardware and software infrastructure for robots that would work alongside us with only a handful of ML researchers. He waved his hand at me dismissively. “All you need is 17.” I was confused. Why not 11? Or 23? I was missing something.
Boiling it down, there are two primary approaches to applying AI in robotics. The first is a hybrid approach. Different parts of the system are powered by AI and then stitched together with traditional programming. With this approach the vision subsystem may use AI to recognize and categorize the world it sees. Once it creates a list of the objects it sees, the robot program receives this list and acts on it using heuristics implemented in code. If the program is written to pick that apple off a table, the apple will be detected by the AI-powered vision system, and the program would then pick out a certain object of “type: apple” from the list and then reach to pick it up using traditional robot control software.
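To make the hybrid approach concrete, here is a minimal sketch in Python. It assumes a learned vision system that hands back a list of labeled objects, and it uses ordinary hand-written code to pick the apple off that list and issue a fixed sequence of arm commands. Every name in it (Detection, detect_objects, choose_target, plan_pick) and every number is hypothetical, invented for illustration; it is not code from Everyday Robots.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One object reported by the (hypothetical) AI-powered vision subsystem."""
    type: str
    position: Tuple[float, float, float]  # x, y, z in meters, robot frame
    confidence: float


def detect_objects() -> List[Detection]:
    """Stand-in for a learned vision model; returns canned detections."""
    return [
        Detection("bowl", (0.60, 0.10, 0.75), 0.97),
        Detection("apple", (0.45, -0.20, 0.78), 0.91),
        Detection("apple", (0.80, 0.30, 0.78), 0.40),
    ]


def choose_target(detections: List[Detection], wanted: str,
                  min_confidence: float = 0.5) -> Optional[Detection]:
    """Hand-written heuristic: most confident detection of the wanted type."""
    candidates = [d for d in detections
                  if d.type == wanted and d.confidence >= min_confidence]
    return max(candidates, key=lambda d: d.confidence, default=None)


def plan_pick(target: Detection) -> List[str]:
    """Traditional control: a fixed, pre-programmed sequence of arm commands."""
    x, y, z = target.position
    return [
        f"move_above {x:.2f} {y:.2f} {z + 0.10:.2f}",  # hover 10 cm over it
        "open_gripper",
        f"move_to {x:.2f} {y:.2f} {z:.2f}",
        "close_gripper",
        "lift",
    ]


if __name__ == "__main__":
    target = choose_target(detect_objects(), "apple")
    if target is None:
        print("No apple found; the heuristic has nothing to act on.")
    else:
        for command in plan_pick(target):
            print(command)
```

Only detect_objects would be learned in a real system. Everything after it is brittle in exactly the way described earlier: if the apple rolls after it has been detected, nothing in the fixed command sequence notices.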
The other approach, end-to-end learning, or e2e, attempts to learn entire tasks like “picking up an object,” or even more comprehensive efforts like “tidying up a table.” The learning happens by exposing the robots to large amounts of training data—in much the way a human might learn to perform a physical task. If you ask a young child to pick up a cup, they may, depending on how young they are, still need to learn what a cup is, that a cup might contain liquid, and then, when playing with the cup, repeatedly knock it over, or at least spill a lot of milk. But with demonstrations, imitating others, and lots of playful practice, they’ll learn to do it—and eventually not even have to think about the steps.
What I came to believe Larry was saying was that nothing really mattered unless we ultimately demonstrated that robots could learn to perform end-to-end tasks. Only then would we have a real shot at making robots reliably perform these tasks in the messy and unpredictable real world, qualifying us to be a moonshot. It wasn’t about the specific number 17, but about the fact that big breakthroughs require small teams, not armies of engineers. Obviously there is a lot more to a robot than its AI brain, so I did not discontinue our other engineering efforts—we still had to design and build a physical robot. It became clear, though, that demonstrating a successful e2e task would give us some faith that, in moonshot parlance, we could escape Earth's gravitational pull. In Larry’s world, everything else was essentially “implementation details.”
On the Arm-Farm
Peter Pastor is a German roboticist who received his PhD in robotics from the University of Southern California. On the rare occasion when he wasn’t at work, Peter was trying to keep up with his girlfriend on a kiteboard. In the lab, he spent a lot of his time wrangling 14 proprietary robot arms, later replaced with seven industrial Kuka robot arms in a configuration we dubbed “the arm-farm.”
These arms ran 24/7, repeatedly attempting to pick up objects, like sponges, Lego blocks, rubber ducklings, or plastic bananas, from a bin. At the start they would be programmed to move their claw-like gripper into the bin from a random position above, close the gripper, pull up, and see if they had caught anything. There was a camera above the bin that captured the contents, the movement of the arm, and its success or failure. This went on for months.
In the beginning, the robots had a 7 percent success rate. But each time a robot succeeded, it got positive reinforcement. (Basically meaning, for a robot, that so-called “weights” in the neural network used to determine various outcomes are adjusted to positively reinforce the desired behaviors, and negatively reinforce the undesired ones.) Eventually, these arms learned to successfully pick up objects more than 70 percent of the time. When Peter showed me a video one day of a robot arm not just reaching down to grasp a yellow Lego block but nudging other objects out of the way in order to get a clear shot at it, I knew we had reached a real turning point. The robot hadn’t been explicitly programmed, using traditional heuristics, to make that move. It had learned to do it.
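For readers who want to see the reinforcement idea in miniature, the toy sketch below may help. It is far simpler than anything that ran on the arm-farm: instead of a neural network looking at camera images, it keeps one weight per grasp strategy and nudges that weight toward the outcome of each attempt (1 for a successful grasp, 0 for a miss). The four strategies, their success rates, and the function name train_grasp_weights are all assumptions made up for this example.

```python
import random
from typing import List

# Hypothetical chance that each of four grasp strategies succeeds.
# These numbers are invented for illustration only.
TRUE_SUCCESS_RATE = [0.05, 0.10, 0.70, 0.20]


def train_grasp_weights(attempts: int = 5000, explore: float = 0.2,
                        seed: int = 0) -> List[float]:
    """Learn one weight per strategy from simulated grasp attempts."""
    rng = random.Random(seed)
    weights = [0.0] * len(TRUE_SUCCESS_RATE)
    counts = [0] * len(TRUE_SUCCESS_RATE)
    for _ in range(attempts):
        if rng.random() < explore:
            # Occasionally try a random strategy, like the early random grabs.
            action = rng.randrange(len(weights))
        else:
            # Otherwise use the strategy with the highest weight so far.
            action = max(range(len(weights)), key=lambda a: weights[a])
        # Simulated outcome: did the gripper come up holding something?
        reward = 1.0 if rng.random() < TRUE_SUCCESS_RATE[action] else 0.0
        counts[action] += 1
        # Move the weight toward the observed outcome (running average):
        # successes pull it up, failures pull it down.
        weights[action] += (reward - weights[action]) / counts[action]
    return weights


if __name__ == "__main__":
    learned = train_grasp_weights()
    best = max(range(len(learned)), key=lambda a: learned[a])
    print("learned weights:", [round(w, 2) for w in learned])
    print("preferred strategy:", best)
```

Nobody tells the program which strategy is best; the preference emerges from thousands of tries. That is the same basic dynamic, at a vastly smaller scale, as an arm learning to nudge clutter aside without being programmed to.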
But still—seven robots working for months to learn how to pick up a rubber duckling? That wasn’t going to cut it. Even hundreds of robots practicing for years wouldn’t be enough to teach the robots to perform their first useful real-world tasks. So we built a cloud-based simulator and, in 2021, created more than 240 million robot instances in the sim.
Think of the simulator as a giant video game, with a model of real-world physics that was realistic enough to simulate the weight of an item or the friction of a surface. The many thousands of simulated robots would use their simulated camera input and their simulated bodies, modeled after the real robots, to perform their tasks, like picking up a cup from a table. Running at once, they would try and fail millions of times, collecting data to train the AI algorithms. Once the robots got reasonably good in simulation, the algorithms were transferred to physical robots to do final training in the real world so they could embody their new moves. I always thought of the simulation as robots dreaming all night and then waking up having learned something new.
It’s the Data, Stupid
The day we all woke up and discovered ChatGPT, it seemed like magic. An AI-powered system could suddenly write complete paragraphs, answer complicated questions, and engage in an ongoing dialog. At the same time, we also came to understand its fundamental limitation: It had taken enormous amounts of data to accomplish this.
Robots are already leveraging large language models to understand spoken language and vision models to understand what they see, and this makes for very nice YouTube demo videos. But teaching robots to autonomously live and work alongside us is a comparably huge data problem. In spite of simulations and other ways to create training data, it is highly unlikely that robots will “wake up” highly capable one day, with a foundation model that controls the whole system.
The verdict is still out on how complex the tasks will be that we can teach a robot to perform with AI alone. I have come to believe it will take many, many thousands, maybe even millions of robots doing stuff in the real world to collect enough data to train e2e models that make the robots do anything other than fairly narrow, well-defined tasks. Building robots that perform useful services—like cleaning up and wiping all the tables in a restaurant, or making the beds in a hotel—will require both AI and traditional programming for a long time to come. In other words, don’t expect robots to go running off outside our control, doing something they weren’t programmed to do, anytime soon.
But Should They Look Like Us?
Horses are very efficient at walking and running on four legs. Yet we designed cars to have wheels. Human brains are incredibly efficient biological computers. Yet chip-based computers don’t come close to performing like our brains. Why don’t cars have legs, and why weren’t computers modeled on our biology? The goal of building robots, I mean to say, shouldn’t just be mimicry.
This I learned one day at a meeting with a group of technical leaders at Everyday Robots. We were sitting around a conference table having an animated conversation about whether our robots should have legs or wheels. Such discussions tended to devolve more into religious debates than fact-based or scientific ones. Some people are very attached to the idea that robots should look like people. Their rationale is good. We have designed the places in which we live and work to accommodate us. And we have legs. So maybe robots should too.
After about 30 minutes, the most senior engineering manager in the room, Vincent Dureau, spoke up. He simply said, “I figure that if I can get there, the robots should be able to get there.” Vincent was seated in his wheelchair. The room went quiet. The debate was over.
The fact is, robot legs are mechanically and electronically very complex. They don’t move very fast. They’re prone to making the robot unstable. They’re also not very power-efficient compared to wheels. These days, when I see companies attempting to make humanoid robots—robots that try to closely mimic human form and function—I wonder if it is a failure of imagination. There are so many designs to explore that complement humans. Why torture ourselves reaching for mimicry? At Everyday Robots, we tried to make the morphology of the robot as simple as possible—because the sooner robots can perform real-world tasks, the faster we can gather valuable data. Vincent’s comment reminded us that we needed to focus on the hardest, most impactful problems first.
Desk Duty
I was at my desk when one of our one-armed robots with a head shaped like a rectangle with rounded corners rolled up, addressed me by name, and asked if it could tidy up. I said yes and stepped aside. A few minutes later it had picked up a couple of empty paper cups, a transparent iced tea cup from Starbucks, and a plastic Kind bar wrapper. It dropped these items into a trash tray attached to its base before turning toward me, giving me a nod, and heading over to the next desk.
This tidy-desk service represented an important milestone: It showed that we were making good progress on an unsolved part of the robotics puzzle. The robots were using AI to reliably see both people and objects! Benjie Holson, a software engineer and former puppeteer who led the team that created this service, was an advocate for the hybrid approach. He wasn’t against end-to-end learned tasks but simply had a let’s-try-to-make-them-do-something-useful-now attitude. If the ML researchers solved some e2e task better than his team could program it, they’d just pull the new algorithms into their quiver.
I’d gotten used to our robots rolling around, doing chores like tidying desks. Occasionally I would spot a visitor or an engineer who had just joined the team. They’d have a look of wonder and joy on their face as they watched the robots going about their business. Through their eyes I was reminded just how novel this was. As our head of design, Rhys Newman, said when a robot rolled by one day (in his Welsh accent), “It’s become normal. That’s weird, isn’t it?”
Just Dance
Our advisers at Everyday Robots included a philosopher, an anthropologist, a former labor leader, a historian, and an economist. We vigorously debated economic, social, and philosophical questions like: If robots lived alongside us, what would the economic impact be? What about the long-term and near-term effects on labor? What does it mean to be human in an age of intelligent machines? How do we build these machines in ways that make us feel welcome and safe?
In 2019, after telling my team that we were looking for an artist in residence to do some creative, weird, and unexpected things with our robots, I met Catie Cuan. Catie was studying for her PhD in robotics and AI at Stanford. What caught my attention was that she had been a professional dancer, performing at places like the Metropolitan Opera Ballet in NYC.
You’ve probably seen YouTube videos of robots dancing—performances where the robot carries out a preprogrammed sequence of timed moves, synchronized to music. While fun to watch, these dances are not much different than what you’d experience on a ride at Disneyland. I asked Catie what it would be like if, instead, robots could improvise and engage with each other like people do. Or like flocks of birds, or schools of fish. To make this happen, she and a few other engineers developed an AI algorithm trained on the preferences of a choreographer. That being, of course, Catie.
Often during evenings and sometimes weekends, when the robots weren’t busy doing their daily chores, Catie and her impromptu team would gather a dozen or so robots in a large atrium in the middle of X. Flocks of robots began moving together, at times haltingly, yet always in interesting patterns, with what often felt like curiosity and sometimes even grace and beauty. Tom Engbersen is a roboticist from the Netherlands who painted replicas of classic masterpieces in his spare time. He began a side project collaborating with Catie on an exploration of how dancing robots might respond to music or even play an instrument. At one point he had a novel idea: What if the robots became instruments themselves? This kicked off an exploration where each joint on the robot played a sound when it moved. When the base moved it played a bass sound; when a gripper opened and closed it made a bell sound. When we turned on music mode, the robots created unique orchestral scores every time they moved. Whether they were traveling down a hallway, sorting trash, cleaning tables, or “dancing” as a flock, the robots moved and sounded like a new type of approachable creature, unlike anything I had ever experienced.
This Is Only the Beginning
In late 2022, the end-to-end versus hybrid conversations were still going strong. Peter and his teammates, with our colleagues in Google Brain, had been working on applying reinforcement learning, imitation learning, and transformers—the architecture behind LLMs—to several robot tasks. They were making good progress on showing that robots could learn tasks in ways that made them general, robust, and resilient. Meanwhile, the applications team led by Benjie was working on taking AI models and using them with traditional programming to prototype and build robot services that could be deployed among people in real-world settings.
Meanwhile, Project Starling, as Catie’s multi-robot installation ended up being called, was changing how I felt about these machines. I noticed how people were drawn to the robots with wonder, joy, and curiosity. It helped me understand that how robots move among us, and what they sound like, will trigger deep human emotion; it will be a big factor in how, even if, we welcome them into our everyday lives.
We were, in other words, on the cusp of truly capitalizing on the biggest bet we had made: robots powered by AI. AI was giving them the ability to understand what they heard (spoken and written language) and translate it into actions, or understand what they saw (camera images) and translate that into scenes and objects that they could act on. And as Peter’s team had demonstrated, robots had learned to pick up objects. After more than seven years we were deploying fleets of robots across multiple Google buildings. A single type of robot was performing a range of services: autonomously wiping tables in cafeterias, inspecting conference rooms, sorting trash, and more.
Which was when, in January 2023, two months after OpenAI introduced ChatGPT, Google shut down Everyday Robots, citing overall cost concerns. The robots and a small number of people eventually landed at Google DeepMind to conduct research. In spite of the high cost and the long timeline, everyone involved was shocked.
A National Imperative
In 1970, for every person over 64 in the world, there were 10 people of working age. By 2050, there will likely be fewer than four. We’re running out of workers. Who will care for the elderly? Who will work in factories, hospitals, restaurants? Who will drive trucks and taxis? Countries like Japan, China, and South Korea understand the immediacy of this problem. There, robots are not optional. Those nations have made it a national imperative to invest in robotics technologies.
Giving AI a body in the real world is both an issue of national security and an enormous economic opportunity. If a technology company like Google decides it cannot invest in “moonshot” efforts like the AI-powered robots that will complement and supplement the workers of the future, then who will? Will Silicon Valley or other startup ecosystems step up, and if so, will there be access to patient, long-term capital? I have doubts. The reason we called Everyday Robots a moonshot is that building highly complex systems at this scale went way beyond what venture-capital-funded startups have historically had the patience for. While the US is ahead in AI, building the physical manifestation of it—robots—requires skills and infrastructure where other nations, most notably China, are already leading.
The robots did not show up in time to help my mother. She passed away in early 2021. Our frequent conversations toward the end of her life convinced me more than ever that a future version of what we started at Everyday Robots will be coming. In fact, it can’t come soon enough. So the question we are left to ponder becomes: How does this kind of change and future happen? I remain curious, and concerned.