The Godmother of AI Wants Everyone to Be a World Builder

Stanford computer scientist Fei-Fei Li is unveiling a startup that aims to teach AI systems deep knowledge of physical reality. Investors are throwing money at it.

According to market-fixated tech pundits and professional skeptics, the artificial intelligence bubble has popped, and winter’s back. Fei-Fei Li isn’t buying that. In fact, Li—who earned the sobriquet the “godmother of AI”—is betting on the contrary. She’s on a part-time leave from Stanford University to cofound a company called World Labs. While current generative AI is language-based, she sees a frontier where systems construct complete worlds with the physics, logic, and rich detail of our physical reality. It’s an ambitious goal, and despite the dreary nabobs who say progress in AI has hit a grim plateau, World Labs is on the funding fast track. The startup is perhaps a year away from having a product—and it’s not at all clear how well it will work if and when it does arrive—but investors have pitched in $230 million and are reportedly valuing it at a billion dollars.

More than a decade ago, Li helped AI turn a corner by creating ImageNet, a bespoke database of digital images that allowed neural nets to get significantly smarter. She feels that today’s deep-learning models need a similar boost if AI is to create actual worlds, whether they’re realistic simulations or totally imagined universes. Future George R.R. Martins might compose their dreamed-up worlds as prompts instead of prose, worlds you might then render and wander around in. “The physical world for computers is seen through cameras, and the computer brain behind the cameras,” Li says. “Turning that vision into reasoning, generation, and eventual interaction involves understanding the physical structure, the physical dynamics of the physical world. And that technology is called spatial intelligence.” World Labs calls itself a spatial intelligence company, and its fate will help determine whether that term becomes a revolution or a punch line.

Li has been obsessing over spatial intelligence for years. While everyone was going gaga over ChatGPT, she and a former student, Justin Johnson, were excitedly gabbing in phone calls about AI’s next iteration. “The next decade will be about generating new content that takes computer vision, deep learning, and AI out of the internet world, and gets them embedded in space and time,” says Johnson, who is now an assistant professor at the University of Michigan.

Li decided to start a company early in 2023, after a dinner with Martin Casado, a pioneer in virtual networking who is now a partner at Andreessen Horowitz. That’s the VC firm notorious for its near-messianic embrace of AI. Casado sees AI as being on a path similar to that of computer games, which started with text, moved to 2D graphics, and now have dazzling 3D imagery. Spatial intelligence will drive the change. Eventually, he says, “You could take your favorite book, throw it into a model, and then you literally step into it and watch it play out in real time, in an immersive way.” The first step to making that happen, Casado and Li agreed, is moving from large language models to large world models.

Li began assembling a team, with Johnson as a cofounder. Casado suggested two more people—one was Christoph Lassner, who had worked at Amazon, Meta’s Reality Labs, and Epic Games. He is the inventor of Pulsar, a rendering scheme that led to a celebrated technique called 3D Gaussian Splatting. That sounds like an indie band at an MIT toga party, but it’s actually a way to synthesize scenes, as opposed to one-off objects. Casado’s other suggestion was Ben Mildenhall, who had created a powerful technique called NeRF—neural radiance fields—that transmogrifies 2D pixel images into 3D graphics. “We took real-world objects into VR and made them look perfectly real,” he says. He left his post as a senior research scientist at Google to join Li’s team.
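For the technically curious, the core idea behind NeRF is remarkably compact: a small neural network maps a 3D position and viewing direction to a color and a density, and a classic volume-rendering sum blends samples along each camera ray into a pixel. Here’s a minimal sketch of that idea in PyTorch; the class and function names are illustrative, and this is a toy, not Mildenhall’s actual implementation or anything from World Labs:

```python
import torch
import torch.nn as nn

# A toy neural radiance field: maps a 3D point plus a view direction
# to an RGB color and a volume density. (Illustrative only; the real
# NeRF adds positional encoding and a much deeper network.)
class TinyNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # outputs (r, g, b, density)
        )

    def forward(self, xyz, view_dir):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # colors squashed into [0, 1]
        sigma = torch.relu(out[..., 3])     # density must be nonnegative
        return rgb, sigma

def render_ray(model, origin, direction, near=2.0, far=6.0, n_samples=64):
    """Volume rendering: blend samples along one camera ray into a pixel."""
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction   # sample positions along the ray
    dirs = direction.expand(n_samples, 3)
    rgb, sigma = model(points, dirs)
    delta = t[1] - t[0]                        # spacing between samples
    alpha = 1.0 - torch.exp(-sigma * delta)    # opacity contributed per sample
    transmittance = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )                                          # light surviving to each sample
    weights = alpha * transmittance
    return (weights[:, None] * rgb).sum(dim=0)  # final RGB for this pixel
```

Training fits the network so that rays rendered this way reproduce the pixels of a set of posed photographs; Gaussian splatting arrives at similar images far faster by swapping the network for millions of explicit, rasterizable 3D blobs.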

One obvious goal of a large world model would be imbuing, well, world-sense into robots. That indeed is in World Labs’ plan, but not for a while. The first phase is building a model with a deep understanding of three-dimensionality, physicality, and notions of space and time. Next will come a phase where the models support augmented reality. After that, the company can take on robotics. If this vision is fulfilled, large world models will improve autonomous cars, automated factories, and maybe even humanoid robots.

That’s a long way away, and no slam dunk. World Labs promises a product in 2025. When I pressed the founders on exactly what the product would be and who the projected customers were—stuff like how World Labs will make money—they emphasized that they’re just ramping up. “There are a lot of boundaries to push, a lot of unknowns,” says Li. “Of course, we’re the best team in the world to figure out these unknowns.”

Casado is a little more specific. As with ChatGPT or Anthropic’s Claude, he notes, a model can be the product—a platform that others use directly or that hosts other apps. Customers might include game companies or movie studios. I remember writing about how Pixar used to spend endless resources on things like monster fur or the movement of water. Imagine doing that with a one-sentence prompt.

World Labs is not the only company tackling what some are calling physical AI. “Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” Nvidia CEO Jensen Huang said earlier this year. I wrote recently about a company called Archetype that was also pursuing that line. But Casado insists that the ambition, talent, and vision of World Labs are unique. “I’ve been investing for almost 10 years, and this is the single best team I’ve ever, ever run across,” he says. It’s common for a VC to boost his bets, but he’s putting more than money into this one: For the first time since he became a VC, he’s a part-time team member, spending a day a week at the company.

Other VC firms are also chipping in, including Radical Ventures, NEA, and (surprise) Nvidia’s venture capital arm, as well as an all-star list of angels that features Marc Benioff, Reid Hoffman, Jeff Dean, Eric Schmidt, Ron Conway, and Geoff Hinton. (So you’ve got the godfather of AI backing the field’s godmother.) Susan Wojcicki also invested before her untimely passing last month.

Can all those smart people be wrong? Of course. You don’t have to squint too hard to see how the promises of World Labs overlap with a recent buzzword that debuzzed rather dramatically: the metaverse. The World Labs founders argue that the short-lived craze was premature, a blip based on some promising hardware that didn’t have the right interactive content. Large world models, they imply, could solve that problem. Presumably, none of those worlds would visualize AI as stuck on a plateau.

Time Travel

Last year, Fei-Fei Li came out with a combination memoir and AI love story, The Worlds I See. At the time I praised the book and discussed it with her in a Plaintext headlined “Fei-Fei Li Started an AI Revolution by Seeing Like an Algorithm.” Now she hopes to build worlds that no one has seen before.

Li is a private person who is uncomfortable talking about herself. But she gamely figured out how to weave into the book her experience as an immigrant who came to the United States at 16 with no command of the language and overcame obstacles to become a key figure in this pivotal technology. On the way to her current position, she’s also been director of the Stanford AI Lab and chief scientist of AI and machine learning at Google Cloud. Li says that her book is structured like a double helix, with her personal quest and the trajectory of AI intertwined into a spiraling whole. “We continue to see ourselves through the reflection of who we are,” says Li. “Part of the reflection is technology itself. The hardest world to see is ourselves.”

The strands come together most dramatically in her narrative of ImageNet’s creation and implementation. Li recounts her determination to defy those, including her colleagues, who doubted it was possible to label and categorize millions of images, with at least 1,000 examples for every one of a sprawling list of categories, from throw pillows to violins. The effort required not only technical fortitude but the sweat of literally thousands of people (spoiler: Amazon’s Mechanical Turk helped turn the trick). The project is comprehensible only when we understand her personal journey. The fearlessness in taking on such a risky project came from the support of her parents, who despite financial struggles insisted she turn down a lucrative job in the business world to pursue her dream of becoming a scientist. Executing this moonshot would be the ultimate validation of their sacrifice.

Ask Me One Thing

Tom asks, “When the smartphone was new, people used to talk about public etiquette regarding their use—now, it's common to see a public space full of people staring at their phones. What do you imagine the etiquette of AR headgear will be?”

Hi, Tom, thanks for the question. Etiquette for AR won’t be as straightforward as it is with phones, where it’s all too apparent when our attention is focused on palm slabs. The apex of augmented reality will come when companies figure out how to build it into lightweight eyewear—kind of like Meta’s hit Ray-Ban glasses, which don’t do AR yet but will at some point. A lot of what we see now on our phones will be readable in head-up displays.

At that point, it won’t be so obvious that behind our sunglasses we are more involved with TikTok, texts, and Candy Crush than with our dinner companions. Public places may not look as if everyone is really somewhere else, but they will be. I predict that haptics will be essential to alert people when their trains are leaving, or they are blocking a doorway, or they have been robbed. And a typical dinner conversation may go like this: “Did you hear what I just said?” [Silence.] “DID YOU HEAR WHAT I JUST SAID?” [Pause, touches side panel of glasses.] “Yes, of course I’m paying attention.” This will be happening at every table in the restaurant!

My etiquette prediction? People will wind up communicating by text even when they are standing next to each other, because whatever they say will be more compelling if it’s beamed straight to an eyeball and earpiece. So stop complaining about people staring at phones, because worse days are to come.

You can submit questions to mail@wired.com. Write ASK LEVY in the subject line.

End Times Chronicle

How can it get any hotter? Just wait.

Last but Not Least

Here’s everything announced at Apple’s September event.

While the iPhone 16 got attention, AirPods that act like hearing aids might have been Apple’s most significant move.

Residents of a Texas oil town aren’t so neighborly when a bitcoin mine moves in.

According to Mark Cuban, Mark Cuban is not having a midlife crisis.
