Fei-Fei Li, a Stanford professor and computer scientist, is widely seen as the “godmother of artificial intelligence”.
One of the most influential voices in AI, Li is best known for developing ImageNet, the database set up almost 20 years ago that has become the foundation of modern computer vision technology. Now, she is on a new mission as a start-up founder.
Her latest venture, World Labs, is building so-called spatial intelligence, which aims to understand our physical world. The company is building “world models”, which, unlike language models that only process language, are able to generate three-dimensional worlds. Her latest world model, Marble, came out in November.
In conversation with the Financial Times’ technology correspondent Cristina Criddle, she explains how these models could supercharge human creativity and help people with various applications from robotics to game design.
Cristina Criddle: I’d love to hear more about Marble.
Fei-Fei Li: World Labs is a frontier model company. We are very much focusing on building the cutting edge, pushing the frontier of artificial intelligence. And for me, AI would not be complete unless it has the scope and the depth or the capability of spatial intelligence that humans have.
Marble is the first product that [focuses] on allowing users to create incredible 3D worlds, by either lifting a real world through a photo into Marble, or a small video, or creating an imaginary world through Marble.
So whether it’s real or imaginary, the ability to generate 3D worlds . . . and also serve the workflow of creators, is the goal of today’s Marble as a product. But it’s very important for us to position this as a model-first approach, that we want to get users to use our model through the product.
CC: What are you hoping to learn from user interactions with the model?
FL: I really, really believe that human creativity cannot be replaced. It can be seen as superpowered, and I hope that Marble is a superpowering collaboration between creators, designers, and developers. So the more they use it and give us feedback on how to superpower them, the better it is.
CC: And what are some of the applications you envision it being used for?
FL: So, this is a very horizontal technology, because the need to use world models is really, really wide. A concrete example is the VFX [visual effects] industry: to preview ideas, to film virtual production movies, to ideate movie sets.
There’s a lot of VFX use of Marble. It can give you a 3D space, it can give you videos with precise camera control within that space, and it can be exported in different formats so that you can use it in your workflow.
We also see designers, interior designers, or architects, using Marble to not only visualise their ideas of interior or building design, but they are able to tour around this, because it’s almost like taking a tour in the mind’s eye. You can be immersed in that.
We also have seen robotic simulation developers using Marble as the foundation to simulate training environments for robotics, whether it’s navigation or manipulation.
We also have seen game developers using Marble, especially as we provide mesh colliders and mesh structure so that game developers can very easily download the worlds and the meshes and build gameplay.
We have also seen researchers wanting to use Marble as an immersive environment to study human psychology and clinical conditions.
So in every use case we have seen so far, there’s one unifying theme, which is that it saves people time. Because ideation, iteration and usage is a loop of work. And Marble helps the tech artists, developers, researchers, save time.
CC: And what will people do with more time? I think this is the big question. Is this an excuse to lay off people, or is there something that’s going to be created out of giving more time to people?
FL: It depends on if you believe creativity is bounded or unbounded. I believe human creativity is unbounded.
For example, when we worked with our Sony Studio partner on creating our video for Marble, the virtual production tech artists and developers told us, because they can use Marble to actually create the movie for us, they’re able to play with far more ideas than they’d have the time for if they were doing it in a traditional way. Because the traditional way, they have to wait very, very long to get one idea visualised. Marble takes minutes.
In fact, someone we talked to in the VFX industry said it sped up the ideation and development by 40 times. And this is our time. It’s the time you can use to ideate a lot more. Because as a creator, you don’t just have one idea. Sometimes, you have 10 ideas. Sometimes, the 10 ideas inspire 10 more ideas. Sometimes, you want to just do some tweaking of your ideas. And all these are expensive processes.
CC: And in the gaming application, what do you think it means for how games will be developed? There’s also discussion about what the role of gaming engines is now, because you can just create these 3D environments.
FL: Yes, I think this is all up for disruption. I think neuro-inspired engines and simulation gaming engines are due for improvements. And that is to make developers and creators’ lives better. I think Marble, and also our earlier work, RTFM [a world model released in October], are all part of this latest GenAI effort. We’re at the forefront of it.
And I also think games are not just purely games. We’re seeing a blend of games with experiences, education, and productivity. And this new trend that we see in creativity and content is called transmedia. And transmedia also can leverage all these generative AI tools like Marble.
CC: World models have come up so much in my conversations with Google, Runway, and then I recently had a story on xAI building world models. And so, what makes World Labs distinct? And what are you doing that’s going to set you apart from your competitors?
FL: First of all, we are the first company that came out two years ago that is devoted to spatial intelligence. And I see language models, what we call large language models, as a way to achieve linguistic intelligence. World models are a way to help us achieve spatial intelligence.
Spatial intelligence is the cognitive ability to allow embodied agents, whether it’s humans or avatars or robots, to understand, reason, create, and interact in worlds. And that’s a very deeply rooted cognitive and physical ability of animals, humans, and the future of robots. And that’s the North Star for us.
CC: Do you believe that there’s still a place for LLMs once you’ve reached that North Star?
FL: Oh, yes. Just like humans, we use multi-intelligence. In fact, Howard Gardner in [the 1980s] already proposed this theory of multiple intelligences. Humans are really the epitome of multiple intelligences, whether it’s language, spatial, mathematical, emotional. It’s very much an orchestra of intelligence.
CC: You mentioned using it for robotic simulation. How big is the jump from exploring these virtual worlds on computers to having embodied AI that has this spatial intelligence in it?
FL: So, first of all, I want to make sure that you get that people can upload real world images or videos and turn that into the virtual digital worlds for robots. Training robots, even in deployment in the virtual world, is part of the very important multi-dimensional and multi-ingredient approach in solving robotics.
If you look at today’s robotics learning efforts, the combination of real-world tele-operated data [where humans demonstrate actions remotely], web data, and synthetic simulated data [which are generated with a computer] is really the core suite of data for robotics. And we’re so far from solving robotics that we really need to be sober about solving the data problem.
And self-driving cars had to go through this for decades, solving the data problem. And that’s an even easier problem than robotics, because self-driving cars are a simpler form of movement compared to robots that manipulate everything and anything in the world.
Having the simulation environment, simulation data, synthetic data is a key to the success. And this is why spatial intelligence, in the form of the world models we’re doing, is so important. And then, it also will set us up for the future of moving into being part of this robotics ecosystem, in addition to being a creativity ecosystem.
CC: And do you think it represents a step towards superintelligence? And is that something that World Labs is aiming for?
FL: First of all, our North Star is an intelligence that’s benevolent to people. I don’t really care what you call it. AI, as any tool, is in service of people and humanity. And that is what every researcher, technologist and business should focus on.