Google DeepMind’s New AI Agent Learns, Adapts and Plays Games Like a Human

Google DeepMind introduced SIMA 2—a reasoning AI agent built for 3D worlds that the company says is a step closer to AGI.

By Jason Nelson

3 min read

Google DeepMind introduced SIMA 2 on Thursday—a new AI agent that the company claims behaves like a “companion” inside virtual worlds. With the launch of SIMA 2, DeepMind aims to advance beyond simple on-screen actions and move toward AI that can plan, explain itself, and learn through experience.

“This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI-embodiment in general,” the company said on its website.

The first version of SIMA (Scalable Instructable Multiworld Agent), released in March 2024, learned hundreds of basic skills by watching the screen and using virtual keyboard and mouse controls. The new version of SIMA, Google said, takes things a step further by letting the AI think for itself.

“SIMA 2 is our most capable AI agent for virtual 3D worlds,” Google DeepMind wrote on X. “Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments – meaning you can talk to it through text, voice, or even images.”

Google said that because SIMA 2 is powered by the Gemini AI model, it can interpret high-level goals, talk through the steps it intends to take, and collaborate inside games with a level of reasoning the original system could not reach.

DeepMind reported stronger generalization across virtual environments and said SIMA 2 completed longer, more complex tasks, including ones given through logic prompts, sketches drawn on the screen, and emojis.

“As a result of this ability, SIMA 2’s performance is significantly closer to that of a human player on a wide range of tasks,” Google wrote, noting that SIMA 2 had a 65% task completion rate, compared with SIMA 1’s 31%.

The system also interpreted instructions and acted inside entirely new 3D worlds generated by Genie 3, another DeepMind project, released earlier this year, that creates interactive environments from a single image or text prompt. SIMA 2 oriented itself, understood goals, and took meaningful actions in worlds it had never encountered until moments before testing.

“SIMA 2 is now far better at carrying out detailed instructions, even in worlds it’s never seen before,” Google wrote. “It can transfer learned concepts like ‘mining’ in one game and apply it to ‘harvesting’ in another—connecting the dots between similar tasks.”

Researchers said that after learning from human demonstrations, the agent switched into self-directed play, using trial and error and Gemini-generated feedback to create new experience data. That process included a training loop in which SIMA 2 generated tasks, attempted them, and then fed its own trajectory data back into the next version of the model.
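In rough terms, that kind of self-improvement loop can be sketched as below. This is a minimal toy illustration of the described process only: DeepMind has not published SIMA 2’s training code, and every name and number here is hypothetical.

    # A hypothetical, toy sketch of the self-improvement loop described
    # above. DeepMind has not released SIMA 2's code; every name and
    # number here is illustrative, not the actual system.
    import random

    class ToyAgent:
        """Stand-in agent whose 'skill' improves as it trains."""
        def __init__(self, skill=0.3):
            self.skill = skill

        def attempt(self, task):
            # A trajectory is reduced to (task, success) in this toy version.
            return task, random.random() < self.skill

    def judge(trajectory):
        # Stand-in for Gemini-generated feedback: score each attempt.
        _, success = trajectory
        return 1.0 if success else 0.0

    def train_next_version(agent, experience):
        # Stand-in for training on self-generated trajectories: nudge
        # skill upward based on the observed success rate.
        success_rate = sum(reward for _, reward in experience) / len(experience)
        return ToyAgent(skill=min(1.0, max(agent.skill, success_rate) + 0.1))

    agent = ToyAgent()
    for generation in range(3):
        tasks = [f"task-{i}" for i in range(20)]        # agent-generated tasks
        experience = []
        for task in tasks:
            trajectory = agent.attempt(task)            # trial-and-error rollout
            experience.append((trajectory, judge(trajectory)))
        agent = train_next_version(agent, experience)   # feed data back in
        print(f"generation {generation}: skill = {agent.skill:.2f}")

The essential idea is that the model’s own successful attempts become the training data for its successor, with the judge standing in for a human annotator.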

While Google hailed SIMA 2 as a step forward for artificial intelligence, the research also identified gaps that still need to be addressed: the agent struggles with very long, multi-step tasks, works within a limited memory window, and faces visual-interpretation challenges common to 3D AI systems.

Even so, DeepMind said the platform served as a testbed for skills that could eventually migrate into robotics and navigation.

“Our SIMA 2 research offers a strong path towards applications in robotics and another step towards AGI in the real world,” it said.
