Gradient-based Planning for World Models at Longer Horizons

Overview

Researchers at Berkeley's BAIR lab introduce GRASP, a gradient-based planner for learned dynamics models, also known as world models. GRASP aims to make long-horizon planning practical through three changes: it lifts the trajectory into virtual states so optimization runs in parallel across time, adds stochasticity to the state iterates for exploration, and reshapes gradients so actions get clean signals. The work also examines why planning with modern world models becomes fragile over long horizons.

Key Takeaways

GRASP is a gradient-based planner for learned dynamics, or world models.
It lifts the trajectory into virtual states so optimization is parallel across time.
It adds stochasticity directly to the state iterates for exploration.
It reshapes gradients so actions get clean signals while avoiding brittle state-input gradients through high-dimensional vision models.
Long-horizon planning with modern world models is fragile because optimization becomes ill-conditioned and non-greedy structure creates bad local minima.

What GRASP changes

GRASP introduces three modifications to gradient-based planning.

›It lifts the trajectory into virtual states so optimization is parallel across time.
›It adds stochasticity directly to the state iterates for exploration.
›It reshapes gradients so actions get clean signals while avoiding brittle state-input gradients through high-dimensional vision models.
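
To make the first two changes concrete, here is a minimal sketch of a lifted planner on a toy problem. It is not the authors' implementation: the scalar linear dynamics (`A`, `B`), the function name `plan_lifted`, the quadratic penalty, the noise schedule, and all hyperparameters are assumptions chosen for illustration, and the gradient-reshaping change is not shown. The intermediate states are free variables, each one-step consistency residual depends only on its own time step, and Gaussian noise is added to the state iterates during the first half of optimization.

```python
import random

A, B = 0.9, 0.5  # hypothetical toy dynamics: next = A * state + B * action


def plan_lifted(start, goal, horizon, steps=2000, lr=0.2, noise=0.05, seed=0):
    """Gradient descent on virtual states and actions (illustrative only)."""
    rng = random.Random(seed)
    states = [start] + [0.0] * horizon  # states[1:] are free virtual states
    actions = [0.0] * horizon
    for k in range(steps):
        # One-step residuals; each uses only its own time step, so they
        # could be evaluated in parallel instead of by a sequential rollout.
        res = [states[t + 1] - (A * states[t] + B * actions[t])
               for t in range(horizon)]
        # Assumed schedule: noise shrinks linearly and is zero after halfway.
        sigma = noise * max(0.0, 1.0 - 2.0 * k / steps)
        for t in range(horizon):
            actions[t] += lr * B * res[t]  # descent step on 0.5 * res[t]**2
        for t in range(1, horizon + 1):
            grad = res[t - 1]              # states[t] is the target of res[t-1]
            if t < horizon:
                grad -= A * res[t]         # states[t] is the input of res[t]
            else:
                grad += states[t] - goal   # terminal goal penalty
            states[t] += -lr * grad + sigma * rng.gauss(0.0, 1.0)
    return states, actions


states, actions = plan_lifted(start=1.0, goal=3.0, horizon=10)
```

Once the residuals have shrunk to near zero, the virtual states agree with a real rollout of the returned actions, so the plan can be checked by rolling the toy model forward from the start state.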

GRASP was developed in collaboration with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar.

Why planning with world models is fragile

Powerful predictive models do not automatically yield good control.

›Optimization becomes ill-conditioned.
›Non-greedy structure creates bad local minima.
›High-dimensional latent spaces introduce subtle failure modes.

The authors note that having a powerful predictive model is not the same as being able to use it effectively for control, learning, or planning. Long horizons, they argue, are the real stress test.

What a world model is

The post gives a working definition.

›A world model is a learned model that, given the current state and a sequence of future actions, predicts what will happen next.
›The term is overloaded and can mean an explicit dynamics model or an implicit internal state a generative model relies on.
›When the model is deterministic, it reduces to a map from recent states and the current action to the next state.

For an agent that takes actions and observes states such as images, latent vectors, or proprioception, a world model defines a predictive distribution over the next state given recent states and the current action. In practice the state is often a learned latent representation encoded from pixels, so the model operates in a compact, differentiable space. The result is a differentiable simulator that can be rolled forward and backpropagated through.
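
As a rough illustration of rolling a deterministic model forward, the sketch below uses a hand-written scalar function in place of a learned network. The `step` dynamics, the `rollout` helper, and the numbers are all assumptions for illustration; a real world model would be a neural network acting on latent vectors.

```python
def step(state, action):
    """Hypothetical toy dynamics standing in for a learned world model."""
    return 0.9 * state + 0.5 * action


def rollout(start, actions):
    """Roll the model forward and return every state it visits."""
    states = [start]
    for action in actions:
        states.append(step(states[-1], action))
    return states


# Three actions produce four states: the start plus one prediction per action.
trajectory = rollout(start=1.0, actions=[0.0, 1.0, -1.0])
```

Because every step is an ordinary differentiable function, the same loop can be differentiated with respect to the actions, which is what gradient-based planning relies on.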

Planning by optimizing through the model

The simplest planner optimizes an action sequence.

›Given a start state and a goal, the planner chooses an action sequence by rolling out the model.
›It minimizes the terminal error between the final predicted state and the goal.
›As world models scale, they start to look less like task-specific predictors and more like general-purpose simulators.
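
The rollout-based planner described above can be sketched as plain gradient descent on the action sequence. This is a minimal sketch under stated assumptions: the scalar linear dynamics (`A`, `B`), the names `rollout` and `plan`, the squared terminal error, and the step size are illustrative choices, not details from the post. The backward loop is hand-written backpropagation through the rollout.

```python
A, B = 0.9, 0.5  # hypothetical toy dynamics: next = A * state + B * action


def rollout(start, actions):
    """Roll the toy model forward and return every state it visits."""
    states = [start]
    for a in actions:
        states.append(A * states[-1] + B * a)
    return states


def plan(start, goal, horizon, steps=200, lr=0.5):
    """Minimize 0.5 * (final_state - goal)**2 over the action sequence."""
    actions = [0.0] * horizon
    for _ in range(steps):
        final = rollout(start, actions)[-1]
        grad_state = final - goal  # gradient of the loss w.r.t. the final state
        # Walk backward through time, as backpropagation would.
        for t in reversed(range(horizon)):
            actions[t] -= lr * grad_state * B  # d(next state)/d(action) = B
            grad_state *= A                    # d(next state)/d(state) = A
    return actions


actions = plan(start=1.0, goal=3.0, horizon=10)
final_state = rollout(1.0, actions)[-1]
```

In this toy setting the gradient reaching the first action is scaled by `A` once per later step, so it shrinks geometrically with the horizon. That is a small-scale hint of the conditioning problems that long horizons bring.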

Frequently Asked Questions

What is GRASP?

GRASP is a gradient-based planner for learned dynamics, or world models, designed to make long-horizon planning practical.

What three changes does GRASP make?

It lifts the trajectory into virtual states for parallel optimization across time, adds stochasticity to the state iterates for exploration, and reshapes gradients so actions get clean signals while avoiding brittle state-input gradients through vision models.

Why is long-horizon planning with world models fragile?

Optimization becomes ill-conditioned, non-greedy structure creates bad local minima, and high-dimensional latent spaces introduce subtle failure modes.

What is a world model in this work?

It is a learned model that, given the current state and a sequence of future actions, predicts what will happen next, often operating in a compact, differentiable latent space.

Who collaborated on GRASP?

The post credits Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar as collaborators.

GRASP targets the fragility of long-horizon planning by restructuring how gradients flow through learned world models.