Fast LeWorldModel

Yuntian Gao, Xiangyu Xu

Xi'an Jiaotong University  |  Corresponding author

Open-Loop Rollout Visualization

[Interactive figure: open-loop rollouts on Two-Room and Reacher. Each task shows a start frame, a goal frame, and the rollouts of LeWM and Fast-LeWM.]

Planning

[Interactive figure: CEM planning rollouts on Cube and PushT. For each task, the "Sequential rollout" panel compares GT, LeWM, and Fast-LeWM, and the "Parallel prefix" panel compares GT with the Fast-LeWM prefix prediction.]

Abstract

Joint-Embedding Predictive Architectures (JEPAs), including LeWorldModel (LeWM), are promising reconstruction-free visual world models. However, LeWM evaluates candidate action sequences through repeated one-step latent transitions, which makes planning slow and allows latent prediction errors to accumulate over long horizons.
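To make the cost of repeated one-step transitions concrete, the following minimal sketch rolls a toy scalar latent forward one step at a time. The linear dynamics, the `one_step_model` function, and its 2% coefficient error are illustrative assumptions, not LeWM's actual predictor.

```python
# Toy illustration only: a scalar "latent" with linear dynamics.
# None of these functions come from the LeWM codebase.

def true_step(z, a):
    """Hypothetical ground-truth latent transition."""
    return z + 0.1 * a


def one_step_model(z, a):
    """Hypothetical learned one-step model with a small coefficient error."""
    return 1.02 * z + 0.1 * a


def sequential_rollout(step_fn, z0, actions):
    """Apply step_fn once per action; step t needs the output of step t-1."""
    latents = []
    z = z0
    for a in actions:
        z = step_fn(z, a)
        latents.append(z)
    return latents


actions = [1.0] * 8
true_latents = sequential_rollout(true_step, 1.0, actions)
pred_latents = sequential_rollout(one_step_model, 1.0, actions)

# Absolute latent error at each horizon of the open-loop rollout.
errors = [abs(p - t) for p, t in zip(pred_latents, true_latents)]
for horizon, err in enumerate(errors, start=1):
    print(f"horizon {horizon}: error {err:.4f}")
```

In this toy setting the model is called once per action and cannot be parallelized over time, and because each prediction is fed back as the next input, the error grows with every additional step.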

Fast-LeWM replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes the prefixes of that sequence and predicts, in parallel, the future latent reached after executing each prefix. By making action prefixes the basic prediction unit, the model directly learns action effects accumulated over multiple horizons instead of only fitting adjacent one-step transitions. During planning, the terminal prefix token can be used to evaluate the corresponding future latent without explicitly rolling through every intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time. It also achieves lower open-loop latent loss, and that loss grows significantly more slowly as the rollout horizon increases.
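The idea of querying the model with action prefixes can be sketched as follows. This is a toy under stated assumptions: a running sum stands in for the learned causal action-prefix encoder, and `predict_from_prefix` is a hypothetical stand-in for the learned latent predictor. Neither reflects the actual Fast-LeWM networks.

```python
from itertools import accumulate


def action_prefixes(actions):
    """All prefixes a_{1:k} of the candidate action sequence, k = 1..H."""
    return [actions[:k] for k in range(1, len(actions) + 1)]


def encode_prefixes(actions):
    """Stand-in causal prefix encoder: token k depends only on actions[:k]."""
    return list(accumulate(actions))


def predict_from_prefix(z0, token):
    """Stand-in predictor: maps (anchor latent, prefix token) to a future latent."""
    return z0 + 0.1 * token


z0 = 1.0
actions = [1.0, -0.5, 2.0, 0.5]
tokens = encode_prefixes(actions)

# Every horizon depends only on z0 and its own token, never on another
# predicted latent, so a real model could evaluate them as one batch.
all_latents = [predict_from_prefix(z0, tok) for tok in tokens]

# For planning, only the terminal prefix token has to be evaluated.
terminal_latent = predict_from_prefix(z0, tokens[-1])
```

The point of the sketch is the dependency structure: no predicted latent is fed back into the model, so scoring a candidate sequence takes one query on the terminal prefix rather than one query per step.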

Method: action-prefix prediction

Using prefixes of the candidate action sequence as multi-horizon queries, Fast-LeWM predicts all future latents in parallel from the observed anchor latent.

[Figure: Fast-LeWM training pipeline with visual encoder, causal action-prefix encoder, and parallel latent predictor]
Training pipeline. The state token from the current latent conditions a causal action-prefix encoder; each prefix token supervises one future latent horizon.
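The pairing of prefix tokens with future latents can be illustrated with a toy training loop. Everything here is an assumption made for illustration: the squared-error loss, the scalar weight `w`, the running-sum encoder, and the plain gradient descent are placeholders, since this page does not specify the actual objective or optimizer.

```python
from itertools import accumulate


def prefix_loss(w, z0, actions, target_latents):
    """Mean squared error over horizons; prefix token k is matched to target k."""
    tokens = list(accumulate(actions))        # stand-in causal prefix encoder
    preds = [z0 + w * tok for tok in tokens]  # stand-in parallel predictor
    sq_errors = [(p - t) ** 2 for p, t in zip(preds, target_latents)]
    return sum(sq_errors) / len(sq_errors)


def prefix_loss_grad(w, z0, actions, target_latents):
    """Analytic derivative of prefix_loss with respect to w."""
    tokens = list(accumulate(actions))
    grads = [2.0 * (z0 + w * tok - t) * tok
             for tok, t in zip(tokens, target_latents)]
    return sum(grads) / len(grads)


# Toy trajectory: targets come from hypothetical dynamics z_{t+1} = z_t + 0.1 * a_t,
# so the target at horizon k is z0 + 0.1 * (sum of the first k actions).
z0 = 1.0
actions = [1.0, -0.5, 2.0, 0.5]
targets = [z0 + 0.1 * s for s in accumulate(actions)]

w = 0.0
initial_loss = prefix_loss(w, z0, actions, targets)
for _ in range(200):
    w -= 0.05 * prefix_loss_grad(w, z0, actions, targets)
final_loss = prefix_loss(w, z0, actions, targets)
```

Each term of the loss compares one prefix prediction with the latent observed at the matching horizon, so a single trajectory supplies supervision at every horizon at once.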

Results

Fast-LeWM is evaluated on the same goal-conditioned planning tasks and protocol as LeWM: Two-Room, Reacher, PushT, and OGBench-Cube.
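The planning results on this page are reported for a CEM planner. As a rough sketch of how a cross-entropy method loop can score candidates with a terminal-latent prediction, consider the toy below. The hyperparameters, the scalar latent, the squared-distance cost, and the `terminal_latent` stand-in are all assumptions for illustration and do not describe the evaluation protocol actually used.

```python
import random
import statistics


def terminal_latent(z0, actions):
    """Stand-in for a learned predictor queried with the terminal action prefix."""
    return z0 + 0.1 * sum(actions)


def cem_plan(z0, z_goal, horizon=5, samples=64, elites=8, iters=10, seed=0):
    """Cross-entropy method over scalar action sequences of fixed length."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon

    def cost(acts):
        # Goal-conditioned cost: distance between predicted and goal latent.
        return (terminal_latent(z0, acts) - z_goal) ** 2

    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        # Keep the lowest-cost candidates and refit the Gaussian to them.
        candidates.sort(key=cost)
        elite = candidates[:elites]
        for t in range(horizon):
            column = [acts[t] for acts in elite]
            mean[t] = statistics.fmean(column)
            std[t] = statistics.pstdev(column) + 1e-6
    return mean


plan = cem_plan(z0=1.0, z_goal=1.5)
initial_cost = (terminal_latent(1.0, [0.0] * 5) - 1.5) ** 2
final_cost = (terminal_latent(1.0, plan) - 1.5) ** 2
```

Because every candidate is scored by a single terminal prediction, the cost of one CEM iteration in this sketch scales with the number of samples rather than with samples times horizon.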

3.9x lower dynamics-module time: 31.4 s to 8.0 s.

48.0% lower full CEM solve time: 54.4 s to 28.3 s.

90.5% average success rate, improved from LeWM's 85.8%.
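The headline figures follow from the reported timings by simple arithmetic. The short check below assumes that "3.9x" is the ratio of the two dynamics-module times and that "48.0%" is the relative reduction in CEM solve time. The variable names are illustrative.

```python
# Timings and success rates as reported above (seconds and percent).
dynamics_before_s, dynamics_after_s = 31.4, 8.0
cem_before_s, cem_after_s = 54.4, 28.3
success_before, success_after = 85.8, 90.5

# Speed-up ratio of the dynamics module.
speedup = dynamics_before_s / dynamics_after_s

# Relative reduction in full CEM solve time.
cem_reduction = (cem_before_s - cem_after_s) / cem_before_s

# Absolute gain in average success rate, in percentage points.
success_gain = success_after - success_before

print(f"dynamics speed-up: {speedup:.1f}x")
print(f"CEM solve time reduction: {cem_reduction:.1%}")
print(f"success gain: {success_gain:.1f} points")
```

Under these definitions the ratios round to 3.9x and 48.0%, matching the figures above, and the success gain is 4.7 percentage points.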

Planning success (%)

Method                        Two-Room  Reacher  PushT  Cube  Avg.
PLDM                          97        78       78     65    79.5
DINO-WM                       100       79       74     86    84.8
LeWM                          87        86       96     74    85.8
Fast-LeWM                     98        88       96     80    90.5
Fast-LeWM + self-consistency  98        90       98     82    92.0
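As a consistency check on the table, the average column can be recomputed from the per-task scores. The sketch assumes the columns are ordered Two-Room, Reacher, PushT, Cube and that "Avg." is the unweighted mean of the four tasks rounded to one decimal place.

```python
# Per-task success rates (Two-Room, Reacher, PushT, Cube) and reported average.
reported = {
    "PLDM": ([97, 78, 78, 65], 79.5),
    "DINO-WM": ([100, 79, 74, 86], 84.8),
    "LeWM": ([87, 86, 96, 74], 85.8),
    "Fast-LeWM": ([98, 88, 96, 80], 90.5),
    "Fast-LeWM + self-consistency": ([98, 90, 98, 82], 92.0),
}

# Unweighted mean over the four tasks for each method.
recomputed = {
    name: sum(scores) / len(scores)
    for name, (scores, _) in reported.items()
}

for name, (_, avg) in reported.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}, reported {avg}")
```

Every recomputed mean agrees with the reported average to within rounding (for example, 84.75 for DINO-WM against the reported 84.8).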
[Figure: Open-loop latent prediction loss across tasks]
Open-loop latent prediction. Fast-LeWM reduces both initial latent error and the growth rate of error over longer horizons.

BibTeX

@misc{gao2026fastleworldmodel,
      title={Fast LeWorldModel}, 
      author={Yuntian Gao and Xiangyu Xu},
      year={2026},
      eprint={2606.26217},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.26217}, 
}