Fast LeWorldModel

Yuntian Gao, Xiangyu Xu^†

Xi'an Jiaotong University | ^†Corresponding author

Open-Loop Rollout Visualization

[Interactive visualization: open-loop rollouts on Two-Room and Reacher. Each task shows the start frame, the goal frame, and the rollouts of LeWM and Fast-LeWM.]

Planning

[Interactive visualization: CEM planning on Cube and PushT. LeWM (sequential rollout) and Fast-LeWM (parallel prefix) are each shown next to the ground truth (GT).]

Abstract

Joint-Embedding Predictive Architectures (JEPAs), including LeWorldModel (LeWM), are promising reconstruction-free visual world models. However, LeWM evaluates candidate action sequences through repeated one-step latent transitions, which makes planning slow and allows latent prediction errors to accumulate over long horizons.

Fast-LeWM replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes the prefixes of that sequence and predicts, in parallel, the future latent reached after executing each prefix. Because action prefixes are the basic prediction unit, the model directly learns the accumulated effect of actions over multiple horizons instead of fitting only adjacent one-step transitions. During planning, the terminal prefix token can be used to evaluate the corresponding future latent without explicitly rolling through every intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time. It also achieves lower open-loop latent loss, and that loss grows significantly more slowly as the rollout horizon increases.
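In symbols, the contrast can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $z_t$ is the current latent, $a_{t:t+k-1}$ denotes the first $k$ actions of a candidate sequence of length $H$, $f$ is a one-step latent predictor, and $g$ is a prefix-conditioned predictor.

```latex
% Sequential rollout (LeWM-style): k dependent one-step calls
\hat{z}_{t} = z_t, \qquad
\hat{z}_{t+j+1} = f(\hat{z}_{t+j},\, a_{t+j}), \quad j = 0, \dots, k-1

% Action-prefix prediction (Fast-LeWM-style): each horizon is queried directly
\hat{z}_{t+k} = g(z_t,\, a_{t:t+k-1}), \quad k = 1, \dots, H
```

Under this reading, the $H$ prefix queries share the anchor $z_t$ and do not depend on one another, so they can be evaluated in one batched pass. A planner that only scores the final state needs $\hat{z}_{t+H}$ alone.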

Method: action-prefix prediction

Using prefixes of the candidate action sequence as multi-horizon queries, Fast-LeWM predicts all future latents in parallel from the observed anchor latent.
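The difference between the two prediction styles can be illustrated with a toy example. This is a minimal sketch under strong assumptions, not the model itself: the additive dynamics, the running-sum "prefix encoder", and all function names are hypothetical stand-ins for the learned networks, chosen so that both styles give exactly the same answer and only the dependency structure differs.

```python
from itertools import accumulate

def one_step(z, a):
    """Toy one-step transition: the action is simply added to the latent."""
    return [zi + ai for zi, ai in zip(z, a)]

def sequential_rollout(z0, actions):
    """Sequential style: each prediction depends on the previous prediction."""
    z, out = z0, []
    for a in actions:
        z = one_step(z, a)
        out.append(z)
    return out

def encode_prefixes(actions):
    """Toy causal prefix encoder: token k summarises actions[0..k] only."""
    def add(p, a):
        return [pi + ai for pi, ai in zip(p, a)]
    return list(accumulate(actions, add))

def prefix_predict(z0, actions):
    """Prefix style: every horizon is computed directly from the anchor z0."""
    return [[zi + pi for zi, pi in zip(z0, p)] for p in encode_prefixes(actions)]

z0 = [0.0, 1.0]
actions = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]
seq = sequential_rollout(z0, actions)
par = prefix_predict(z0, actions)
terminal = par[-1]  # the only prediction a terminal-cost planner needs
```

In `prefix_predict`, no output is fed back as an input, so the per-horizon predictions are independent given the prefix tokens. In a learned model that independence is what would allow a single batched forward pass in place of a loop.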

Figure: Fast-LeWM training pipeline with visual encoder, causal action-prefix encoder, and parallel latent predictor.

Training pipeline. The state token from the current latent conditions a causal action-prefix encoder; each prefix token supervises one future latent horizon.
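One way to read "each prefix token supervises one future latent horizon" is as a set of (anchor, action prefix, target latent) triples built from a recorded trajectory. The sketch below shows that pairing. The data layout, the function names, and the use of mean squared error are assumptions made for illustration; the page does not specify the loss or the batching.

```python
def prefix_training_pairs(latents, actions, t, horizon):
    """Build (anchor, action_prefix, target) triples for one anchor time t.

    Assumed layout: latents[i] is the encoded observation at time i and
    actions[i] is the action taken between times i and i + 1.
    """
    if t + horizon >= len(latents) or t + horizon > len(actions):
        raise ValueError("trajectory too short for this anchor and horizon")
    anchor = latents[t]
    pairs = []
    for k in range(1, horizon + 1):
        prefix = actions[t:t + k]   # first k actions after the anchor
        target = latents[t + k]     # latent reached after executing them
        pairs.append((anchor, prefix, target))
    return pairs

def mse(pred, target):
    """Mean squared error between two latent vectors (assumed loss)."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def prefix_loss(predict, pairs):
    """Average latent error over all horizons; `predict` is any callable."""
    return sum(mse(predict(a, p), tgt) for a, p, tgt in pairs) / len(pairs)

# Toy 1-D trajectory in which each action is added to the latent.
latents = [[0.0], [1.0], [3.0], [6.0]]
actions = [[1.0], [2.0], [3.0]]
pairs = prefix_training_pairs(latents, actions, t=0, horizon=3)

# A hypothetical perfect predictor for this toy trajectory.
toy_predict = lambda anchor, prefix: [anchor[0] + sum(a[0] for a in prefix)]
loss = prefix_loss(toy_predict, pairs)
```

Every triple shares the same anchor, so a single encoded observation yields one training signal per horizon, including horizons longer than one step.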

Results

Fast-LeWM is evaluated on the same goal-conditioned planning tasks and protocol as LeWM: Two-Room, Reacher, PushT, and OGBench-Cube.
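Planning on this page is done with CEM (the cross-entropy method), which repeatedly samples candidate action sequences, scores them, and refits the sampling distribution to the best candidates. The following is a minimal sketch of that loop when only the terminal predicted latent is scored. The additive toy predictor, the squared-distance cost, the Gaussian parameterisation, and all hyperparameters are assumptions for illustration and do not reflect the evaluation protocol.

```python
import random

def terminal_latent(z0, actions):
    """Toy stand-in for a prefix predictor: returns only the final latent."""
    return [z + sum(a[d] for a in actions) for d, z in enumerate(z0)]

def cost(z0, actions, goal):
    """Squared distance between the predicted terminal latent and the goal."""
    zH = terminal_latent(z0, actions)
    return sum((p - g) ** 2 for p, g in zip(zH, goal))

def cem_plan(z0, goal, horizon=4, samples=64, elites=8, iters=5, seed=0):
    rng = random.Random(seed)
    dim = len(z0)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[1.0] * dim for _ in range(horizon)]
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        cands = [[[rng.gauss(mean[t][d], std[t][d]) for d in range(dim)]
                  for t in range(horizon)] for _ in range(samples)]
        # Score each candidate by its terminal latent only, best first.
        cands.sort(key=lambda acts: cost(z0, acts, goal))
        top = cands[:elites]
        # Refit the sampling distribution to the elite set.
        for t in range(horizon):
            for d in range(dim):
                vals = [acts[t][d] for acts in top]
                m = sum(vals) / elites
                var = sum((v - m) ** 2 for v in vals) / elites
                mean[t][d] = m
                std[t][d] = max(var ** 0.5, 1e-3)  # floor avoids collapse
    return mean

z0, goal = [0.0, 0.0], [1.0, -0.5]
plan = cem_plan(z0, goal)
initial_cost = cost(z0, [[0.0, 0.0]] * 4, goal)  # cost of doing nothing
final_cost = cost(z0, plan, goal)
```

Each CEM iteration calls the predictor once per candidate. With a sequential world model, each of those calls is itself a loop over the horizon, which is where a terminal-only prediction would save time.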

3.9x lower dynamics-module time: 31.4 s to 8.0 s.

48.0% lower full CEM solve time: 54.4 s to 28.3 s.

90.5% average success rate, improved from LeWM's 85.8%.
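The headline figures follow directly from the reported timings and success rates. The short check below reproduces that arithmetic; the helper names are introduced here for illustration.

```python
def speedup(before, after):
    """Ratio of the old time to the new time."""
    return before / after

def percent_reduction(before, after):
    """Relative reduction of the new value with respect to the old one."""
    return 100.0 * (before - after) / before

dynamics_speedup = speedup(31.4, 8.0)           # dynamics-module time, seconds
cem_reduction = percent_reduction(54.4, 28.3)   # full CEM solve time, seconds
success_gain = 90.5 - 85.8                      # percentage points

print(f"{dynamics_speedup:.1f}x, {cem_reduction:.1f}%, +{success_gain:.1f} points")
# prints: 3.9x, 48.0%, +4.7 points
```

The dynamics module speeds up by more than the full solve does, which suggests that part of the CEM solve time is spent outside the dynamics module.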

Planning success (%)

Method	Two-Room	Reacher	PushT	Cube	Avg.
PLDM	97	78	78	65	79.5
DINO-WM	100	79	74	86	84.8
LeWM	87	86	96	74	85.8
Fast-LeWM	98	88	96	80	90.5
Fast-LeWM + self-consistency	98	90	98	82	92.0
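The Avg. column is the unweighted mean of the four task scores, which the following snippet recomputes from the table. It also lists the per-task difference between Fast-LeWM and LeWM. The variable names are illustrative.

```python
# Success rates (%) in the order Two-Room, Reacher, PushT, Cube.
results = {
    "PLDM": [97, 78, 78, 65],
    "DINO-WM": [100, 79, 74, 86],
    "LeWM": [87, 86, 96, 74],
    "Fast-LeWM": [98, 88, 96, 80],
    "Fast-LeWM + self-consistency": [98, 90, 98, 82],
}

# Unweighted mean over the four tasks for each method.
averages = {name: sum(s) / len(s) for name, s in results.items()}
best = max(averages, key=averages.get)

# Per-task gain of Fast-LeWM over LeWM, in percentage points.
gains = [f - l for f, l in zip(results["Fast-LeWM"], results["LeWM"])]
```

The exact means for DINO-WM and LeWM are 84.75 and 85.75, which the table reports to one decimal as 84.8 and 85.8. The per-task gains are 11, 2, 0, and 6 points, so most of the average improvement over LeWM comes from Two-Room and Cube.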

Figure: Open-loop latent prediction loss across tasks.

Open-loop latent prediction. Fast-LeWM reduces both initial latent error and the growth rate of error over longer horizons.

BibTeX

@misc{gao2026fastleworldmodel,
      title={Fast LeWorldModel}, 
      author={Yuntian Gao and Xiangyu Xu},
      year={2026},
      eprint={2606.26217},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.26217}, 
}