Physics in 2-Steps:
Locking Motion Priors Before Visual Refinement Erases Them

¹Yonsei University  ²NVIDIA
TL;DR. A 2-step generation often has better physics than the full 50-step output of the same model. We trace this to phase erosion during denoising and introduce PhaseLock, a training-free framework that locks the early motion prior into the final high-fidelity output via Latent Delta Guidance: +6.2 pts physical consistency at 1.06× time and 1.02× memory.

How PhaseLock Works

A single walkthrough of the two-stage pipeline. We run 2 denoising steps to extract a motion prior Δphys, then re-inject it as Latent Δ Guidance during the full 50-step run.

Prompt: Two pillows on a table and two grabber tools hanging above them from which a brown tennis ball and an orange block are suspended. The grabber tools let go of the ball and block.

① Input: image + text (first frame of the target scene).
② Baseline, 50 steps: clean texture, but the physics drifts.
③ Same model, 2 steps: coarse, but the physics is valid. Δphys is extracted here and injected as guidance.
④ PhaseLock, 50 steps + Δphys: full fidelity and correct physics.
Same model, same input, different physics.

The counter-intuitive finding: running the same model for only 2 steps (③) already produces physically valid motion — though coarse. Running it for 50 steps (②) refines texture but hallucinates the dynamics. PhaseLock extracts Δphys from ③ and injects it back into the 50-step run (④), recovering both fidelity and physics.

Base (Wan 2.1, 50 steps) vs. Base + PhaseLock (ours). Dropping a potato into water: the baseline produces static, implausible motion, while PhaseLock preserves the physics of displacement and splash.

The Counter-Intuitive Finding

Running fewer denoising steps should hurt quality — yet we find the opposite for physics.

Figure 1. Overview of PhaseLock. Few-step inference (T=2) captures accurate physical motion but lacks texture; standard inference (T=50) achieves photorealism but hallucinates the motion. PhaseLock extracts the motion prior from the 2-step trajectory and injects it during the 50-step denoising — recovering both fidelity and physics.

~18% phase degradation from step 2 → step 50
2 inference steps: enough for valid physics
+6.2 points average improvement in physical consistency
1.06× wall-clock time, no training

Abstract

Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising: the phase degrades significantly (dropping by ≈18% from step 2 to step 50), whereas the magnitude remains relatively stable.

Building on this insight, we propose PhaseLock, a training-free framework that locks the valid motion prior found in few-step inference into the full denoising trajectory. Rather than relying on the 50-step trajectory to establish physics, PhaseLock extracts a motion prior from just 2 steps and enforces it on the high-fidelity generation via Latent Delta Guidance. Extensive experiments show an average improvement of 6.2 points in physical consistency across diverse models with negligible overhead (1.06× time, 1.02× memory), eliminating the need for expensive external guidance methods (~5× time).
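To make "phase degrades while magnitude stays stable" concrete, here is a minimal NumPy sketch that compares two snapshots of the same frame (for example, a latent channel at step 2 and at step 50) in the 2D Fourier domain. The function name `spectral_drift` and both metric definitions (relative magnitude change, and phase agreement as the mean cosine between unit phasors) are illustrative assumptions, not the measure used in the paper.

```python
import numpy as np

def spectral_drift(a, b, eps=1e-12):
    """Compare two 2D snapshots in the Fourier domain (illustrative metric, not the paper's).

    Returns (mag_change, phase_agreement):
      mag_change       relative L2 change of the magnitude spectrum (0 = identical magnitudes)
      phase_agreement  mean cosine between the two phase spectra (1 = identical phases)
    """
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    mag_a, mag_b = np.abs(A), np.abs(B)
    mag_change = np.linalg.norm(mag_b - mag_a) / (np.linalg.norm(mag_a) + eps)
    # Unit phasors keep only the phase; Re(conj(ua) * ub) is the cosine of the phase gap.
    ua, ub = A / (mag_a + eps), B / (mag_b + eps)
    phase_agreement = float(np.mean(np.real(np.conj(ua) * ub)))
    return float(mag_change), phase_agreement


rng = np.random.default_rng(0)
frame = rng.standard_normal((32, 32))  # stand-in for one latent channel

# A global gain rescales every magnitude but leaves every phase untouched.
print(spectral_drift(frame, 1.1 * frame))                          # ≈ (0.1, 1.0)
# A circular shift keeps every magnitude but rotates the phases.
print(spectral_drift(frame, np.roll(frame, (3, 5), axis=(0, 1))))  # ≈ (0.0, 0.0)
```

The two toy cases pull the spectrum apart along the two axes the abstract refers to: one changes only magnitude, the other only phase.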

Why Does This Happen? Phase Erosion

Motion is encoded in the phase spectrum; texture lives in the magnitude. The denoising chain refines magnitude at the cost of phase — quietly erasing the physics.
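The link between motion and phase follows from the Fourier shift theorem: translating a frame multiplies its spectrum by a linear phase ramp and leaves the magnitude unchanged. A minimal NumPy check, assuming a random 2D array as a stand-in for one latent channel and a circular shift as a stand-in for motion (both simplifications):

```python
import numpy as np

def shift_theorem_check(size=32, dy=3, dx=5, seed=0):
    rng = np.random.default_rng(seed)
    frame0 = rng.standard_normal((size, size))
    # "Motion": circularly translate the frame by (dy, dx) pixels.
    frame1 = np.roll(frame0, shift=(dy, dx), axis=(0, 1))

    F0, F1 = np.fft.fft2(frame0), np.fft.fft2(frame1)
    fy = np.fft.fftfreq(size)[:, None]  # cycles per pixel along rows
    fx = np.fft.fftfreq(size)[None, :]  # cycles per pixel along columns
    ramp = np.exp(-2j * np.pi * (fy * dy + fx * dx))  # phase change predicted by the theorem

    same_magnitude = bool(np.allclose(np.abs(F1), np.abs(F0)))
    phase_explains_motion = bool(np.allclose(F1, F0 * ramp))
    return same_magnitude, phase_explains_motion

print(shift_theorem_check())  # (True, True)
```

Everything that distinguishes the moved frame from the original sits in the phase ramp; the magnitude spectrum alone cannot tell the two frames apart.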

Figure 3. Further analysis of phase properties. (a) Blur control: even when the 50-step output is Gaussian-blurred to match the sharpness of the 2-step output, the 2-step output retains a significantly higher Phase Temporal Correlation, so the phase loss is structural rather than a frequency artifact. (b) Phase Sensitivity: physical dynamics degrade rapidly under phase corruption, while magnitude corruption has little effect.
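Two toy NumPy checks suggest why both controls are informative. (a) A Gaussian blur acts as a real, positive transfer function, so it lowers high-frequency magnitude without changing any Fourier phase; blur alone therefore cannot produce a phase gap. (b) Phase correlation, a classical shift estimator, recovers motion from phase only: scaling the magnitudes leaves the estimate intact, while randomizing the phases collapses the correlation peak. The circular FFT blur, the random test frame, and the corruption levels are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def gaussian_blur_fft(img, sigma):
    """Circular Gaussian blur applied as a real, positive transfer function."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    H = np.exp(-2 * np.pi**2 * sigma**2 * (fy**2 + fx**2))
    return np.fft.ifft2(np.fft.fft2(img) * H).real

def unit_phasors(img):
    """Spectrum with every magnitude set to 1, i.e. the pure phase."""
    F = np.fft.fft2(img)
    return F / np.abs(F)

def phase_correlation(A, B):
    """Peak location and height of the normalized cross-power spectrum of spectra A and B."""
    R = B * np.conj(A)
    corr = np.abs(np.fft.ifft2(R / np.abs(R)))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return (int(peak[0]), int(peak[1])), float(corr.max())

rng = np.random.default_rng(0)
frame0 = rng.standard_normal((32, 32))
frame1 = np.roll(frame0, shift=(3, 5), axis=(0, 1))  # known motion of (3, 5) pixels
F0, F1 = np.fft.fft2(frame0), np.fft.fft2(frame1)

# (a) Blur changes magnitudes only: the unit phasors are unchanged.
blurred = gaussian_blur_fft(frame0, sigma=1.0)
print(np.allclose(unit_phasors(blurred), unit_phasors(frame0)))  # True

# (b) Magnitude corruption: the motion is still recovered with a full-height peak.
F1_mag = F1 * rng.uniform(0.2, 5.0, F1.shape)
print(phase_correlation(F0, F1_mag))  # ((3, 5), ≈1.0)

# (b) Phase corruption: the correlation peak collapses.
F1_phase = F1 * np.exp(1j * rng.uniform(-np.pi, np.pi, F1.shape))
print(phase_correlation(F0, F1_phase)[1])  # far below 1
```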

Method: PhaseLock

A drop-in, training-free procedure in two stages.

Figure 4. The overall pipeline of PhaseLock. (1) Motion Prior Extraction. A short 2-step trajectory yields coarse but physically valid motion dynamics. (2) Latent Delta Guidance. The per-frame motion delta is transferred into the standard denoising process, biasing the 50-step path toward the extracted prior — without any fine-tuning.
1. Extract. Run 2 denoising steps to obtain a coarse latent sequence z₂. Its frame-to-frame deltas encode the motion prior.

2. Lock. Compute the motion prior Δphys = z₂(t+1) − z₂(t), where t indexes video frames: a target dynamics signal grounded in physically valid evolution.

3. Guide. During the 50-step run, add Latent Delta Guidance with initial strength λ₀ = 0.05 (linearly decayed) to align the high-fidelity trajectory with Δphys.
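The three steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `denoise_step` is a hypothetical stand-in for one step of any sampler, the 2-step prior is assumed to start from the same initial noise as the 50-step run, λ is assumed to decay linearly from λ₀ to 0 over the run, and the update rule in `latent_delta_guidance` (a hypothetical name) is our own choice: it moves every frame-to-frame delta of the current latent a fraction λ toward Δphys while keeping the first frame fixed.

```python
import numpy as np

def motion_prior(z2):
    """Lock: delta_phys[t] = z2[t + 1] - z2[t] along the frame axis (axis 0)."""
    return z2[1:] - z2[:-1]

def linear_decay(step, num_steps, lam0=0.05):
    """Assumed schedule: lam0 at the first step, decaying linearly to 0 at the last."""
    if num_steps <= 1:
        return lam0
    return lam0 * (1.0 - step / (num_steps - 1))

def latent_delta_guidance(z, delta_phys, lam):
    """Hypothetical update: move every frame delta of z a fraction lam toward delta_phys.

    Frame 0 is kept fixed and frame t receives the accumulated delta error of
    the frames before it, so the new deltas equal delta + lam * (delta_phys - delta).
    """
    err = delta_phys - (z[1:] - z[:-1])
    correction = np.concatenate([np.zeros_like(z[:1]), np.cumsum(err, axis=0)], axis=0)
    return z + lam * correction

def run_phaselock(denoise_step, z_init, num_steps=50, prior_steps=2, lam0=0.05):
    # 1. Extract: a short, coarse trajectory from the same starting noise (assumption).
    z2 = z_init.copy()
    for k in range(prior_steps):
        z2 = denoise_step(z2, k, prior_steps)
    # 2. Lock: the frame-to-frame motion prior.
    delta_phys = motion_prior(z2)
    # 3. Guide: the full run, nudged toward delta_phys after every step.
    z = z_init.copy()
    for k in range(num_steps):
        z = denoise_step(z, k, num_steps)
        z = latent_delta_guidance(z, delta_phys, linear_decay(k, num_steps, lam0))
    return z

def toy_step(z, k, n):
    """Toy stand-in for a sampler step (NOT a diffusion model): shrink the latent."""
    return 0.9 * z

z_init = np.random.default_rng(0).standard_normal((8, 4, 16, 16))  # (frames, C, H, W)
guided = run_phaselock(toy_step, z_init)
baseline = run_phaselock(toy_step, z_init, lam0=0.0)  # lam0 = 0 disables guidance
```

The cumulative sum is what converts a target on frame deltas into an additive update on the frames themselves; with λ = 1 the guided latent's deltas would match Δphys exactly.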

Interactive walkthrough

The walkthrough traces the denoising process step by step: the input image first passes through few-step inference (t = 1, 2), which yields the Δphys motion prior; Latent Δ Guidance then carries that prior into the standard 50-step high-fidelity refinement. Along the baseline path, phase integrity erodes as refinement proceeds, while PhaseLock keeps it locked after step 2.

Results

Qualitative — Physics-IQ

Figure 5. Across diverse scenarios — fluid displacement, rigid-body drop, capillary flow — PhaseLock restores physically consistent motion that the baseline fails to produce.

Efficiency

Figure 7. PhaseLock's latency and memory are comparable to the baseline's (1.06× time, 1.02× memory), so the gains come at almost no extra cost. Other inference-time guidance methods need >5× the compute for similar gains.
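As a rough sanity check on the 1.06× figure, assume every denoising step costs one forward pass and that the per-step cost is the same in the 2-step prior run and the 50-step run (a simplifying assumption). The extra prior run then accounts for

```latex
\frac{\mathrm{NFE}_{\text{PhaseLock}}}{\mathrm{NFE}_{\text{base}}} = \frac{2 + 50}{50} = 1.04
```

of the baseline's denoiser calls, which suggests the remaining ≈0.02× of the measured 1.06× plausibly comes from the guidance update and bookkeeping. That split is our inference, not a reported measurement.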

Ablation

Figure 6. Performance peaks at motion strength λ₀ = 0.05 and NFE = 2, and degrades monotonically as the prior trajectory lengthens, consistent with the phase-erosion theory.

74% of Physics-IQ scenarios improved on Wan 2.1 (49 / 66)
67% of Physics-IQ scenarios improved on CogVideoX-5b (44 / 66)
93% of fluid-dynamics scenarios improved (Wan 2.1)
88% of optics scenarios improved (CogVideoX)

Citation

@misc{han2026phaselock,
  title  = {Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them},
  author = {Han, Woojung and Kang, Seil and Jun, Youngjun and
            Chen, Min-Hung and Yang, Fu-En and Hwang, Seong Jae},
  year   = {2026}
}