A single walkthrough of the two-stage pipeline. We run 2 denoising steps to extract a motion prior Δphys, then re-inject it as Latent Delta Guidance during the full 50-step run.
The counter-intuitive finding: running the same model for only 2 steps (③) already produces physically valid, though coarse, motion. Running it for 50 steps (②) refines texture but hallucinates the dynamics. PhaseLock extracts Δphys from ③ and injects it back into the 50-step run (④), recovering both fidelity and physics.
Running fewer denoising steps should hurt quality — yet we find the opposite for physics.
Few-step inference (T=2) captures accurate physical motion but lacks texture; standard inference (T=50) achieves photorealism but hallucinates the motion. PhaseLock extracts the motion prior from the 2-step trajectory and injects it during the 50-step denoising, recovering both fidelity and physics.
Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising: the phase degrades significantly (dropping by ≈18% from step 2 to step 50), whereas the magnitude remains relatively stable.
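One way to make the phase-versus-magnitude comparison concrete is sketched below. The exact metric behind the ≈18% figure is not specified on this page, so `phase_agreement` and `magnitude_change` are hypothetical helpers, and the toy "latents" are random arrays rather than model outputs. The sketch only shows that the two quantities can be measured separately: jittering the phase of a signal lowers the phase score while leaving the magnitude spectrum untouched.

```python
import numpy as np

def spectra(x):
    """Magnitude and phase of the real FFT taken along the frame axis (axis 0)."""
    X = np.fft.rfft(x, axis=0)
    return np.abs(X), np.angle(X)

def phase_agreement(ref, test):
    """Magnitude-weighted mean cosine of the phase difference; 1.0 means identical phase."""
    mag_ref, ph_ref = spectra(ref)
    _, ph_test = spectra(test)
    weights = mag_ref / mag_ref.sum()
    return float(np.sum(weights * np.cos(ph_ref - ph_test)))

def magnitude_change(ref, test):
    """Relative L2 change of the magnitude spectrum; 0.0 means identical magnitude."""
    mag_ref, _ = spectra(ref)
    mag_test, _ = spectra(test)
    return float(np.linalg.norm(mag_test - mag_ref) / np.linalg.norm(mag_ref))

# Toy check: jitter the phase of a random "latent" while keeping its magnitude.
rng = np.random.default_rng(0)
frames, channels = 16, 8                  # assumed toy sizes
ref = rng.standard_normal((frames, channels))
X = np.fft.rfft(ref, axis=0)
jitter = rng.normal(0.0, 0.5, size=X.shape)
jitter[0] = 0.0                           # DC bin must stay real
jitter[-1] = 0.0                          # Nyquist bin must stay real (even frame count)
test = np.fft.irfft(X * np.exp(1j * jitter), n=frames, axis=0)

score = phase_agreement(ref, test)        # below 1: the phase has drifted
change = magnitude_change(ref, test)      # near 0: the magnitude is intact
```

In an actual analysis, `ref` and `test` would presumably be latents taken at two different denoising steps; that pairing is an assumption, not something stated here.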
Building on this insight, we propose PhaseLock, a training-free framework that locks the valid motion priors found in few-step inference into the denoising trajectory. Rather than requiring 50 steps to establish physics, PhaseLock extracts a motion prior from just 2 steps and enforces it onto high-fidelity generation via Latent Delta Guidance. Extensive experiments show an average improvement of 6.2 points across diverse models with negligible overhead (1.06× time, 1.02× memory), eliminating the need for expensive external guidance methods (~5× time).
Motion is encoded in the phase spectrum; texture lives in the magnitude. The denoising chain refines magnitude at the cost of phase — quietly erasing the physics.
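The link between motion and phase follows from the Fourier shift theorem: displacing a signal multiplies its spectrum by a linear phase ramp and leaves the magnitude unchanged. The snippet below is a small illustration on a synthetic 1-D row (the array, its length, and the shift of 5 samples are arbitrary choices for the demo, not values from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
shift = 5                                  # arbitrary displacement for the demo
frame = rng.standard_normal(64)            # synthetic 1-D "texture" row
moved = np.roll(frame, shift)              # same texture, circularly displaced

F0 = np.fft.fft(frame)
F1 = np.fft.fft(moved)

# The magnitude spectrum is unchanged by the displacement.
mag_gap = np.max(np.abs(np.abs(F0) - np.abs(F1)))

# The displacement appears as a linear phase ramp:
# F1[k] = F0[k] * exp(-2*pi*i*k*shift/N).
k = np.arange(frame.size)
ramp = np.exp(-2j * np.pi * k * shift / frame.size)
ramp_gap = np.max(np.abs(F1 - F0 * ramp))

# Phase-only correlation recovers the displacement from phase alone.
cross = F1 * np.conj(F0)
cross /= np.abs(cross)
recovered = int(np.argmax(np.fft.ifft(cross).real))
```

Here `recovered` equals `shift` even though the magnitudes were divided out, which is the sense in which motion can be read from phase while magnitude carries the appearance.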
A drop-in, training-free procedure in two stages.
Run 2 denoising steps to obtain a coarse latent sequence z2. Its frame-to-frame deltas encode the motion prior.
Compute the motion prior Δphys = z2(t+1) − z2(t), a target dynamics signal grounded in physically valid evolution.
During the 50-step run, add Latent Delta Guidance with strength λ = 0.05 (linear decay) to align the high-fidelity trajectory to Δphys.
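The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the page gives the definition of Δphys, the strength 0.05, and the linear decay, but not the exact form of the guidance update. The sketch assumes the update is one gradient step on a squared error between the current frame deltas and Δphys, and that the decay runs from λ0 at the first step to 0 at the last. The function names are hypothetical.

```python
import numpy as np

def motion_prior(z2):
    """Frame-to-frame deltas of the coarse 2-step latents, shape (F-1, ...)."""
    return z2[1:] - z2[:-1]

def guidance_strength(step, total_steps, lam0=0.05):
    """Assumed schedule: linear decay from lam0 at step 0 to 0 at the last step."""
    return lam0 * (1.0 - step / (total_steps - 1))

def latent_delta_guidance(z, delta_phys, lam):
    """Assumed update: one gradient step on
    0.5 * sum_t ||(z[t+1] - z[t]) - delta_phys[t]||^2 with respect to z."""
    residual = (z[1:] - z[:-1]) - delta_phys
    grad = np.zeros_like(z)
    grad[1:] += residual       # contribution through z[t+1]
    grad[:-1] -= residual      # contribution through z[t]
    return z - lam * grad

# Toy usage with random arrays standing in for latents (frames, features).
rng = np.random.default_rng(0)
z2 = rng.standard_normal((8, 4))           # stand-in for the 2-step latents
z = rng.standard_normal((8, 4))            # stand-in for a 50-step intermediate latent
delta_phys = motion_prior(z2)

total_steps = 50
for step in range(total_steps):
    # A real pipeline would apply the model's denoising update to z here.
    z = latent_delta_guidance(z, delta_phys, guidance_strength(step, total_steps))
```

Because the toy loop omits the denoiser, it only shows the guidance term pulling the frame deltas of `z` toward `delta_phys`; how the term interacts with the actual sampler is left open.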
Interactive figure: phase erodes along the baseline denoising path, while PhaseLock keeps it locked after step 2.
Performance peaks at λ0 = 0.05 and NFE = 2; it degrades monotonically as the prior trajectory lengthens, consistent with the phase-erosion theory.
Side-by-side comparisons on PhyGenBench and Physics-IQ. Highlights are bundled with this page; the full set of 45 pairs streams from the media gallery.
@misc{han2026phaselock,
title = {Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them},
author = {Han, Woojung and Kang, Seil and Jun, Youngjun and
Chen, Min-Hung and Yang, Fu-En and Hwang, Seong Jae},
year = {2026}
}