PhaseLock: Physics in 2-Steps

How PhaseLock Works

A single walkthrough of the two-stage pipeline. We run 2 denoising steps to extract a motion prior Δ_phys, then re-inject it as Latent Delta Guidance during the full 50-step run.

① Input: image + text (first frame of target scene)

② Baseline · 50 steps (T=50): clean texture, but physics drifts

→ same model, fewer steps

③ Same model · 2 steps (T=2): coarse, but physics is valid! Δ_phys extracted

→ inject Δ_phys as guidance

④ PhaseLock · 50 steps + Δ_phys (T=50): full fidelity and correct physics

② vs ④: same model · same input · different physics

The counter-intuitive finding: running the same model for only 2 steps (③) already produces physically valid motion — though coarse. Running it for 50 steps (②) refines texture but hallucinates the dynamics. PhaseLock extracts Δ_phys from ③ and injects it back into the 50-step run (④), recovering both fidelity and physics.

The Counter-Intuitive Finding

Running fewer denoising steps should hurt quality — yet we find the opposite for physics.

Figure 1. Overview of PhaseLock. Few-step inference (T=2) captures accurate physical motion but lacks texture; standard inference (T=50) achieves photorealism but hallucinates the motion. PhaseLock extracts the motion prior from the 2-step trajectory and injects it during the 50-step denoising — recovering both fidelity and physics.

~18% phase degradation from step 2 → step 50

2 inference steps, enough for valid physics

+6.2 avg. points improvement in physical consistency

1.06× wall-clock overhead (no training)

Abstract

Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising: the phase degrades significantly (dropping by ≈18% from step 2 to step 50), whereas the magnitude remains relatively stable.

Building on this insight, we propose PhaseLock, a training-free framework that locks the valid motion priors found in few-step inference into the denoising trajectory. Rather than requiring 50 steps to establish physics, PhaseLock extracts a motion prior from just 2 steps and enforces it on high-fidelity generation via Latent Delta Guidance. Extensive experiments show an average improvement of 6.2 points across diverse models with negligible overhead (1.06× time, 1.02× memory), eliminating the need for expensive external guidance methods (~5× time).

Why Does This Happen? Phase Erosion

Motion is encoded in the phase spectrum; texture lives in the magnitude. The denoising chain refines magnitude at the cost of phase — quietly erasing the physics.
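
The link between motion and phase can be checked on a toy signal through the Fourier shift theorem: translating a signal leaves its magnitude spectrum unchanged and only rotates its phase. The sketch below is a minimal NumPy illustration, not part of PhaseLock itself; the 1-D random "frame", the circular shift, and the function name `shift_spectrum_gaps` are assumptions made for the example.

```python
import numpy as np

def shift_spectrum_gaps(n=64, shift=5, seed=0):
    """Return (magnitude gap, shift-theorem gap) for a circularly shifted toy frame."""
    rng = np.random.default_rng(seed)
    frame0 = rng.standard_normal(n)       # toy 1-D "frame" (illustrative only)
    frame1 = np.roll(frame0, shift)       # same content moved by `shift` samples

    f0 = np.fft.fft(frame0)
    f1 = np.fft.fft(frame1)

    # Magnitude spectra should coincide: the content did not change.
    mag_gap = float(np.max(np.abs(np.abs(f0) - np.abs(f1))))

    # Shift theorem: F1[k] = F0[k] * exp(-2*pi*i*k*shift/n), a pure phase rotation.
    k = np.arange(n)
    predicted = f0 * np.exp(-2j * np.pi * k * shift / n)
    phase_gap = float(np.max(np.abs(f1 - predicted)))
    return mag_gap, phase_gap

if __name__ == "__main__":
    mag_gap, phase_gap = shift_spectrum_gaps()
    print("largest magnitude difference:", mag_gap)
    print("largest deviation from the predicted phase rotation:", phase_gap)
```

Both gaps come out at floating-point noise level, which is the sense in which displacement between frames is carried entirely by phase in this idealized setting. Real video latents are not pure translations, so this is only an intuition aid.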

Figure 3. Further analysis on phase properties. (a) Blur control: even with Gaussian blur matched to the 2-step sharpness, the 2-step output retains significantly higher Phase Temporal Correlation — phase loss is structural, not a frequency artifact. (b) Phase Sensitivity: physical dynamics degrade rapidly under phase corruption, while magnitude corruption has little effect.
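
The idea behind panel (b) can be mimicked on a toy signal: corrupt only the magnitude or only the phase of a spectrum and see whether a bump can still be localized. This is a simplified analogue under stated assumptions, not the experiment behind Figure 3; the Gaussian bump, the gain range, and the helper names `make_bump` and `corrupt_spectrum` are all hypothetical.

```python
import numpy as np

def make_bump(n=128, center=40, width=3.0):
    """Gaussian bump on a periodic 1-D grid, peaked at `center`."""
    x = np.arange(n)
    dist = np.minimum(np.abs(x - center), n - np.abs(x - center))  # circular distance
    return np.exp(-0.5 * (dist / width) ** 2)

def corrupt_spectrum(signal, rng):
    """Return (magnitude-corrupted, phase-corrupted) copies of a real, even-length signal."""
    n = signal.shape[0]
    spec = np.fft.rfft(signal)
    mag = np.abs(spec)

    # Magnitude corruption: random positive gain per frequency, phase untouched.
    gains = rng.uniform(0.5, 1.5, size=spec.shape)
    mag_corrupted = np.fft.irfft(gains * spec, n=n)

    # Phase corruption: magnitude untouched, phase replaced by random angles.
    # The DC and Nyquist bins are kept real so the magnitude spectrum survives.
    rand_phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
    rand_phase[0] = 0.0
    rand_phase[-1] = 0.0
    phase_corrupted = np.fft.irfft(mag * np.exp(1j * rand_phase), n=n)
    return mag_corrupted, phase_corrupted

if __name__ == "__main__":
    bump = make_bump()
    mag_c, phase_c = corrupt_spectrum(bump, np.random.default_rng(0))
    print("original peak:", int(np.argmax(bump)))
    print("peak after magnitude corruption:", int(np.argmax(mag_c)))
    print("peak after phase corruption:", int(np.argmax(phase_c)))
```

In this toy, positive per-frequency gains act as a symmetric filter centered on the bump, so the magnitude-corrupted peak stays at the original index. The phase-scrambled copy keeps exactly the same magnitude spectrum, yet its peak is no longer tied to the original position.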

Method: PhaseLock

A drop-in, training-free procedure in two stages.

Figure 4. The overall pipeline of PhaseLock. (1) Motion Prior Extraction. A short 2-step trajectory yields coarse but physically valid motion dynamics. (2) Latent Delta Guidance. The per-frame motion delta is transferred into the standard denoising process, biasing the 50-step path toward the extracted prior — without any fine-tuning.

1

Extract

Run 2 denoising steps to obtain a coarse latent sequence z₂. Its frame-to-frame deltas encode the motion prior.

2

Lock

Compute the motion prior Δ_phys = z₂^(t+1) − z₂^(t), where t indexes frames: a target dynamics signal grounded in physically valid evolution.

3

Guide

During the 50-step run, add Latent Delta Guidance with strength λ = 0.05 (linear decay) to align the high-fidelity trajectory to Δ_phys.
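
The three steps above can be sketched on plain arrays. The frame-delta definition follows the text; everything else is an assumption for illustration. This page does not state the exact guidance update or decay formula, so `apply_delta_guidance` (blending later frames toward the trajectory implied by Δ_phys while keeping the first frame fixed) and `guidance_strength` (λ₀ scaled by the remaining fraction of steps) are hypothetical stand-ins, and the latents are random arrays rather than model outputs.

```python
import numpy as np

def motion_prior(z2):
    """Frame-to-frame deltas of the 2-step latent; z2 has shape (frames, ...)."""
    return z2[1:] - z2[:-1]

def guidance_strength(step, num_steps=50, lam0=0.05):
    """Assumed linear decay: lam0 at step 0, shrinking toward 0 at the end."""
    return lam0 * (1.0 - step / num_steps)

def apply_delta_guidance(z, delta_phys, lam):
    """Assumed update rule (not confirmed by the source).

    Frames 1..F-1 are blended toward the trajectory rebuilt from the prior,
    so each frame delta becomes (1 - lam) * old_delta + lam * delta_phys.
    The first frame is left untouched.
    """
    target = z[0] + np.cumsum(delta_phys, axis=0)  # frames 1..F-1 implied by the prior
    guided = z.copy()
    guided[1:] = (1.0 - lam) * z[1:] + lam * target
    return guided

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z2 = rng.standard_normal((8, 4, 16, 16))   # stand-in for the 2-step latent
    z = rng.standard_normal((8, 4, 16, 16))    # stand-in for the 50-step latent
    delta_phys = motion_prior(z2)
    for step in range(50):
        # A real pipeline would run one sampler step on z here.
        z = apply_delta_guidance(z, delta_phys, guidance_strength(step))
    gap = np.abs((z[1:] - z[:-1]) - delta_phys).mean()
    print("mean gap between guided deltas and the prior:", gap)
```

Anchoring the first frame fits the image-to-video setting, where that frame is given as input. A real integration would apply the update inside the sampler loop of the chosen model.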

Results

Qualitative — Physics-IQ

Figure 5. Across diverse scenarios — fluid displacement, rigid-body drop, capillary flow — PhaseLock restores physically consistent motion that the baseline fails to produce.

Efficiency

Figure 7. Latency and memory are comparable to the baseline: gains without an N× cost. Other inference-time guidance methods need >5× the compute for similar gains.
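
A back-of-the-envelope count shows why the overhead can stay small: the prior adds 2 denoising steps to a 50-step run. The sketch below only counts denoiser calls and assumes every step costs the same and the guidance update itself is free. The page does not break down the reported 1.06× figure, so the function `step_overhead` is an illustrative estimate, not a measurement.

```python
def step_overhead(base_steps=50, prior_steps=2):
    """Ratio of denoiser calls with the extra prior pass to the baseline count."""
    return (base_steps + prior_steps) / base_steps

if __name__ == "__main__":
    print("step-count overhead:", step_overhead())
```

Under these assumptions the step count grows by 4%, which is in the same range as the reported 1.06× wall-clock figure and far below the roughly 5× cited for external guidance methods.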

Ablation

Figure 6. Performance peaks at motion strength λ₀=0.05 and NFE=2; it degrades monotonically as the prior trajectory lengthens, consistent with the phase-erosion theory.

74% of Physics-IQ scenarios improved on Wan 2.1 (49 / 66)

67% improved on CogVideoX-5b (44 / 66)

93% of fluid-dynamics scenarios improved (Wan 2.1)

88% of optics scenarios improved (CogVideoX)

Video Gallery

Side-by-side comparisons on PhyGenBench and Physics-IQ. Highlights are bundled with this page; the full set of 45 pairs streams from the media gallery.

Citation

@misc{han2026phaselock,
  title  = {Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them},
  author = {Han, Woojung and Kang, Seil and Jun, Youngjun and
            Chen, Min-Hung and Yang, Fu-En and Hwang, Seong Jae},
  year   = {2026}
}