Force Prompting: Video Generation Models Can
Learn and Generalize Physics-based Control Signals

1Brown University, 2Google DeepMind

1. Train a Force-Conditioned Video Model
with Limited Synthetic Data

Local Force Model (Poke)



Global Force Model (Wind)

2. Video Model Generalizes Force Conditioning

Generalizes to Different Settings and Materials


Generalizes to Different Objects and Geometries


Generalizes to Different Affordances


Hints at Mass Understanding


Overview

We investigate using physical forces as a control signal for video generation and propose force prompts which enable users to interact with images through both localized point forces, such as poking a plant, and global wind force fields, such as wind blowing on fabric.

The main challenge of force prompting is the difficulty of obtaining high-quality paired force-video training data. Our key finding is that video generation models can generalize remarkably well when adapted to follow physical force conditioning from videos synthesized in Blender, even with limited demonstrations of only a few objects (e.g., flying flags, rolling balls). Our method can generate videos that simulate forces across diverse geometries, settings, and materials. We also try to understand the source of this generalization and perform ablations on the training data that reveal two key elements: visual diversity and the use of specific text keywords during training.

In addition, our approach is trained on only around 15k training examples for a single day on four A100 GPUs, making these techniques broadly accessible for future research.



Interacting with Images Using Force Prompts

A user can interact with an image by specifying a force vector (location, angle, magnitude) on the image. Given this force prompt, the video generator then generates the resultant scene. No physics simulator is used at inference time!
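As a concrete illustration, here is a minimal Python sketch of what such a force prompt could look like as a data structure; the ForcePrompt class, its field names, and the model.generate call are hypothetical names for illustration only, not the released interface.

from dataclasses import dataclass

@dataclass
class ForcePrompt:
    # A user-specified force applied to the input image (hypothetical structure).
    x: float          # horizontal location of the force, normalized to [0, 1]
    y: float          # vertical location of the force, normalized to [0, 1]
    angle_deg: float  # direction of the force in the image plane, in degrees
    magnitude: float  # force strength, normalized to [0, 1]

# Hypothetical usage: the conditioning signal is just this small vector;
# no physics simulator is invoked at inference time.
prompt = ForcePrompt(x=0.42, y=0.61, angle_deg=180.0, magnitude=0.7)
# video = model.generate(image=first_frame, force=prompt, text="a plant being poked")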

While the results are not currently real-time or per-frame causal (though they are causal with respect to the conditioning signal), we believe they show the potential of future video generation models as they become faster, more efficient, and more powerful.

Local Force Prompts


Interactive Force Prompting Demos: Try It Yourself! Click on a thumbnail below to select a demo. Then, click on the white bead in the image and drag along the indicated line. Release the mouse to see the generated video!

Global Force Prompts


Interactive Force Prompting Demos: Try It Yourself! Click on a thumbnail below to select a demo. Then, click on the wind icon to select a wind direction and release the mouse to see the generated video!



Training dataset diversity



The global wind force model is trained on 15k synthetic videos of flags in the wind. The model learns how wind should affect the flags and generalizes the wind control signal to diverse types of motion, including tethered and aerodynamic motion, as well as fluid dynamics. Pictured here are three different scenes of flags being blown to the right with varying force magnitudes.



The local point force model is trained on 11k videos of plants being poked and 12k videos of balls being poked. This unified dataset supports modeling of linear motions as well as oscillatory and more complex motions. Pictured here are three different scenes of plants being poked to the left with varying force magnitudes, as well as three scenes of soccer balls being poked upwards with varying force magnitudes.
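One plausible way to feed these two kinds of signals to a video model is to rasterize them into dense conditioning maps. The sketch below is only an assumption for illustration (a Gaussian-weighted splat for the local poke and a constant field for the global wind); it is not necessarily the exact encoding used by the model.

import numpy as np

def local_force_map(h, w, x, y, angle_deg, magnitude, sigma=0.05):
    # Rasterize a point force as a Gaussian-weighted 2-channel vector field.
    # (Assumed encoding for illustration; x and y are normalized to [0, 1].)
    ys, xs = np.mgrid[0:h, 0:w]
    xs, ys = xs / (w - 1), ys / (h - 1)
    weight = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    theta = np.deg2rad(angle_deg)
    fx, fy = magnitude * np.cos(theta), magnitude * np.sin(theta)
    return np.stack([weight * fx, weight * fy], axis=0)  # shape (2, h, w)

def global_wind_map(h, w, angle_deg, magnitude):
    # Rasterize a uniform wind as a constant 2-channel vector field (assumed encoding).
    theta = np.deg2rad(angle_deg)
    fx, fy = magnitude * np.cos(theta), magnitude * np.sin(theta)
    return np.stack([np.full((h, w), fx), np.full((h, w), fy)], axis=0)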







Force Prompting Can Recreate Some Demos for
Prior Works that Use a Physics Simulator at Inference

To demonstrate the point force model's versatility, we curate a benchmark using first-frame images from some prominent physics-in-the-loop papers. We are not claiming that the Force Prompting method outperforms those methods on visual fidelity or physical realism. Rather, we wish to illustrate that our purely neural method can handle some of the same visual scenarios almost as effectively as approaches which require some combination of 3D assets and explicit physics simulation at inference time.

Recreating a PhysDreamer (ECCV 2024) demo

Recreating a DreamPhysics (AAAI 2025) demo

Recreating a MotionCraft (NeurIPS 2024) demo

Recreating a PhysGaussian (CVPR 2024) demo

Recreating a PhysGen (ECCV 2024) demo

Recreating a Physics3D demo

Recreating a PhysMotion demo

Recreating a PhysGen3D demo



Hints at Mass Understanding

The same force results in different motion depending on the object's inferred mass

Single book vs. stack of books

Empty laundry basket vs. full laundry basket

Single cube vs. stack of cubes

Wooden ornament vs. metal ornament

Analysis of Effect of Text Keywords on Generalization

We find that the use of standard keywords (e.g., wind/blow/breeze) at train time is crucial for the wind model's generalization. Interestingly, whether these keywords are present at inference time does not seem to matter significantly. We hypothesize that using them at train time allows the model to connect the conditioning signal with these keywords and the video distributions they represent.

"Wind" keyword is important at train time but not at inference time

Analysis of Effect of Visual Diversity on Generalization

Our main finding is the surprising generalization given limited paired data; however, this generalization still requires strategically selecting certain types of visual diversity. Here we ablate several of these types of diversity and show the effect of removing each. While we find the generalization ability promising, we also believe that more diverse training data will improve the robustness of the model.

Background Diversity

Number of Flags for Wind

Number of Balls for Poke



Limitations


Failure Case #1: The Physics is Out-of-Domain for the Base Video Model

The dust is blown in the prompted direction, but the base video model has difficulty generating a physically plausible person-plow-ground interaction

The kite is blown in the prompted direction, but the base video model has difficulty generating a physically plausible video of a kite dragging a person

The egg rolls in the prompted direction, but the base video model has difficulty rolling non-spherical objects, so the egg appears to float

Failure Case #2: The Base Video Model's Prior Competes with the Force Prompt

The rocking chair moves in the prompted direction, but the base video model has trouble distinguishing between foreground and background objects

The rubber duck moves in the prompted direction but bobs up and down due to the base model's prior. Also, all objects in the scene move because the base model struggles with object atomicity in complex scenes

The confetti moves in the prompted direction, but the base video model conjures extra confetti into the scene



Computational Resources

Our approach is trained on only around 15k training examples for a single day on four A100 GPUs, making these techniques broadly accessible for future research.

BibTeX

@misc{gillman2025forcepromptingvideogeneration,
      title={Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals}, 
      author={Nate Gillman and Charles Herrmann and Michael Freeman and Daksh Aggarwal and Evan Luo and Deqing Sun and Chen Sun},
      year={2025},
      eprint={2505.19386},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.19386}, 
}