We investigate using physical forces as a control signal for video generation and propose force prompts, which enable users to interact with images through both localized point forces, such as poking a plant, and global wind force fields, such as wind blowing on fabric.
The main challenge of force prompting is the difficulty of obtaining high-quality paired force-video training data. Our key finding is that video generation models can generalize remarkably well when adapted to follow physical force conditioning from videos synthesized by Blender, even with limited demonstrations of a few objects (e.g., flying flags and rolling balls). Our method can generate videos that simulate forces across diverse geometries, settings, and materials. To understand the source of this generalization, we perform ablations on the training data that reveal two key factors: visual diversity and the use of specific text keywords during training.
In addition, our approach is trained on only around 15,000 examples for a single day on four A100 GPUs, making these techniques broadly accessible for future research.
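To make the two force-prompt types concrete, here is a minimal sketch of how a localized poke and a global wind field could be rasterized into spatial conditioning maps for a video model. All class names, fields, and the Gaussian-falloff choice are illustrative assumptions for exposition, not the paper's actual interface.

```python
import math
from dataclasses import dataclass

import torch


@dataclass
class PointForcePrompt:
    """Localized poke: a pixel location, a direction, and a magnitude."""
    x: float          # horizontal pixel coordinate of the poke
    y: float          # vertical pixel coordinate of the poke
    angle_rad: float  # direction of the applied force
    magnitude: float  # force strength, normalized to [0, 1]

    def to_map(self, height: int, width: int, sigma: float = 8.0) -> torch.Tensor:
        """Rasterize the poke as a 3-channel map (dx, dy, strength)
        concentrated near the poke location with a Gaussian falloff."""
        ys = torch.arange(height).float().view(-1, 1)
        xs = torch.arange(width).float().view(1, -1)
        falloff = torch.exp(
            -((xs - self.x) ** 2 + (ys - self.y) ** 2) / (2 * sigma ** 2)
        )
        dx = math.cos(self.angle_rad) * self.magnitude
        dy = math.sin(self.angle_rad) * self.magnitude
        return torch.stack([dx * falloff, dy * falloff, self.magnitude * falloff])


@dataclass
class WindForcePrompt:
    """Global wind: a single direction and strength applied everywhere."""
    angle_rad: float
    magnitude: float

    def to_map(self, height: int, width: int) -> torch.Tensor:
        """Broadcast the wind vector into a spatially constant 3-channel map."""
        dx = math.cos(self.angle_rad) * self.magnitude
        dy = math.sin(self.angle_rad) * self.magnitude
        return torch.stack([
            torch.full((height, width), dx),
            torch.full((height, width), dy),
            torch.full((height, width), self.magnitude),
        ])
```

Under this sketch, either prompt yields a 3×H×W map that could be concatenated with the model's image conditioning channels; the actual conditioning scheme used in the paper may differ.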
@misc{gillman2025forcepromptingvideogeneration,
  title={Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals},
  author={Nate Gillman and Charles Herrmann and Michael Freeman and Daksh Aggarwal and Evan Luo and Deqing Sun and Chen Sun},
  year={2025},
  eprint={2505.19386},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.19386},
}