Real-Time AI Video is Here
You can now make short videos from text prompts in under a minute using your own computer. No crazy setup. Just a regular graphics card and a new method called Self-Forcing.
Step-by-Step Summary
Get Files ➜ Set Up Workflow ➜ Tweak Settings ➜ Generate ➜ Done
What’s “Self-Forcing”?
It’s a smarter way to train video models: during training, the model generates each frame from its own previous output, the same way it will at inference time, instead of always being fed perfect ground-truth frames.
Think of it like teaching someone to drive by actually driving, not just reading a book.
This trick makes video generation much faster and smoother.
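Here’s a rough sketch of the idea in PyTorch. This is a toy illustration, not the real Self-Forcing code (the actual method works on a video diffusion model with a distribution-matching loss); the model and loss below are stand-ins:

```python
import torch

# Toy stand-in for the real video model: predicts the next frame
# from the previous one.
class TinyVideoModel(torch.nn.Module):
    def __init__(self, frame_dim=64):
        super().__init__()
        self.net = torch.nn.Linear(frame_dim, frame_dim)

    def forward(self, prev_frame):
        return self.net(prev_frame)

def teacher_forcing_step(model, clip):
    # The usual way: the model always sees *real* past frames during
    # training, so it never learns to recover from its own mistakes.
    preds = [model(clip[t]) for t in range(len(clip) - 1)]
    return torch.stack(preds)

def self_forcing_step(model, clip):
    # The Self-Forcing way: roll the model out on its OWN outputs,
    # exactly like it will run at inference time, then apply the loss.
    frame = clip[0]
    preds = []
    for _ in range(len(clip) - 1):
        frame = model(frame)          # feed the generated frame back in
        preds.append(frame)
    return torch.stack(preds)

model = TinyVideoModel()
clip = torch.randn(8, 64)             # 8 fake "frames" of 64 features each
loss = torch.nn.functional.mse_loss(self_forcing_step(model, clip), clip[1:])
loss.backward()
```

Because training and inference follow the same rollout, the model stops drifting off the rails a few frames in, which is exactly the "driving, not reading" point above.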
Tools You Need
Website: self-forcing.github.io
Code & Models: GitHub – Self-Forcing
First demo post: Tweet/X
How Fast Is It? (Real People’s Tests)
| Graphics Card | Frames | Time Taken | Resolution | Extra Notes |
|---|---|---|---|---|
| RTX 4070 Ti | 81 | 45 sec | 832x480 | Fast with extra settings |
| RTX 4080 | 81 | 57 sec | 832x480 | Plug and play |
| RTX 3090 | 81 | 59 sec | 832x480 | Standard setup |
| RTX 5070 Ti | 81 | 24 sec | 800x600 | Second try using `--fast` mode |
| RTX 2080 Ti | 81 | 70 sec | 480p | Still works great |
| RTX 3060 | 79 | 108 sec | 576x576 | Works better with a few tricks |
Tips:
- Use `--fast` to cut down generation time
- Try ComfyUI with FP16 mode for better results (see the sketch after this list)
- Add VACE and CausVid for cleaner animation with fewer steps
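If you’re curious what "FP16 mode" actually buys you: it runs the model’s weights and activations in half precision, roughly halving VRAM use per weight. A minimal PyTorch sketch (the network here is a placeholder, not the real Self-Forcing model):

```python
import torch

# Placeholder network standing in for the real video model.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.GELU(), torch.nn.Linear(256, 256)
)

# FP16 only pays off on a GPU; fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model = model.to(device=device, dtype=dtype)

x = torch.randn(1, 256, device=device, dtype=dtype)
with torch.no_grad():
    y = model(x)   # half the memory per weight, often much faster on GPU
```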
Video Demos You Can Watch
Example clip: Imgur Samples
YouTube Run: 201 Frames in 33s
Workflow Setup: Step-by-step Guide
Cool Tricks You Can Try
VACE + CausVid
Makes the video smoother and more animated using far fewer denoising steps (as few as 4).
You might lose a bit of color quality, but the motion gets much better.
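Why do fewer steps help so much? Diffusion-style samplers pay one full network pass per step, so generation cost scales linearly with the step count. A generic sketch of that loop (the denoiser is a dummy stand-in, not CausVid):

```python
import torch

def sample(denoiser, shape, num_steps):
    x = torch.randn(shape)                        # start from pure noise
    for i in reversed(range(num_steps)):
        t = torch.full((shape[0],), i / num_steps)
        x = denoiser(x, t)                        # one network pass per step
    return x

denoiser = lambda x, t: x * 0.9                   # dummy stand-in network
tiny = (8, 3, 64, 64)                             # toy clip: 8 small frames
fast = sample(denoiser, tiny, num_steps=4)        # distilled: 4 passes
slow = sample(denoiser, tiny, num_steps=50)       # baseline: 50 passes
```

Dropping from 50 passes to 4 is where the speedup comes from; the color drift is the price of the distillation.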
Start from an Image
Yes, you can turn an image into a moving clip. Some setups treat it as the starting frame.
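If the model generates frames one after another, "starting from an image" just means seeding the rollout with your image instead of random noise. A hedged sketch (names are illustrative, not the project’s actual API):

```python
import torch

def rollout(model, first_frame, num_frames):
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(model(frames[-1]))   # each frame conditions the next
    return torch.stack(frames)

model = torch.nn.Identity()                # stand-in for the video model
image = torch.rand(3, 64, 64)              # your starting image, toy-sized
clip = rollout(model, image, num_frames=16)    # a (16, 3, 64, 64) "video"
```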
Long Clips
Use FramePack to get longer animations with more stable motion.
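FramePack’s core trick, loosely: keep the conditioning context a fixed size by compressing older frames harder, so clips can grow without each new frame costing more. This sketch just averages old frames into one summary frame, which is a big simplification of what FramePack actually does:

```python
import torch

def pack_context(frames, budget=4):
    # Keep the newest frames at full detail; squash everything older
    # into a single coarse "summary" frame so the context never grows.
    recent = frames[-budget:]
    if len(frames) > budget:
        summary = torch.stack(frames[:-budget]).mean(dim=0, keepdim=True)
        return torch.cat([summary, torch.stack(recent)])
    return torch.stack(recent)

model = torch.nn.Identity()                # stand-in generator
frames = [torch.rand(3, 64, 64)]           # first frame
for _ in range(63):                        # generate 63 more frames
    ctx = pack_context(frames)             # fixed-size context, always
    frames.append(model(ctx).mean(dim=0))  # next frame from packed context
```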
Things to Know Before You Try
- Most clips are limited to 200 frames
- Lower steps = faster, but also lower detail
- Older GPUs like GTX 1650 might not work well
- Sometimes you’ll see errors like:
`mat1 and mat2 must have the same dtype...`
(That just means two tensors are in different precisions, which is an easy fix; see the sketch after this list)
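That exact message comes from PyTorch: a layer in one precision got a tensor in another. You can reproduce and fix it in a few lines:

```python
import torch

layer = torch.nn.Linear(16, 16)                 # weights are float32
x = torch.randn(1, 16, dtype=torch.float16)     # input is float16

try:
    layer(x)
except RuntimeError as e:
    print(e)    # "mat1 and mat2 must have the same dtype..."

# The fix: make the dtypes match. Easiest is casting the input up;
# on a GPU you'd usually convert the whole model instead (model.half()).
y = layer(x.float())
```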
What People Are Saying
“Feels like 400 watts of woman.”
“Clippy’s new form is here and it’s hot.”
“Real-time AI girlfriend next?”
“Imagine GoldenEye on N64 but in movie style!”
“I didn’t understand anything, but I want it.”
What’s Coming Soon
- Image to Video (I2V) is being tested
- Better motion using frame interpolation
- Longer clips with better details
- Google-level video AI, but running at home
All Important Links in One Place
| What | Link |
|---|---|
| Website | https://self-forcing.github.io |
| Code + Models | https://github.com/guandeh17/Self-Forcing |
| Launch Tweet | https://x.com/xunhuang1995/status/1932107954574275059 |
| Workflow Guide | https://www.reddit.com/r/StableDiffusion/comments/1l7vwke/simple_workflow_for_self_forcing_if_anyone_wants/ |
| YouTube Demo | https://www.youtube.com/watch?v=irUpybVgdDY |
| Model on CivitAI | https://civitai.com/models/1668005?modelVersionId=1887963 |
| Related Project | https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/ |
Final Thought
It’s not perfect. But it’s fast.
And it’s just the beginning.
You can now make your own videos from text — with your own graphics card — in less than a minute.