Real-Time AI Video is Here
You can now make short videos from text prompts in under a minute using your own computer. No crazy setup. Just a regular graphics card and a new method called Self-Forcing.
Step-by-Step Summary
Get Files ➜ Set Up Workflow ➜ Tweak Settings ➜ Generate ➜ Done
What’s “Self-Forcing”?
It’s a smarter way to train video models: during training, the model generates each frame from its own previous output, the same way it will at inference time, instead of always being fed perfect ground-truth frames.
Think of it like teaching someone to drive by actually driving, not just reading a book.
This trick makes video generation much faster and smoother.
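Here’s a rough sketch of the idea in PyTorch. This is a toy illustration, not the real Self-Forcing code (the actual method works on a video diffusion model with a distribution-matching loss); the model and loss below are stand-ins:

```python
import torch

# Toy stand-in for the real video model: predicts the next frame
# from the previous one.
class TinyVideoModel(torch.nn.Module):
    def __init__(self, frame_dim=64):
        super().__init__()
        self.net = torch.nn.Linear(frame_dim, frame_dim)

    def forward(self, prev_frame):
        return self.net(prev_frame)

def teacher_forcing_step(model, clip):
    # The usual way: the model always sees *real* past frames during
    # training, so it never learns to recover from its own mistakes.
    preds = [model(clip[t]) for t in range(len(clip) - 1)]
    return torch.stack(preds)

def self_forcing_step(model, clip):
    # The Self-Forcing way: roll the model out on its OWN outputs,
    # exactly like it will run at inference time, then apply the loss.
    frame = clip[0]
    preds = []
    for _ in range(len(clip) - 1):
        frame = model(frame)          # feed the generated frame back in
        preds.append(frame)
    return torch.stack(preds)

model = TinyVideoModel()
clip = torch.randn(8, 64)             # 8 fake "frames" of 64 features each
loss = torch.nn.functional.mse_loss(self_forcing_step(model, clip), clip[1:])
loss.backward()
```

Because training and inference follow the same rollout, the model stops drifting off the rails a few frames in, which is exactly the "driving, not reading" point above.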
Tools You Need
Website: self-forcing.github.io
Code & Models: GitHub – Self-Forcing
First demo post: Tweet/X
How Fast Is It? (Real People’s Tests)
| Graphics Card | Frames | Time Taken | Resolution | Extra Notes |
|---|---|---|---|---|
| RTX 4070 Ti | 81 | 45 sec | 832x480 | Fast with extra settings |
| RTX 4080 | 81 | 57 sec | 832x480 | Plug and play |
| RTX 3090 | 81 | 59 sec | 832x480 | Standard setup |
| RTX 5070 Ti | 81 | 24 sec | 800x600 | Second try using `--fast` mode |
| RTX 2080 Ti | 81 | 70 sec | 480p | Still works great |
| RTX 3060 | 79 | 108 sec | 576x576 | Works better with a few tricks |
Tips:
- Use `--fast` to cut down generation time
- Try ComfyUI with FP16 mode for better results (see the sketch after this list)
- Add VACE and CausVid for cleaner animation with fewer steps
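If you’re curious what "FP16 mode" actually buys you: it runs the model’s weights and activations in half precision, roughly halving VRAM use per weight. A minimal PyTorch sketch (the network here is a placeholder, not the real Self-Forcing model):

```python
import torch

# Placeholder network standing in for the real video model.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.GELU(), torch.nn.Linear(256, 256)
)

# FP16 only pays off on a GPU; fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model = model.to(device=device, dtype=dtype)

x = torch.randn(1, 256, device=device, dtype=dtype)
with torch.no_grad():
    y = model(x)   # half the memory per weight, often much faster on GPU
```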
Video Demos You Can Watch
Example clip: Imgur Samples
YouTube Run: 201 Frames in 33s
Workflow Setup: Step-by-step Guide
Cool Tricks You Can Try
VACE + CausVid
Makes the video smoother and more animated using far fewer denoising steps (as few as 4).
You might lose a bit of color quality, but the motion gets much better.
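Why do fewer steps help so much? Diffusion-style samplers pay one full network pass per step, so generation cost scales linearly with the step count. A generic sketch of that loop (the denoiser is a dummy stand-in, not CausVid):

```python
import torch

def sample(denoiser, shape, num_steps):
    x = torch.randn(shape)                        # start from pure noise
    for i in reversed(range(num_steps)):
        t = torch.full((shape[0],), i / num_steps)
        x = denoiser(x, t)                        # one network pass per step
    return x

denoiser = lambda x, t: x * 0.9                   # dummy stand-in network
tiny = (8, 3, 64, 64)                             # toy clip: 8 small frames
fast = sample(denoiser, tiny, num_steps=4)        # distilled: 4 passes
slow = sample(denoiser, tiny, num_steps=50)       # baseline: 50 passes
```

Dropping from 50 passes to 4 is where the speedup comes from; the color drift is the price of the distillation.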
Start from an Image
Yes, you can turn an image into a moving clip. Some setups treat it as the starting frame.
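If the model generates frames one after another, "starting from an image" just means seeding the rollout with your image instead of random noise. A hedged sketch (names are illustrative, not the project’s actual API):

```python
import torch

def rollout(model, first_frame, num_frames):
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(model(frames[-1]))   # each frame conditions the next
    return torch.stack(frames)

model = torch.nn.Identity()                # stand-in for the video model
image = torch.rand(3, 64, 64)              # your starting image, toy-sized
clip = rollout(model, image, num_frames=16)    # a (16, 3, 64, 64) "video"
```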
Long Clips
Use FramePack to get longer animations with more stable motion.
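FramePack’s core trick, loosely: keep the conditioning context a fixed size by compressing older frames harder, so clips can grow without each new frame costing more. This sketch just averages old frames into one summary frame, which is a big simplification of what FramePack actually does:

```python
import torch

def pack_context(frames, budget=4):
    # Keep the newest frames at full detail; squash everything older
    # into a single coarse "summary" frame so the context never grows.
    recent = frames[-budget:]
    if len(frames) > budget:
        summary = torch.stack(frames[:-budget]).mean(dim=0, keepdim=True)
        return torch.cat([summary, torch.stack(recent)])
    return torch.stack(recent)

model = torch.nn.Identity()                # stand-in generator
frames = [torch.rand(3, 64, 64)]           # first frame
for _ in range(63):                        # generate 63 more frames
    ctx = pack_context(frames)             # fixed-size context, always
    frames.append(model(ctx).mean(dim=0))  # next frame from packed context
```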
Things to Know Before You Try
- Most clips are limited to 200 frames
- Lower steps = faster, but also lower detail
- Older GPUs like GTX 1650 might not work well
- Sometimes you’ll see errors like:
`mat1 and mat2 must have the same dtype...`
(That just means two tensors are in different precisions, which is an easy fix; see the sketch after this list)
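That exact message comes from PyTorch: a layer in one precision got a tensor in another. You can reproduce and fix it in a few lines:

```python
import torch

layer = torch.nn.Linear(16, 16)                 # weights are float32
x = torch.randn(1, 16, dtype=torch.float16)     # input is float16

try:
    layer(x)
except RuntimeError as e:
    print(e)    # "mat1 and mat2 must have the same dtype..."

# The fix: make the dtypes match. Easiest is casting the input up;
# on a GPU you'd usually convert the whole model instead (model.half()).
y = layer(x.float())
```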
What People Are Saying
“Feels like 400 watts of woman.”
“Clippy’s new form is here and it’s hot.”
“Real-time AI girlfriend next?”
“Imagine GoldenEye on N64 but in movie style!”
“I didn’t understand anything, but I want it.”
What’s Coming Soon
- Image to Video (I2V) is being tested
- Better motion using frame interpolation
- Longer clips with better details
- Google-level video AI, but running at home
All Important Links in One Place
| What | Link |
|---|---|
| Website | https://self-forcing.github.io |
| Code + Models | https://github.com/guandeh17/Self-Forcing |
| Launch Tweet | https://x.com/xunhuang1995/status/1932107954574275059 |
| Workflow Guide | https://www.reddit.com/r/StableDiffusion/comments/1l7vwke/simple_workflow_for_self_forcing_if_anyone_wants/ |
| YouTube Demo | https://www.youtube.com/watch?v=irUpybVgdDY |
| Model on CivitAI | https://civitai.com/models/1668005?modelVersionId=1887963 |
| Related Project | https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/ |
Final Thought
It’s not perfect. But it’s fast.
And it’s just the beginning.
You can now make your own videos from text — with your own graphics card — in less than a minute.