
Helios: Real Real-Time Long Video Generation Model

arXiv • March 4, 2026 • Shenghai Yuan, Yuanyang Yin, Zongjian Li, Xinwei Huang, Xiao Yang, Li Yuan

Professional Abstract

"Helios is a 14-billion-parameter autoregressive diffusion model that generates video at 19.5 frames per second (FPS) on a single NVIDIA H100 GPU. It produces minute-scale videos while maintaining quality comparable to strong existing baselines. The work addresses three challenges in video generation: long-video drifting, real-time generation, and efficient training.

The first contribution is robustness to long-video drifting, a common issue in video generation that leads to inconsistencies and repetitive motion. Rather than relying on anti-drifting heuristics such as self-forcing or keyframe sampling, Helios uses a training strategy that simulates drifting during training. The model thereby learns to mitigate drifting, and repetitive motion is eliminated at its source.

Helios reaches real-time generation without standard acceleration techniques such as KV-cache or sparse attention, which many existing models need in order to reach similar speeds. Its efficiency comes instead from compressing the historical and noisy context and from reducing the number of sampling steps, which yields computational costs comparable to or lower than those of much smaller models with only 1.3 billion parameters.

Training is also efficient. It does not rely on parallelism or sharding frameworks, which enables image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory. This matters for researchers and developers working with limited computational resources.

The authors report extensive experiments in which Helios consistently outperforms prior methods on both short- and long-video generation, in quality as well as efficiency. To support further research, they plan to release the code, the base model, and a distilled version of the model, so that other researchers can build on this work. Taken together, these results position Helios as a basis for further work on real-time long video generation."
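
To put the reported throughput in concrete terms, the following is a minimal arithmetic sketch, not code from the paper. It takes the 19.5 FPS figure from the abstract and computes the average per-frame time budget and the wall-clock time needed for a one-minute clip. The playback rates of 16 and 24 fps are hypothetical values chosen for illustration, since the summary does not state the playback frame rate, and the function names are made up for this example.

```python
GENERATION_FPS = 19.5  # throughput reported in the abstract (single H100)


def frame_budget_ms(generation_fps: float = GENERATION_FPS) -> float:
    """Average wall-clock time available per generated frame, in milliseconds."""
    return 1000.0 / generation_fps


def generation_time_seconds(
    video_seconds: float,
    playback_fps: float,
    generation_fps: float = GENERATION_FPS,
) -> float:
    """Wall-clock seconds needed to generate a clip of the given duration."""
    total_frames = video_seconds * playback_fps
    return total_frames / generation_fps


def is_real_time(playback_fps: float, generation_fps: float = GENERATION_FPS) -> bool:
    """True when frames are produced at least as fast as they are played back."""
    return generation_fps >= playback_fps


if __name__ == "__main__":
    print(f"per-frame budget: {frame_budget_ms():.1f} ms")
    # Hypothetical playback rates; the summary does not state one.
    for fps in (16.0, 24.0):
        seconds = generation_time_seconds(60.0, fps)
        print(
            f"{fps:.0f} fps playback: {seconds:.1f} s to generate 60 s, "
            f"real-time={is_real_time(fps)}"
        )
```

Under these assumptions, a one-minute clip takes about 49 seconds to generate at 16 fps playback and about 74 seconds at 24 fps, so whether 19.5 FPS counts as real-time depends on the playback rate of the generated video.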

Technical Insights

1. Helios is the first 14B video generation model to generate video at 19.5 FPS on a single NVIDIA H100 GPU.

2. The model supports minute-scale video generation while matching the quality of strong baseline models.

3. Helios is robust to long-video drifting without relying on traditional anti-drifting heuristics such as self-forcing or keyframe sampling.

4. Its training strategies simulate drifting during training, which mitigates drift and eliminates repetitive motion at its source.

5. Real-time generation is achieved without standard acceleration techniques such as KV-cache or sparse attention.

6. Helios compresses historical and noisy context and reduces the number of sampling steps, bringing computational costs to a level comparable to or lower than that of 1.3B-parameter models.

7. Training does not rely on parallelism or sharding frameworks: up to four 14B models fit within 80 GB of GPU memory, which allows image-diffusion-scale batch sizes.

8. Extensive experiments show Helios consistently outperforming prior methods in both short- and long-video generation tasks.

9. The authors plan to release the code, base model, and distilled model to encourage community development and collaboration.
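
The memory figure in the list above (four 14B models within 80 GB) can be put in perspective with a back-of-envelope calculation. The sketch below is not from the paper: it only computes the per-parameter byte budget implied by the stated numbers, treating 80 GB as 80e9 bytes. The 1-, 2- and 4-byte parameter widths are illustrative assumptions, and the function names are hypothetical.

```python
PARAMS_PER_MODEL = 14e9   # "14B" from the summary
NUM_MODELS = 4            # "up to four 14B models"
GPU_MEMORY_BYTES = 80e9   # assumption: 80 GB read as decimal gigabytes


def bytes_per_parameter_budget(
    gpu_bytes: float = GPU_MEMORY_BYTES,
    num_models: int = NUM_MODELS,
    params_per_model: float = PARAMS_PER_MODEL,
) -> float:
    """Bytes available per parameter if all models share the GPU at once."""
    return gpu_bytes / (num_models * params_per_model)


def weights_gb(
    bytes_per_param: float,
    num_models: int = NUM_MODELS,
    params_per_model: float = PARAMS_PER_MODEL,
) -> float:
    """Memory taken by the weights alone, in decimal gigabytes."""
    return num_models * params_per_model * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"budget: {bytes_per_parameter_budget():.2f} bytes per parameter")
    # Illustrative widths only; the summary does not state the precision used.
    for width in (4, 2, 1):
        print(f"{width} bytes/param -> {weights_gb(width):.0f} GB of weights")
```

The budget works out to roughly 1.43 bytes per parameter. Four models stored at 2 bytes per parameter would need 112 GB for weights alone, before activations, gradients, or optimizer state, so the reported result presumably depends on additional memory-saving measures that this summary does not describe.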