Helios: Real Real-Time Long Video Generation Model
Professional Abstract
"Helios is a 14-billion-parameter autoregressive diffusion model that generates video at 19.5 frames per second (FPS) on a single NVIDIA H100 GPU. It supports minute-scale generation while maintaining quality comparable to existing strong baselines. Its design addresses three challenges in video generation: long-video drifting, real-time generation, and efficient training.
First, Helios is robust to long-video drifting, a common failure mode that leads to inconsistencies and repetitive motion. Rather than relying on anti-drifting heuristics such as self-forcing or keyframe sampling, Helios uses a training strategy that explicitly simulates drifting during training. The model thereby learns to mitigate drifting, and repetitive motion is eliminated at its source.
Second, Helios achieves real-time generation without standard acceleration techniques such as KV-cache or sparse attention, which many existing models need to reach similar speeds. Efficiency comes instead from compressing the historical and noisy context and from reducing the number of sampling steps. The resulting computational cost is comparable to, or lower than, that of 1.3-billion-parameter models.
Third, Helios is trained without parallelism or sharding frameworks. This enables image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory, which matters for researchers and developers working with limited computational resources.
In extensive experiments, Helios consistently outperforms prior methods on both short- and long-video generation. To support further research, the authors plan to release the code, the base model, and a distilled version of the model, so that other researchers can build on this work."
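The throughput and memory figures quoted in the abstract can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 24 FPS playback rate is an assumed value that does not come from the source, and "80 GB" is read as decimal gigabytes. The bytes-per-parameter figure is just an average budget; it says nothing about how the authors actually store weights, optimizer state, or activations.

```python
"""Back-of-envelope checks on the figures quoted in the abstract."""

GENERATION_FPS = 19.5     # generation rate on one H100, from the abstract
PARAMS_PER_MODEL = 14e9   # 14 billion parameters, from the abstract
MODELS_IN_MEMORY = 4      # "up to four 14B models", from the abstract
GPU_MEMORY_BYTES = 80e9   # ASSUMPTION: "80 GB" read as decimal gigabytes
PLAYBACK_FPS = 24.0       # ASSUMPTION: hypothetical playback rate, not from the source


def frames_generated(wall_clock_seconds, fps=GENERATION_FPS):
    """Frames produced in the given wall-clock time at a constant rate."""
    return wall_clock_seconds * fps


def generation_time(video_seconds, playback_fps=PLAYBACK_FPS, fps=GENERATION_FPS):
    """Wall-clock seconds needed to generate a clip of the given length."""
    return video_seconds * playback_fps / fps


def bytes_per_parameter(memory_bytes=GPU_MEMORY_BYTES,
                        models=MODELS_IN_MEMORY,
                        params_per_model=PARAMS_PER_MODEL):
    """Average memory budget per parameter if all models share the GPU."""
    return memory_bytes / (models * params_per_model)


if __name__ == "__main__":
    print(f"Frames generated in 60 s of compute: {frames_generated(60):.0f}")
    print(f"Seconds to generate a 60 s clip at {PLAYBACK_FPS:.0f} FPS playback: "
          f"{generation_time(60):.1f}")
    print(f"Average bytes per parameter: {bytes_per_parameter():.2f}")
```

Whether 19.5 FPS keeps pace with playback depends on the output frame rate, which the abstract does not state; passing a different `playback_fps` shows how the required generation time scales.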