On Sunday, Runway unveiled its new AI video synthesis model, Gen-3 Alpha. It's still in development, but it appears to create video of similar quality to OpenAI's Sora, which debuted earlier this year (but has not yet been released). It can generate mind-blowing, high-definition videos from a variety of text prompts, depicting everything from lifelike humans to surreal monsters stomping through the countryside.
Unlike Runway's previous best model, released in June 2023, which could only produce two-second clips, Gen-3 Alpha will reportedly be able to create 10-second video segments of people, places, and things with coherence and consistency far beyond Gen-2. If 10 seconds seems short compared to Sora's full minute of video, consider that Runway is working with a tiny compute budget compared to the better-funded OpenAI, and has a track record of actually shipping video generation capabilities to commercial users.
Gen-3 Alpha does not generate audio to accompany its video clips, and temporally coherent generation (keeping characters consistent over time) likely depends on similarly high-quality training material. But it's hard to ignore the improvements in Runway's visual fidelity over the past year.
AI Video Gets Hot
It's been a busy few weeks for AI video synthesis in the AI research community, including the launch of the Chinese model Kling. Developed by Beijing-based Kuaishou Technology (also known as Kwai), Kling can generate up to two minutes of 1080p HD video at 30 frames per second, with detail and consistency said to match Sora.
Gen-3 Alpha prompt: “A woman’s subtle reflection in the window of a train traveling at lightning speed through a Japanese city.”
Shortly after Kling's debut, people took to social media to create surreal AI videos using Luma AI's Luma Dream Machine. These videos were novel and strange but generally lacked consistency; we tried out the Dream Machine ourselves and were not impressed by anything we saw.
Meanwhile, one of the pioneers of text-to-video synthesis, New York-based Runway, founded in 2018, recently became the target of memes unfavorably comparing its Gen-2 technology to newer video synthesis models, which may have spurred the announcement of Gen-3 Alpha.
Gen-3 Alpha prompt: “Astronauts running through the streets of Rio de Janeiro.”
Generating realistic humans has always been a challenge for video synthesis models, so Runway is specifically showcasing Gen-3 Alpha's ability to generate what the developers call "expressive" human characters with a wide range of movements, gestures, and emotions. But the examples the company provided aren't particularly expressive – most of the people just stare or blink slowly – though they do look realistic.
The human examples provided include generated videos of a woman riding a train, an astronaut running down a street, a man with his face lit by the glow of a TV, a woman driving a car, and a woman running.
Gen-3 Alpha prompt: “A close-up shot of a pensive young woman driving a car, with a blurry green forest visible through the rain-soaked car window.”
The demo videos also include some more surreal video synthesis examples, such as a giant creature walking through a ruined city, a man made of rocks walking through a forest, and even the giant cotton candy monster shown below, which is probably the best video on this entire page.
Gen-3 Alpha prompt: “A giant humanoid made of fluffy blue cotton candy stomps along the ground and roars into the sky. There’s a clear blue sky behind him.”
Gen-3 will power a range of Runway's AI editing tools (one of the company's best-known features), including Multi Motion Brush, Advanced Camera Control, and Director Mode, and it can create videos from text or image prompts.
Runway said Gen-3 Alpha is the first in a series of models trained on its new infrastructure designed for large-scale multimodal training, and that it is a step toward what the company calls "general world models": hypothetical AI systems that build an internal representation of an environment and use it to simulate future events within that environment.