At Google I/O 2024 on Tuesday, Google unveiled Veo, a new AI video synthesis model that, like OpenAI’s Sora, can produce HD videos from text, image, or video prompts. It can generate 1080p videos lasting more than a minute and edit videos from written instructions, but it has not yet been released for public use.
Veo can reportedly generate video sequences lasting 60 seconds or longer from a single prompt or from a series of prompts that form a narrative. It can also reportedly edit existing videos using text instructions and maintain visual consistency across frames. According to the company, it can render detailed scenes and apply cinematic effects such as time-lapses, aerial shots, and various visual styles.
Since the debut of DALL-E 2 in April 2022, a steady stream of image synthesis and video synthesis models has emerged, all aiming to let anyone who can type a written description produce a detailed image or video. Neither technology is fully refined, but both AI image and video generators have steadily grown more capable.
In February, we covered a preview of OpenAI’s Sora video generator, which many at the time considered the best AI video synthesis the industry had to offer. It impressed Tyler Perry enough that he put plans to expand his film studio on hold. So far, however, OpenAI has given access to only a small number of testers and has not released the tool to the general public.
Now Google’s Veo appears, at first glance, to offer video generation capabilities similar to Sora’s. We haven’t used it ourselves, so we can only go by the hand-picked demonstration videos the company has posted on its website. Because those results may not be typical of what the model generates, anyone viewing them should treat Google’s claims with considerable skepticism.
Veo’s sample videos include a time-lapse of a sunflower opening, a fast tracking shot down a suburban street, kebabs sizzling on a grill, a cowboy riding a horse, and more. Conspicuously absent are any detailed depictions of people, which AI image and video models have historically struggled to generate without obvious errors.
Google says Veo builds on several of the company’s earlier video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. To improve both quality and efficiency, Veo works with compressed “latent” video representations, and its training data pairs videos with more detailed captions. Google says those richer captions help the model interpret prompts more accurately.
Veo also stands out for its support of editing commands. “When given both an input video and editing command, like adding kayaks to an aerial shot of a coastline, Veo can apply this command to the initial video and create a new, edited video,” the company explains.
Although the demos look impressive at first glance (especially compared with the infamous AI-generated clip of Will Smith eating spaghetti), Google acknowledges that AI video generation is difficult. “Maintaining visual consistency can be a challenge for video generation models,” the company states. “Characters, objects, or even entire scenes can flicker, jump, or morph unexpectedly between frames, disrupting the viewing experience.”
Google says it has tried to reduce those problems with “cutting-edge latent diffusion transformers,” a phrase that, without further specifics, amounts to vague marketing language. Still, the company is confident enough in the model that it is collaborating with actor Donald Glover and his studio, Gilga, on an AI-generated demonstration film that is set to premiere soon.
Initially, a select group of creators will have access to Veo through VideoFX, a new experimental tool available at labs.google, Google’s AI Test Kitchen website. Creators can join a waitlist for VideoFX to be considered for access to Veo’s features in the coming weeks. Google also plans to bring some of Veo’s capabilities to YouTube Shorts and other products in the future.
Google has not said where it obtained the training data for Veo, although if we had to guess, YouTube was probably involved. The company does say it has taken a “responsible” approach with Veo. According to Google, “Videos created by Veo are watermarked using SynthID, our cutting-edge tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes that help mitigate privacy, copyright, and bias risks.”