
As expected after days of online leaks and rumors, Google has unveiled Veo 3.1, its latest AI video generation model, bringing a suite of creative and technical upgrades aimed at improving narrative control, audio integration, and realism in AI-generated video.
While the updates expand the possibilities for hobbyists and content creators using Google's online AI creation app, Flow, the release also signals increased opportunities for enterprises, developers, and creative teams looking for scalable, customizable video tools.
The quality is higher, the physics are better, the pricing is the same as before, and the control and editing features are more robust and varied.
My preliminary tests found it to be a powerful, performant model that is an instant delight with nearly every generation. However, the look is more cinematic, polished, and a bit more "artificial" by default than rivals like OpenAI's new Sora 2, released late last month, which may or may not be what a particular user is going for (Sora 2 excels at handheld, "candid"-style videos).
Expanded control over narration and audio
Veo 3.1 builds on its predecessor, Veo 3 (released in May 2025), with improved support for dialogue, ambient sound, and other audio effects.
Native audio generation is now available across several key features in Flow, including "Frames to Video," "Ingredients to Video," and "Extend," which respectively let users: turn still images into video; use items, characters, and objects from multiple images in a single video; and produce clips longer than the initial 8 seconds, up to 30 seconds or even a minute or more when continuing from the last frame of a clip.
Previously, you had to manually add audio after using these features.
This addition gives users greater control over tone, emotion, and storytelling, capabilities that previously required post-production work.
In an enterprise context, this level of control can reduce the need for separate audio pipelines, offering an integrated way to create training content, marketing videos, or digital experiences with synchronized sound and visuals.
Google noted in a blog post that the update reflects user feedback calling for deeper artistic control and better audio support. Gallegos emphasized the importance of being able to edit and refine directly in Flow, without reworking scenes from scratch.
Rich input and editing capabilities
With Veo 3.1, Google introduces support for multiple input types and more granular control over generated output. The model accepts text prompts, images, and video clips as input, and also supports:
Reference images (up to three) to guide the appearance and style of the final output
First and last frame interpolation to create smooth transitions between fixed endpoints
Scene extension, which continues a video's action or motion beyond its current duration
The purpose of these tools is to give enterprise users a way to improve the look and feel of their content.
Additional capabilities such as “insert” (add objects to scenes) and “remove” (delete elements or characters) are also being introduced, although not all are immediately available through the Gemini API.
Deployment across platforms
Veo 3.1 is accessible through Google's existing AI services:
Flow, Google's own interface for AI-assisted filmmaking
Gemini API, aimed at developers building video capabilities into applications
Vertex AI, where enterprise integration will soon support Veo's scene extension and other key features
Availability through these platforms allows enterprise users to choose the right environment—GUI-based or programmatic—based on their teams and workflows.
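For teams weighing the programmatic route, the snippet below is a minimal sketch of what a text-to-video call could look like through the Gemini API using the google-genai Python SDK. The model ID string "veo-3.1-generate-preview" and the example prompt are illustrative assumptions rather than confirmed identifiers from this release, so check Google's current API documentation before relying on them.

```python
# Hypothetical sketch: generating a clip with Veo 3.1 via the google-genai SDK.
# The model ID below is an assumption based on Google's preview naming pattern;
# confirm the exact string in the Gemini API docs.
import time
from google import genai

client = genai.Client()  # reads the API key from the environment

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A slow dolly shot through a rain-soaked neon market at night",
)

# Video generation is a long-running operation, so poll until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the first generated video to disk.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("market_scene.mp4")
```

The long-running operation pattern matters in practice: generation can take a minute or more, so production pipelines typically poll or queue these jobs rather than blocking a request thread.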
Pricing and Access
Veo 3.1 is currently in preview and available only on the paid tier of the Gemini API. Its cost structure matches that of Veo 3, the previous generation of Google's AI video models:
Standard model: $0.40 per second of video
Fast model: $0.15 per second
There is no free tier, and users are charged only if a video is successfully generated. This pricing is consistent with previous Veo versions and gives budget-conscious enterprise teams predictable costs.
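To make the budgeting arithmetic concrete, here is a small illustrative Python sketch using the per-second rates quoted above; whether extended clips are billed at the same per-second rate is an assumption, flagged in the comments.

```python
# Illustrative cost math for Veo 3.1 clips, using the rates quoted above.
RATES_USD_PER_SECOND = {"standard": 0.40, "fast": 0.15}

def estimate_cost(duration_seconds: float, tier: str = "standard") -> float:
    """Approximate charge for one successfully generated clip."""
    return duration_seconds * RATES_USD_PER_SECOND[tier]

print(estimate_cost(8))           # default 8-second clip, standard: $3.20
print(estimate_cost(8, "fast"))   # default 8-second clip, fast: $1.20
# Assumes extensions are billed at the same per-second rate:
print(estimate_cost(148))         # fully extended 148-second clip: $59.20
```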
Technical specifications and output control
Veo 3.1 outputs video at 720p or 1080p resolution with a 24 fps frame rate.
Duration options are 4, 6, or 8 seconds, with the ability to extend videos generated from a text prompt or uploaded images to 148 seconds (nearly two and a half minutes) when using the "Extend" feature.
The new functionality also includes tighter control over subjects and environments. For example, enterprises can upload a product image or visual reference, and Veo 3.1 will generate scenes that preserve its appearance and stylistic cues in the video. This can streamline creative production pipelines for retail, advertising, and virtual content production teams.
Initial reaction
The wider creator and developer community has responded to the launch of Veo 3.1 with a mixture of optimism and pointed criticism, especially when comparing it to rival models like OpenAI's Sora 2.
Matt Shumer, an AI founder (of OthersideAI/HyperWrite, among others) and an early adopter, described his initial reaction as "disappointment," saying that Veo 3.1 was "significantly worse than Sora 2" and "much more expensive."
However, he acknowledged that Google's tooling, such as support for reference images and scene extension, is a bright spot in the release.
Travis Davids, a 3D digital artist and AI content creator, echoed some of that sentiment. While he noted improvements in audio quality, particularly in sound effects and dialogue, he raised concerns about the system's remaining limitations.
These include the lack of custom voice support, the inability to directly select generated voices, and a persistent 8-second cap on generations.
Davids also pointed out that character consistency across changing camera angles still requires careful input, while other models such as Sora 2 handle it more automatically. He questioned the absence of 1080p resolution for users on paid tiers like Flow Pro and expressed doubts about feature parity.
On a more positive note, @kimmonismus, an AI newsletter writer, stated that "Veo 3.1 is amazing," though he still concluded that OpenAI's latest model is superior overall.
Collectively, these early impressions suggest that while Veo 3.1 delivers meaningful tooling enhancements and new creative control features, expectations have risen as competitors push on both quality and usability.
Adoption and scale
Since launching Flow five months ago, Google says 275 million videos have been produced across its various Veo models.
The pace of adoption suggests significant interest not only from individuals but also from developers and businesses experimenting with automated content creation.
Thomas Iljic, director of product management at Google Labs, highlighted that the Veo 3.1 release brings its capabilities closer to how human filmmakers plan and shoot. These include scene composition, continuity across shots, and integrated audio, all areas that enterprises increasingly look to automate or synchronize.
Safety and responsible AI use
Videos created with Veo 3.1 are watermarked using Google's SynthID technology, which embeds an imperceptible identifier to indicate that content has been AI-generated.
Google applies safety filters and moderation across its APIs to help minimize privacy and copyright risks. Generated content is stored temporarily and deleted after two days unless downloaded.
For developers and enterprises, these features provide assurance around provenance and compliance.
Where Veo 3.1 stands out in a crowded AI video model space
Veo 3.1 isn't just an iteration on previous models; it represents a deep integration of multimodal input, storytelling controls, and enterprise-level tooling. While creative professionals may see immediate benefits in editing workflows and fidelity, businesses looking for automation in training, advertising, or virtual experiences may find even greater value in model composability and API support.
Early user feedback highlights that while Veo 3.1 offers valuable tooling, expectations around realism, sound control, and generation length are evolving rapidly. As Google expands reach through Vertex AI and continues to improve Veo, its competitive position in enterprise video generation will depend on how quickly these user pain points are addressed.