
A new artificial intelligence startup founded by the creators of the world’s most widely used computer vision library has emerged from stealth with technology that produces realistic human-centric videos up to five minutes long, well beyond what OpenAI’s Sora and Google’s Veo currently offer.
CraftStory, which launched Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most critical limitations plaguing the nascent AI video industry: duration. While OpenAI’s Sora 2 tops out at 25 seconds and most competing models produce clips of 10 seconds or less, CraftStory’s system can produce consistent, cohesive video performances that run as long as a typical YouTube tutorial or product demonstration.
This development could unlock substantial value for businesses struggling to scale video production for training, marketing, and customer education, markets where short AI-generated clips have proven inadequate despite their visual polish.
"If you actually try to create a video with one of these video generation systems, you’ll find that many times you want to implement a certain creative vision, and regardless of how detailed the instructions are, the systems basically ignore part of your instructions," CraftStory founder and CEO Victor Erukhimov said in an exclusive interview with VentureBeat. "We developed a system that basically can produce videos for as long as you need."
How parallel processing solves the long-form video problem
CraftStory’s breakthrough hinges on what the company describes as a parallel architecture — a fundamentally different approach to how AI models generate video than the sequential methods used by most competitors.
Traditional video generation models operate by running diffusion algorithms on increasingly large three-dimensional volumes where time represents the third axis. To generate long videos, these models require proportionally larger networks, more training data, and significantly more computational resources.
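To make the scaling problem concrete, here is a rough back-of-the-envelope sketch in Python. The frame rate and grid size are illustrative assumptions, not figures from CraftStory or any other vendor; the point is only that a volume with time as its third axis grows in direct proportion to clip length.

```python
def volume_cells(seconds, fps=24, height=64, width=64):
    """Count the cells in a (time, height, width) volume.

    fps, height and width are hypothetical defaults chosen for illustration.
    """
    frames = int(seconds * fps)
    return frames * height * width


if __name__ == "__main__":
    baseline = volume_cells(10)
    for seconds in (10, 25, 300):
        cells = volume_cells(seconds)
        ratio = cells / baseline
        print(f"{seconds:>3}s clip: {cells:,} cells ({ratio:.1f}x a 10s clip)")
```

Under these assumptions a five-minute clip is a volume 30 times the size of a 10-second one, before counting any growth in network size or training data.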
CraftStory instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, with two-way constraints connecting them. "The last part of the video can also affect the previous part of the video," Erukhimov explained. "And this is very important, because if you do it one by one, then an artifact that appears in the first part spreads to the second, and then it accumulates."
Instead of generating eight seconds and then stitching on additional segments, CraftStory’s system processes the full five minutes at once through an integrated diffusion process.
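The accumulation effect Erukhimov describes can be illustrated with a toy simulation. This is not CraftStory’s algorithm, which the company has not published. It is a minimal sketch that assumes each video segment can be summarized by a single number (say, how far the subject’s appearance has drifted from the reference image); the function names and parameters are hypothetical. In the sequential version each segment inherits the previous segment’s error and adds its own. In the coupled version all segments are refined together, and every sweep pulls each segment toward both of its neighbours.

```python
import random


def sequential(num_segments, noise, rng):
    """Chain segments one by one: each inherits the previous error and adds its own."""
    states = [0.0]  # segment 0 matches the reference exactly
    for _ in range(num_segments - 1):
        states.append(states[-1] + rng.gauss(0.0, noise))
    return states


def coupled_parallel(num_segments, noise, rng, sweeps=500):
    """Refine all segments together with two-way coupling between neighbours."""
    states = [rng.gauss(0.0, noise) for _ in range(num_segments)]
    states[0] = 0.0  # the reference anchors the first segment
    for _ in range(sweeps):
        updated = states[:]
        for i in range(1, num_segments):
            left = states[i - 1]
            # The last segment has no right neighbour, so it reuses its own value.
            right = states[i + 1] if i + 1 < num_segments else states[i]
            updated[i] = (left + states[i] + right) / 3.0
        states = updated
    return states


def drift(states):
    """Largest deviation of any segment from the first one."""
    return max(abs(s - states[0]) for s in states)


if __name__ == "__main__":
    seq = sequential(10, 1.0, random.Random(0))
    par = coupled_parallel(10, 1.0, random.Random(0))
    print(f"sequential drift: {drift(seq):.3f}")
    print(f"coupled drift:    {drift(par):.3f}")
```

In the chained version the deviation behaves like a random walk and tends to grow with the number of segments, while the coupled version damps it because information flows in both directions. A real system would couple high-dimensional latents during denoising rather than averaging scalars.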
Importantly, CraftStory trained its model on proprietary footage instead of relying entirely on videos scraped from the Internet. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers, avoiding the motion blur found in standard 30-frame-per-second YouTube clips.
"What we showed is that you don’t need a lot of data and you don’t need a huge training budget to make high-quality videos," Erukhimov said. "All you need is high quality data."
Model 2.0 currently works as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will mimic. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.
The system produces 30-second clips at low resolution in about 15 minutes. An advanced lip sync system syncs mouth movements to scripts or audio tracks, while gesture alignment algorithms match body language to speech rhythms and emotional tone.
Taking on billion-dollar war chests with $2 million
CraftStory’s funding comes almost entirely from Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise stands in stark contrast to the billions poured into competing efforts. OpenAI raised more than $6 billion in its latest funding round alone.
Erukhimov pushed back on the idea that large-scale capital is a prerequisite for success. "I don’t necessarily buy the thesis that compute is the path to success," he said. "It definitely helps if you have compute. But if you raise a billion dollars on a PowerPoint, in the end, no one is happy, neither the founders nor the investors."
Filev defended the David-versus-Goliath approach. "When you invest in a startup, you’re essentially betting on people," he said in an interview with VentureBeat. "To paraphrase Margaret Mead: Never underestimate what a small group of thoughtful, committed engineers and scientists can build."
He argued that CraftStory benefits from a focused strategy. "Major labs are in an arms race to create general-purpose video foundation models," Filev said. "CraftStory is riding this wave and going deep into a specific format: long-form, engaging, human-centric video."
Why computer vision expertise matters in generative AI video
Erukhimov’s reputation stems from his deep roots in computer vision, rather than the transformer architecture that dominates recent AI developments. He was an early contributor to OpenCV, the open source computer vision library that has become the de facto standard for computer vision applications, with over 84,000 stars on GitHub.
When Intel dropped its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the express goal of maintaining and advancing the library. The company significantly expanded OpenCV and branched out into automotive safety systems before being acquired by Intel in 2016.
Filev said it is this background in particular that makes Erukhimov well-positioned for video generation. "What people sometimes miss is that generative AI video isn’t just about the generative part. It’s about understanding movement, facial dynamics, temporal coordination, and how humans actually move," Filev said. "Victor has spent his career mastering exactly these issues."
Enterprise focus on training videos and product demos
While much of the public excitement surrounding AI video generation has focused on creative tools for consumers, CraftStory is pursuing a decidedly enterprise-oriented strategy.
"We’re definitely thinking about B2B more than consumers," Erukhimov said. "We’re thinking about companies, especially software companies, being able to create cool training videos and product videos and launch videos."
The logic is straightforward: Corporate training, product tutorials, and customer education videos often last several minutes and require consistent quality. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature.
"If you need a long-form video, you should go with us," Erukhimov said. "We can create continuous video, high quality, up to five minutes."
Filev echoed that assessment. "A big gap in this market is the lack of models that can produce consistent videos over long sequences, and that’s critical for real-world use," he said. "If you’re making a commercial for your company, a 10-second video, no matter how good it looks, just isn’t enough. You need 30 seconds, you need two minutes. You need more."
The company expects significant cost savings for customers. Filev suggested that "a small business owner can produce content in minutes that previously would have cost $20,000 and taken two months to produce."
CraftStory is also courting creative agencies that produce video content for corporate clients, with a value proposition based on price and speed: Agencies can record an actor on camera and turn that footage into AI-generated video instead of managing an expensive multi-day shoot.
The next major development on CraftStory’s roadmap is a text-to-video model that allows users to produce long-form content directly from a script. The team is also developing support for moving camera scenarios, including the popular "walk and talk" format common in high-end advertising.
Where CraftStory fits into a fragmented competitive landscape
CraftStory enters a crowded and rapidly evolving market. OpenAI’s Sora 2, while not yet publicly available, has generated significant buzz. Google’s Veo models are advancing quickly. Runway, Pika, and Stability AI all offer video generation tools with varying capabilities.
Erukhimov acknowledged the competitive pressure but emphasized that CraftStory serves a distinct niche focused on human-centric videos. He positioned rapid innovation and market capture as the company’s core strategy rather than relying on technological moats.
Filev sees the market fragmenting into distinct layers, in which the big tech companies operate as "API providers of powerful, general-purpose generative models" while niche players like CraftStory focus on specific use cases. "If the big players are building the engines, CraftStory is building the production studio and assembly line on top," he said.
Model 2.0 is available now at app.craftstory.com/model-2.0, with the company offering early access to consumers and businesses interested in testing the technology. Whether a lightly funded startup can capture meaningful market share against deep-pocketed rivals remains to be seen, but Erukhimov is characteristically confident about the opportunity ahead.
"AI-generated video will soon become the primary way companies communicate their stories," he said.