ByteDance, the tech titan behind TikTok, just fired a thunderous salvo in the AI video generation arms race as the company's cloud division unveiled two video generators: PixelDance and Seaweed.
The generators, released at an event in Shenzhen last week, are still in private beta and only available to a limited number of users. However, the models could be publicly available next month depending on the outcome of the U.S. general election, claimed YouTuber Tim Simmons, who focuses on AI tools for content creators.
“I did speak to [an anonymous source] about this and the best I can say is don't hold your breath until after November because… politics,” he said in a video review of the models.
The demo videos were first shown on WeiXin (WeChat), the Chinese messaging and content platform.
PixelDance focuses on AI-driven character animation, generating 10-second videos featuring startlingly lifelike human movements. The model delivers fluid, natural performances—characters walk, turn, pick up objects, and interact with their environment in ways previously thought impossible for AI.
But PixelDance's true magic lies in its multi-shot capabilities. The model maintains remarkable consistency in character appearance, proportions, and scene details across varying camera angles. That feature solves a major headache in AI video generation, where maintaining visual coherence between shots has long been a struggle—which is why most state-of-the-art video generators focus instead on producing fluid motion within a single continuous shot.
PixelDance's camera control is also on par with other major models like Pika, Runway's Gen 3, or Kling, making it a strong option for AI cinematography with little compromise. With a single, simple text prompt, users can orchestrate complex camera movements like 360-degree pans, zooms, and tracking shots.
For example, the prompt for the following video roughly translates to: In black and white, the camera is shot around the woman in sunglasses, moving from her side to the front, and finally focuses on a close-up of the woman's face.
In other models, camera control is handled through the user interface, with buttons and sliders.
Seaweed, PixelDance's sibling, pushes the envelope on environmental generation and consistency. The model stretches video generation to a full 30 seconds—and potentially extendable to nearly 2 minutes of consistent shots.
ByteDance's timing couldn't be more strategic. The AI video generation landscape has been in a state of excitement since OpenAI's Sora was announced in February. Sora's purported ability to generate up to 60 seconds of high-quality video from text prompts sent shockwaves through the tech world. However, Sora still hasn’t been released to the public and other companies are racing to fill that space.
Kuaishou, another Chinese tech giant, made waves in June with the launch of Kling AI, a model that many reviewers rank at the top of their lists for AI video quality. Integrated into Kuaishou's video editing app, Kling AI can also generate two-minute videos, surpassing even Sora's capabilities. The tool quickly amassed over 2.6 million users, who have collectively generated 27 million videos. However, it produces single-shot takes, making it comparable to ByteDance's offering in quality but somewhat less versatile in features.
On Tuesday, Pika Labs—another O.G. in the generative video scene—released its new Pika 1.5 model, enhancing the capabilities of its already capable and widely adopted video generator. "With more realistic movement, big screen shots, and mind-blowing Pikaffects that break the laws of physics, there's more to love about Pika than ever before," Pika Labs said in an official tweet.
Sry, we forgot our password.
PIKA 1.5 IS HERE. With more realistic movement, big screen shots, and mind-blowing Pikaffects that break the laws of physics, there’s more to love about Pika than ever before.
Try it. pic.twitter.com/lOEVZIRygx
— Pika (@pika_labs) October 1, 2024
Pika 1.5 is available for testing on Pika’s official website, and social media is already filling up with videos showing how Pika can wildly transform scenes by crushing and exploding people and objects—or cut them open to reveal virtual cake within.
ByteDance built its latest video models on the Doubao family of foundational models, based on a proprietary diffusion transformer (DiT) architecture. They are believed to share similarities with the technology powering Sora. The company claims to have optimized DiT for business applications, potentially lowering the cost barrier for AI video creation.
Pika 1.5 is pretty wild. When I said generative AI would let us edit reality, this is not what I had in mind... lol pic.twitter.com/xeRILX1byh
— Bilawal Sidhu (@bilawalsidhu) October 1, 2024
The Doubao AI family's explosive growth since its May launch underscores the models' potential. Daily token processing has skyrocketed from 120 billion to 1.3 trillion, a more than tenfold increase in usage. Doubao now processes over 50 million images and 850,000 hours of speech every day, as reported by Kr-Asia.
ByteDance's aggressive pricing strategy has fueled this growth. Since May, the company has slashed its cost per 1,000 tokens to fractions of a cent, igniting a fierce price war among major players like Alibaba and Tencent.
Clearly, ByteDance's strategy—leaning heavily into AI for its recommendation algorithms on TikTok—is paying off. TikTok and Douyin, its Chinese counterpart, have been the fastest-growing social media platforms in recent years, but their ownership by a Chinese technology company has raised concerns in Western countries.
It's unclear whether ByteDance will integrate its generative AI models into its apps—similar to Meta incorporating its Llama-based LLMs and generators into Instagram and WhatsApp—and even more uncertain whether U.S. citizens will have access to them once they are publicly released.
Edited by Andrew Hayward