Midjourney, the generative image creation tool perhaps best known for running inside a Discord server, is spreading its AI wings. The creators of Midjourney announced on Tuesday that they plan to introduce a “text to video” model in the next few months.
The company will begin training its video models in January, CEO David Holz said during an "Office Hour" Discord session. The move is a natural next step for the platform: it builds on a mature image model and could shake up the competitive dynamics of the generative video industry.
The Discord session notes included planned tweaks for V6 Niji, Midjourney’s manga/anime generator model, and consistency fixes for the upcoming official release of Midjourney V6. The company also wrote that its to-do list calls for “training for new video models to commence,” and that the models could be ready “in a few months.”
Neither Holz nor the Midjourney team shared further details about the model.
Midjourney is known for prioritizing quality and user experience over speed, even when that means trailing its competitors. The company rolled out features like inpainting and outpainting months after they had become standard on other platforms like Stable Diffusion, and its recent foray into rudimentary in-image text generation came after that capability was already common in models like Dall-E 3 and SDXL, and even in less popular generators like Ideogram or IF.
Entering a crowded field
This venture into video also follows a wave of releases from the competition. Stability AI recently announced Stable Video Diffusion, Meta just showcased its Emu Video generator, and established models like Pika and Runway ML are already staking their claims, so Midjourney will be entering a crowded, competitive landscape. Other image generators, such as Leonardo AI, have also already added video generation capabilities, further intensifying the race.
Midjourney's recent V6 update, which boasts improved prompt following and more realistic images, is the company’s latest effort to stay relevant and competitive. If its video models produce coherent results, Midjourney could gain solid ground in such a nascent field, even if those models are still far from perfect.
The implications of these developments extend far beyond a corporate race for supremacy. As Midjourney and others innovate and refine their offerings, the creative and media industries stand on the brink of a transformative era. The ability to generate, manipulate, and interact with video content through AI opens up many possibilities, from simplifying work for entertainers and advertisers to potentially reshaping how we perceive reality.
Edited by Ryan Ozawa.