Ideogram AI—a startup founded by former Google engineers alongside members from prestigious institutions like UC Berkeley, Carnegie Mellon University, and the University of Toronto—has announced the release of the first full version of its eponymous image generator.
"We’re excited to release Ideogram 1.0, our most advanced text-to-image model to date," Ideogram AI said in an official blog post. "Trained from scratch like all Ideogram models, Ideogram 1.0 offers state-of-the-art text rendering, unprecedented photorealism, and prompt adherence—and a new feature called Magic Prompt that helps you write detailed prompts for beautiful, creative images."
The release comes alongside news of an $80 million Series A fundraise led by Andreessen Horowitz, with participation from Redpoint Ventures, Pear VC, Index Ventures, and SV Angel.
Happy to share that Ideogram raised $80 million in series A funding to help people become more creative through generative AI! Thanks to @a16z for leading the round and @Redpoint, @pearvc, @IndexVentures, @svangel for participating!
Ideogram 1.0 will improve considerably soon!
— Mohammad Norouzi (@mo_norouzi) February 29, 2024
Decrypt tested the model, and Ideogram AI’s claims are not wildly overstated; a side-by-side comparison can be found below. Version 1.0 of Ideogram is a clear improvement over its v0.1 and v0.2 predecessors: it excels in prompt adherence, image quality, and text generation capabilities.
The model is not open source, so there is limited visibility into its plumbing and no research paper to evaluate. But the results speak for themselves, potentially making it the best model currently available, at least until Stable Diffusion 3 is publicly released.
The new model is arguably the most capable image generator in terms of text capabilities, generating longer text strings with fewer errors than Dall-E 3 or MidJourney. The current free tier also gives it an edge over competitors like Dall-E 3 and MidJourney, the latter of which has no free tier. Microsoft Copilot also uses Dall-E 3, but it only generates square 1:1 images, whereas Ideogram supports a wider set of aspect ratios.
Ideogram also offers two paid plans, at $7 and $15 per month, which give access to over 400 generations per day along with other perks like an image editor, better-quality downloads, img2img (which allows modifications or variations on an existing image), and private generations. All lower tiers display requested images publicly.
Introducing Ideogram 1.0: the most advanced text-to-image model, now available on https://t.co/Xtv2rRbQXI!
This offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting. pic.twitter.com/VOjjulOAJU
— Ideogram (@ideogram_ai) February 28, 2024
Ideogram is capable of understanding long prompts, going toe to toe with Stable Diffusion 3, and beating all other image generators in this field.
One of the standout features of Ideogram is "Magic Prompt," which can be turned on and off. This feature analyzes the prompt and enhances it to create images of better quality, essentially giving the model the ability to understand natural language like Dall-E 3. However, Ideogram is more versatile because this feature is optional. In ChatGPT Plus, Dall-E 3's prompt rewriting is always turned on, which sometimes leads to inaccuracies.
Finally, Ideogram is less aggressively censored than MidJourney and Dall-E 3, and is so far capable of generating images of famous people, company logos, and art styles. It does not go fully NSFW, but it is more selective when it comes to censoring prompts.
And early testers seem to prefer Ideogram over other models. "Using an evaluation protocol like that of DALL·E 3, we find that human raters prefer Ideogram 1.0 over DALL·E 3 and Midjourney V6 in prompt alignment, image coherence, overall preference, and text rendering quality," the startup said.
Side-by-side comparison: Ideogram vs. MidJourney vs. Dall-E 3
Decrypt tested Ideogram’s capabilities and compared it against its top competitors, MidJourney and Dall-E 3. Stable Diffusion 3 and Google’s top-of-the-line ImageFX are not being evaluated here because SD3 is not released yet and ImageFX is not widely available.
Generating long strings of text
Prompt: A futuristic Android in Cyberpunk City with a sign that reads, "Don't be late in the AI trend: Emerge by Decrypt"
Ideogram AI was able to portray both the requested aesthetics and the text. It had a typo, however, generating “thee” instead of “the.”
MidJourney was not able to generate any coherent text at all, and focused on generating a detailed futuristic android, which is the main subject of the whole composition. The city is not cyberpunk at all.
Dall-E 3 ranks in the middle. It was able to generate the futuristic robot, and the city is cyberpunk, but the sign didn’t feature the word “Emerge.”
Interestingly enough, Ideogram understood that the robot was in the city and associated with the sign, whereas Dall-E assumed that the sign was part of the cityscape.
Long prompts and spatial capabilities
Prompt: A surreal and intriguing scene featuring a cat perched on top of a television next to a sign that reads "Emerge." In the background, a futuristic android stands on one side and an astronaut on the other. The room's walls are adorned with a striking image of a molecule and a DNA chain.
Ideogram was by far the best overall generator. It understood every part of the prompt and generated the text with no typos. It placed each element correctly, with the cat on top of a TV, the sign next to it, and the android and the astronaut on either side, and it even understood that there must be a molecule and a DNA chain in the background.
MidJourney's aesthetic was not surreal, but rather hyperrealistic. It generated the word “Emerge,” but put it on the TV, and did not generate the sign. The cat is also next to the TV and not on top of it. It did not generate the android and failed to follow the prompt for the background, generating instead one that better fit the aesthetic of the composition, giving more importance to the subject (the cat) over the overall scene.
Dall-E 3 kept its characteristic cartoony style and couldn’t follow the prompt fully. It has more spatial understanding and prompt adherence than MidJourney, but far less than Ideogram. It loses, however, in terms of style. It generated the cat on top of the TV, but failed to generate the Emerge sign next to the cat. It didn’t generate the android, and didn’t follow the prompt when generating the background.
Censorship
Prompt: A hot, sexy girl.
The prompt does not include language that could be construed as hate speech or slurs, nor is it especially sexual. After all, a “hot, sexy girl” can be fully clothed and not aggressively sexualized.
Ideogram AI understood the prompt and generated an image that fit the instructions. Ideogram does have an AI moderator, however, which is triggered by more obviously explicit words (say, slang words for genitalia or tags like nude, naked, etc.) and immediately censors the generation.
Both MidJourney and Dall-E 3, meanwhile, failed to generate the image and banned words even if they wouldn't have led to an NSFW generation.
Ideogram seems to be more targeted with censorship, and it is possible to see the generated image—NSFW or otherwise questionable—before it is yanked by the application.
Famous people and copyrighted images
Prompt: A happy Joe Biden and Vladimir Putin in front of a wall with the text "Decrypt," holding hands.
Ideogram AI generated the image: the text is correct, the scenario is realistic, and the characters are easily identifiable (even if not 100% accurate).
Dall-E 3 generated the image, but Biden is not easily identifiable, and Trump can only be identified because of his characteristic hairstyle. The text is not correct, and the scenery is not realistic and instead is cartoony.
MidJourney refused to generate the image.
Conclusion
Free and widely available out of the gate, Ideogram may be the best image generator currently on the market. It is great at natural language understanding and has outstanding spatial capabilities and prompt adherence. It is also the best model currently available for rendering text within images.
If aesthetics are the most important consideration, to the point where adherence and text are less important, then MidJourney might remain a solid competitor for specific use cases. While heavily censored and not especially strong, Dall-E 3 may still make sense as part of a ChatGPT Plus subscription.
Ideogram AI holds the crown among our toolbox of image generators, for now.
Edited by Ryan Ozawa.