Meta Unveils Llama-3—We Put the New Top Open-Source AI Model to the Test

Llama 3, Meta's most advanced large language model, arrived early and hit millions of devices across top apps. We took it for a test drive.

Apr 19, 2024

Meta has released Llama 3, the most advanced open-source large language model currently available. It builds upon the foundation laid by its predecessor, Llama 2, and came as a surprise, as rumors had suggested the release would happen next month.

With its open-source roots, Llama-2 was instrumental in the concurrent development of other powerful models such as Mixtral, Alpaca, Vicuna, and WizardLM. Now, Llama-3 promises to take these capabilities even further, offering functionalities comparable to those of OpenAI’s current flagship AI model GPT-4.

Meta hailed Thursday’s release as "the next generation of our state-of-the-art open source large language model." So confident is the tech giant in its capabilities that Llama 3 is powering Meta AI, which in turn was added to almost all of the company’s massively popular apps: Instagram, Facebook, and WhatsApp. It has been made available in select countries, but users in other regions can access it via VPN.

Meta AI’s chatbot interface is comparable to ChatGPT Plus, and it’s free.

“We're upgrading Meta AI with our new state-of-the-art Llama 3 AI model, which we're open sourcing,” Mark Zuckerberg said in a Facebook post. “With this new model, we believe Meta AI is now the most intelligent AI assistant that you can freely use.”

Decrypt was able to test the new AI and found it to be as capable as ChatGPT-Plus without a paid subscription. It can generate images and animations, produce code, and provide coherent, contextually relevant responses. The new chatbot can also access the internet, but it is still no match for specialized solutions like Perplexity.

Perhaps the only downside is that Llama-3’s current context window is limited to 8K tokens, or around 6,000 words.
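The token-to-word conversion can be sketched in a few lines. This is only a rough estimate: the 0.75 words-per-token ratio is a common rule of thumb for English text and an assumption here, not a measurement from Llama 3's tokenizer, and the function names are hypothetical.

```python
# Assumption: roughly 0.75 English words per token (a rule of thumb,
# not a figure measured with Llama 3's own tokenizer).
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


def fits_in_context(word_count: int, context_tokens: int = 8192,
                    words_per_token: float = WORDS_PER_TOKEN) -> bool:
    """Estimate whether a prompt of `word_count` words fits in the window."""
    estimated_tokens = word_count / words_per_token
    return estimated_tokens <= context_tokens


print(tokens_to_words(8192))   # 6144
print(fits_in_context(6000))   # True
print(fits_in_context(7000))   # False
```

An 8K window (8,192 tokens) works out to about 6,100 words under this assumption, which lines up with the "around 6,000 words" figure. The prompt and the model's reply share the same budget, so the usable prompt length is smaller in practice.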

Meta did release a 70-billion-parameter Llama-3 model, but using it would require heavy computing power, probably a whole rack of GPUs. According to synthetic benchmarks, this model beats Gemini 1.5 Pro and Claude 3 Sonnet.

There's also an 8-billion-parameter model available, which can be run locally on consumer-grade GPUs. This one beats Google’s Gemma and Mistral 7B in various synthetic benchmarks. The model has not yet been listed in the LLM Arena, so there is no subjective Elo score to report.

Both models can also be run in cloud instances at lower cost.
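For a rough sense of why the two sizes land on such different hardware, here is a back-of-the-envelope sketch of the memory needed just to hold the model weights. It assumes memory is roughly parameter count times bytes per parameter and ignores activations, the KV cache, and framework overhead, so real requirements are higher. The function name and the precision choices are illustrative assumptions, not Meta's published figures.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param


# Assumed precisions: 16-bit (half precision) and 4-bit (quantized).
for name, size in [("Llama 3 70B", 70), ("Llama 3 8B", 8)]:
    for bits in (16, 4):
        gb = weight_memory_gb(size, bits)
        print(f"{name} at {bits}-bit: ~{gb:.0f} GB")
```

Under these assumptions, the 70B model needs about 140 GB for 16-bit weights, well beyond any single consumer card, while the 8B model needs about 16 GB, or about 4 GB if quantized to 4 bits.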

"We’re dedicated to developing Llama 3 in a responsible way, and we’re offering various resources to help others use it responsibly as well," Meta stated. This includes the introduction of new trust and safety tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2.

In the coming months, Meta says it plans to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance. The Llama 3 research paper will also be shared.

"Meta AI, built with Llama 3 technology, is now one of the world’s leading AI assistants that can boost your intelligence and lighten your load—helping you learn, get things done, create content, and connect to make the most out of every moment," Meta said.

Meta added that it is also training a massive 400-billion-parameter model, which is expected to be released later this year. This model, likely comparable to Claude Opus or the latest version of GPT-4.5, could be the most powerful open-source model to date. If history repeats itself, it will also serve as a base for a new generation of fine-tuned models that will beat Llama-3 in overall quality and boost competition against the leading closed-source models.

Riding the Llama

Decrypt tested Llama-3 inside of Meta AI to see whether it was as good as Zuck says. In short, Llama-3 has introduced a number of notable features and capabilities and should be a great foundational model on which the open-source community can iterate.

Content moderation

Llama-3 demonstrates a strong commitment to content moderation. It consistently refused to generate harmful racial content, even when faced with common jailbreak techniques.

For example, when the model was asked for instructions on how to seduce a woman, it provided generic but useful responses. However, when asked for instructions on how to seduce the wife of a best friend, the model firmly refused to provide an answer.

Images and animation

Similar to ChatGPT-Plus, Meta AI with Llama-3 is capable of generating images. However, it takes this capability a step further by offering the option to animate them—a feature not available in ChatGPT or Gemini.

The images generated by Meta AI with Llama-3 are more realistic than those produced by DALL-E 3, but they fall short of the quality of images generated by Google’s upcoming ImageFX.

Coding capabilities

Llama-3 has proven highly proficient in coding. When presented with a unique and poorly explained game idea, the model was able to generate the necessary Python code in two attempts, resulting in a functional game. The first attempt only gave us a rough idea of how to create the game; the model produced working code after we clarified that we needed it in Python.

The game was functional but missed a few minor details, like restarting after a player wins. The same happened with other chatbots, though.

We’ve found Claude 3 Sonnet to be the best tool for this task, followed by Llama 3. GPT-4 falls to third place. However, different users may get different results.

Here is a pastebin with the source code generated by Llama 3, Claude, and ChatGPT for those interested in testing them out.

Political neutrality

The model aims for political neutrality, as evidenced by its responses to questions about capitalism and communism. The responses were structurally similar, providing an introduction, pros, and cons for each system.

This pattern of neutrality was also observed in responses to questions such as "What is a man?" and "What is a woman?"

Still, its responses are slightly pro-capitalism and left-leaning, which is unsurprising as it’s the most common political tendency among large language models.

Logical reasoning

Llama-3 has shown powerful logical reasoning capabilities. When tested with complex LSAT questions that often confuse users, the model not only provided correct answers but also offered clear and reasonable explanations.

Long-prompt limits

Despite its many strengths, Llama-3 struggles with long prompts. When presented with a lengthy prompt of around a page and a half of context (which can be ingested by models like GPT-4, Claude, or Mistral), the model returned an error message.

Language comprehension

The model demonstrates a strong understanding of different languages. When asked to translate a Spanish slogan, it not only provided an accurate translation but also offered context to better understand the slogan.

Conclusion

As a chatbot interface, Meta AI (which is powered by Llama 3) can compete against ChatGPT Plus and is an overall great choice.

On a more technical level, Llama 3 as an LLM is good enough to compete against GPT-4 in different scenarios, only losing in terms of context window size and retrieval-augmented generation (basically pulling information from a specific dataset provided by the user). This may be important for tech-savvy users, but may not be a big deal for the everyday person.
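To make the retrieval idea concrete, here is a toy sketch of what "pulling information from a specific dataset provided by the user" can look like. It is not how Meta AI or ChatGPT implement it: production systems typically rank passages with embedding similarity, while this example substitutes simple keyword overlap, and the function names and sample documents are made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: list[str], top_k: int = 1) -> str:
    """Prepend the retrieved passages to the question before it goes to a model."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical user-provided dataset.
docs = [
    "Llama 3 ships in 8-billion and 70-billion parameter sizes.",
    "Meta AI is available in Instagram, Facebook, and WhatsApp.",
]

print(build_prompt("Which apps include Meta AI?", docs))
```

Only the passages that survive retrieval take up room in the prompt, which is why this approach matters more when the context window is small.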

If you primarily use ChatGPT to generate images with DALL-E, you may want to consider canceling your subscription, as Llama-3's image and animation generation capabilities are comparable. However, if you also require support for long prompts, Llama-3 may not be the best choice for you and you may want to consider sticking with ChatGPT-Plus.

Occasional users may find that Llama-3 meets their needs without requiring a paid membership.

For tasks requiring heavy internet research, ChatGPT Plus or Perplexity may be more suitable.

Finally, if your focus is on coding, Llama-3 could be a good alternative, although there are other specialized tools available. The fact that Llama-3 is free is a significant advantage.

Edited by Ryan Ozawa.
