Meta introduced the first artificial intelligence (AI) models in the Llama 4 family on Saturday. The Menlo Park-based tech giant released two models, Llama 4 Scout and Llama 4 Maverick, with native multimodal capabilities to the open community. The company says these are its first open models built with a mixture-of-experts (MoE) architecture. Compared to their predecessors, they come with larger context windows and better power efficiency. Alongside, Meta also previewed Llama 4 Behemoth, the largest AI model in the family unveiled so far.
In a blog post, the tech giant detailed its new AI models. Just like the previous Llama models, Llama 4 Scout and Llama 4 Maverick are open-source AI models and can be downloaded via its Hugging Face listing or the dedicated Llama website. Starting today, users can also experience the Llama 4 AI models in WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website.
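For developers who want to try the open weights rather than the consumer apps, a minimal sketch of loading a checkpoint with the Hugging Face transformers library could look like the following; the repository ID and text-only usage here are assumptions for illustration, not details confirmed by Meta's listing.

```python
# Minimal sketch, assuming a hypothetical repository ID and text-only use.
# Check Meta's actual Hugging Face listing and accept the Llama 4 license first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout"  # hypothetical ID for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarise what a mixture-of-experts model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```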
Llama 4 Scout is a 17 billion active parameter model with 16 experts, whereas the Maverick model comes with 17 billion active parameters and 128 experts. Scout is said to be able to run on a single Nvidia H100 GPU. Additionally, the company claimed that the previewed Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several benchmarks. Meta said the Behemoth model, with 288 billion active parameters and 16 experts, was not released as it is still being trained.
The MoE architecture in Llama 4 AI models
Photo Credit: Meta
Coming to the architecture, the Llama 4 models are built on an MoE architecture. The MoE architecture activates only a fraction of the total parameters based on the requirement of the prompt, which makes it more compute-efficient for training and inference. In the pre-training phase, Meta also used new techniques such as early fusion to integrate text and vision tokens simultaneously, and MetaP to set critical model hyper-parameters and initialization.
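As a rough illustration of the routing idea (a toy sketch, not Meta's implementation), a top-k gated MoE layer scores all experts for each token but runs only the best-scoring few, so most of the layer's parameters stay idle on any given token:

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: each token activates only k of n experts."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                             # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.k, dim=-1)  # keep only the k best experts
        top_w = top_w.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only k experts run per token, so compute per token stays roughly constant
# even as the total parameter count grows with n_experts.
layer = TopKMoE()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```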
For post-training, Meta chose to start the process with lightweight supervised fine-tuning (SFT), followed by online reinforcement learning (RL) and lightweight direct preference optimization (DPO). The sequence was chosen so as not to over-constrain the model. The researchers also performed SFT on only 50 percent of the “harder” dataset.
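For context, DPO in general tunes a model directly on pairs of preferred and rejected responses instead of training a separate reward model. The following is a generic sketch of that loss, assuming standard DPO rather than Meta's specific recipe:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO objective: push the policy to prefer the chosen response
    over the rejected one, relative to a frozen reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log(pi/pi_ref) for preferred answer
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log(pi/pi_ref) for rejected answer
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Example with per-example sequence log-probabilities (normally obtained by
# summing token log-probs of each response under the policy and reference models).
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)
```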
Based on internal testing, the company claimed that the Maverick model outperforms Gemini 2.0 Flash, DeepSeek v3.1, and GPT-4o on the MMMU (image reasoning), ChartQA (image understanding), GPQA Diamond (reasoning and knowledge), and MTOB (long context) benchmarks.
On the other hand, the Scout model is said to outperform Gemma 3, Mistral 3.1, and Gemini 2.0 on the MMMU, ChartQA, MMLU (reasoning and knowledge), GPQA Diamond, and MTOB benchmarks.
Meta has also taken steps to make the AI models safer in both the pre-training and post-training processes. In pre-training, the researchers used data filtering methods to ensure harmful data was not added to the models' knowledge base. In post-training, the researchers added open-source tools such as Llama Guard and Prompt Guard to protect the models from external attacks. Additionally, the researchers have also stress-tested the models internally and have allowed red-teaming of the Llama 4 Scout and Maverick models.
Notably, the models are available to the open community under a permissive Llama 4 license. It allows both academic and commercial usage of the models; however, Meta does not allow companies with more than 700 million monthly active users to access its models.