Apr 24, 2026
JUST DROPPED
DeepSeek V4 Pro & V4 Flash
DeepSeek · 1.6T MoE / 49B active · MIT · 1M ctx
The release of the year. V4-Pro (1.6T total / 49B active) and V4-Flash (284B / 13B active) ship under MIT with a 1M-token context, and need just ~10% of V3's KV cache. Open weights are already on Hugging Face.
Ollama
llama.cpp
vLLM
SGLang
Hugging Face →
Apr 16, 2026
NEW
Qwen 3.6 35B-A3B
Alibaba · MoE · 35B total / 3B active · Apache 2.0
The Qwen team's freshest MoE: only 3B of its 35B parameters are active per token, so it runs fast for its size while topping the open leaderboards on tool-use and multilingual tasks. Drops straight into Ollama and vLLM.
Ollama
vLLM
SGLang
MLX
Hugging Face →
Mar 2026
Gemma 4
Google · 2B · 9B · 27B · multimodal · open
The Gemma family's biggest leap yet, with frontier-level reasoning at every size and native vision. The 27B is the new sweet spot for a single 24 GB consumer GPU.
Ollama
llama.cpp
MLX
Hugging Face →
Feb 2026
Llama 4 Scout & Maverick
Meta · MoE · natively multimodal · 17B–400B+
Meta's first natively multimodal MoE. Scout (17B active) fits on a single H100; Maverick is the workstation tier and still leads open-weight MMLU at 85.5%.
Ollama
llama.cpp
vLLM
MLX
Hugging Face →
Feb 2026
GLM-5.1
Zhipu AI · open weights · agentic-tuned
Zhipu's flagship is now an open-weight contender. GLM-5.1 is purpose-tuned for agent loops, with strong function calling and one of the cleanest tool-use schemas in the open ecosystem.
Ollama
vLLM
SGLang
Hugging Face →
Dec 2025
Mistral Large 3
Mistral AI · 675B / 41B active · Apache 2.0 · multimodal
Mistral's flagship returns to fully permissive Apache 2.0 weights. It is multimodal, covers 80+ languages, and is the strongest open model for European languages on the market.
Ollama
llama.cpp
vLLM
Hugging Face →
Nov 2025
FLUX.2 [dev]
Black Forest Labs · diffusion · production-grade
The successor to FLUX.1 takes open image generation from "experimental" to production-grade, with sharper prompt adherence and real text rendering. It is the new go-to checkpoint for ComfyUI, Forge and SwarmUI.
ComfyUI
Forge
SwarmUI
Draw Things
Hugging Face →
Aug 2025
gpt-oss-120b & Hermes 4
OpenAI · Nous Research · Apache 2.0
OpenAI's first open-weight language model since GPT-2 (120B MoE / 5.1B active, runs on a single 80 GB H100) and Nous Research's Hermes 4, the first local model trained mostly on real agent traces, are now baseline picks for any local agent stack.
Ollama
vLLM
llama.cpp
Hugging Face →
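A rough way to sanity-check the single-GPU claims in the cards above (a 27B model on 24 GB, a 120B model on 80 GB) is to estimate weight memory from parameter count and bits per weight. The sketch below is illustrative only: the function names, the 4-bit quantization level, and the 20% headroom for KV cache and activations are assumptions, not figures published by any vendor. For MoE models it uses total parameters, since every expert is normally kept resident in memory even though only a few are active per token.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9,
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


def fits_on_gpu(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float,
    headroom: float = 0.20,  # assumed allowance for KV cache and activations
) -> bool:
    """Return True if weights plus the assumed headroom fit in vram_gb."""
    needed_gb = weight_memory_gb(params_billion, bits_per_weight) * (1 + headroom)
    return needed_gb <= vram_gb


if __name__ == "__main__":
    # (label, total parameters in billions, bits per weight, GPU memory in GB)
    cases = [
        ("27B dense, 4-bit, 24 GB card", 27, 4, 24),
        ("27B dense, 16-bit, 24 GB card", 27, 16, 24),
        ("120B MoE, 4-bit, 80 GB card", 120, 4, 80),
    ]
    for label, params, bits, vram in cases:
        size = weight_memory_gb(params, bits)
        verdict = "fits" if fits_on_gpu(params, bits, vram) else "does not fit"
        print(f"{label}: ~{size:.1f} GB of weights, {verdict}")
```

Under these assumptions the 4-bit cases fit and the 16-bit 27B case does not, which is why quantized builds are the usual route for the consumer-GPU setups described above. Real usage varies with context length, batch size, and runtime overhead, so treat the result as a first-pass estimate.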