Every modality, every size class, every license — mapped to the open-source loader that actually runs it. Bookmark this page: it's the only one you'll need when deciding what to download next.
Speech / Audio
Music
Video
The bread-and-butter category. These are instruction-tuned language models you'd use for writing, brainstorming, Q&A, summarisation and as the reasoning core of most agents.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| DeepSeek V4 Pro / V4 FlashDeepSeek · MoE · 49B / 13B active · 1M ctx | 284B–1.6T | MIT | Apr 2026 — open frontier-class quality, 1M context, ~10% of V3's KV cache. |
Ollama
llama.cpp
vLLM
Workstation
|
| Llama 4 Scout & MaverickMeta · MoE · natively multimodal | 17B–400B+ | Llama 4 Community License | Maverick still tops MMLU among open models (85.5%). Scout fits one H100. |
Ollama
Open WebUI
llama.cpp
vLLM
MLX
|
| Qwen 3.5 / Qwen 3.6Alibaba · 27B · 35B-A3B · 122B-A10B · 397B-A17B | 3B–397B | Apache 2.0 (most sizes) | SOTA tool-use & JSON; A-series MoEs activate tiny slices for speed. |
Ollama
Open WebUI
llama.cpp
vLLM
MLX
|
| Gemma 4Google · 2B · 9B · 27B · multimodal | 2B–27B | Gemma license (permissive) | Frontier-level quality at every size, native vision, friendly on consumer GPUs. |
Ollama
Open WebUI
llama.cpp
MLX
|
| Mistral Large 3 / Small 4Mistral AI · 24B · 675B / 41B active | 24B–675B | Apache 2.0 | Multimodal, 80+ languages, very strong function-calling. |
Ollama
Open WebUI
llama.cpp
vLLM
|
| GLM-5.1Zhipu AI · open · agentic-tuned | 9B–355B | Open MIT-style | Cleanest open tool-use schema; great for long agent loops. |
Ollama
vLLM
SGLang
|
| gpt-oss-120b / 20bOpenAI · MoE · 5.1B active | 20B–120B | Apache 2.0 | OpenAI's first open release. 120B fits a single 80 GB H100; 20B runs on an M-series Mac. |
Ollama
vLLM
llama.cpp
MLX
|
| Hermes 4 70B / 405BNous Research · agent-trace tuned | 70B–405B | Llama license | Trained mostly on real agent traces — best-in-class tool-calling & long-horizon work. |
Ollama
Open WebUI
llama.cpp
vLLM
|
| Llama 3.x (3.1 / 3.2 / 3.3)Meta · 1B · 3B · 8B · 70B | 1B–70B | Llama license | Still the most-used local default in 2026 — proven, well-supported. |
Ollama
Open WebUI
llama.cpp
vLLM
MLX
|
| Phi-4 / Phi-4-miniMicrosoft · 3.8B · 14B | 3.8B–14B | MIT | Tiny models that behave like big ones. Great on CPU and edge devices. |
Ollama
Open WebUI
llama.cpp
|
Models fine-tuned on source code. Use them as local Copilot autocompletes, repo-aware refactorers, or the backend of tools like Continue, Aider and Open Interpreter.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Qwen3 CoderAlibaba · 1.5B → 480B-A35B MoE | 1.5B–480B | Apache 2.0 | 2026 SOTA open code model. 480B-A35B MoE matches closed Copilot quality on real PRs. | OllamaOpen WebUIllama.cppvLLM |
| DeepSeek-Coder V3 / V4-CodeDeepSeek · MoE · open weights | 16B–671B | MIT | Massive context, best-in-class at large-repo reasoning & refactors. | Ollamallama.cppvLLM |
| CodestralMistral AI · 22B | 22B | MNPL (non-prod free) | Multi-language code completion & generation. | OllamaOpen WebUIllama.cpp |
| StarCoder 2BigCode · 3B · 7B · 15B | 3B–15B | BigCode OpenRAIL-M | FIM autocomplete, 600+ languages. | Ollamallama.cppvLLM |
| CodeLlamaMeta · 7B · 13B · 34B · 70B | 7B–70B | Llama license | Mature, well-supported, lots of specialist variants. | OllamaOpen WebUIllama.cpp |
| Granite CodeIBM · 3B · 8B · 20B · 34B | 3B–34B | Apache 2.0 | Enterprise-friendly license, solid quality. | Ollamallama.cppvLLM |
| StableCode / Replit CodeStability · Replit | 1.3B–3B | Permissive | Tiny autocomplete models — perfect for laptops. | Ollamallama.cpp |
A newer class of "think-before-you-speak" models that produce an internal chain-of-thought before an answer. They're slower but crush math, logic, planning and agent decisions.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| DeepSeek V4 (reasoning)DeepSeek · 1.6T MoE / 49B active | 284B–1.6T | MIT | 2026 SOTA — V4 carries the R1 reasoning lineage forward at frontier scale, with a 1M-token "think" budget. | Ollamallama.cppvLLM |
| DeepSeek-R1 / R1-DistillDeepSeek · 1.5B → 671B | 1.5B–671B | MIT | The breakthrough that started the open reasoning wave — distills down to 7B that still beats GPT-4o on math. | OllamaOpen WebUIllama.cppvLLM |
| QwQ-Max / Qwen3-ReasoningAlibaba · 32B · 72B | 32B–72B | Apache 2.0 | Deep step-by-step analysis, excellent backbone for agent loops. | Ollamallama.cppvLLM |
| gpt-oss-120b (reasoning mode)OpenAI · 120B MoE / 5.1B active | 120B | Apache 2.0 | OpenAI's open release ships with a built-in reasoning mode — runs on a single 80 GB H100. | OllamavLLMllama.cpp |
| OpenThinker · DeepThinker · Marco-o1Community reasoning tunes | 7B–32B | Apache 2.0 | Research-grade thinkers you can actually run on a consumer GPU. | Ollamallama.cpp |
| Mathstral / DeepSeek-MathSpecialist math models | 7B | Apache / Custom | Symbolic maths, proofs, formal reasoning. | Ollamallama.cpp |
Models that accept images (and sometimes video) alongside text. Essential for computer-use agents like OpenClaw, document understanding, OCR and visual Q&A.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Llama 4 (native VL) · Llama 3.2 VisionMeta · MoE · 11B · 90B · 400B+ | 11B–400B+ | Llama 4 / Llama license | Llama 4 is natively multimodal end-to-end; 3.2 Vision is the proven workhorse. | OllamaOpen WebUIllama.cppvLLM |
| Qwen3-VL / Qwen2.5-VLAlibaba · 2B · 7B · 72B · 235B | 2B–235B | Apache 2.0 / Qwen license | UI screenshots, OCR, charts, video frames. Top pick for computer-use agents. | OllamaOpen WebUIvLLM |
| Gemma 4 (vision)Google · 9B · 27B | 9B–27B | Gemma license | Compact, native vision, very fast on a single 24 GB GPU. | Ollamallama.cppMLX |
| LLaVA / LLaVA-NeXTCommunity · 7B · 13B · 34B | 7B–34B | Apache 2.0 (weights vary) | The classic open vision-language model family. | OllamaOpen WebUIllama.cpp |
| PixtralMistral AI · 12B | 12B | Apache 2.0 | High-quality image reasoning, strong at documents. | OllamavLLM |
| InternVL 3Shanghai AI Lab · 1B → 108B | 1B–108B | MIT (weights) | 2026 update — leads most open vision benchmarks & long-video understanding. | vLLMllama.cpp |
| MiniCPM-V / Florence-2Edge vision models | 0.2B–8B | Apache 2.0 | Tiny vision models for phones and IoT. | Ollamallama.cpp |
Diffusion and flow-matching models that create images from text prompts. Completely different plumbing from LLMs — different loaders, different file formats.
.safetensors checkpoints plus optional LoRAs, VAEs and ControlNets.
| Model | VRAM needed | License | Best at | Runs on |
|---|---|---|---|---|
| FLUX.2 [dev] / [pro]Black Forest Labs · production-grade | 14–28 GB | FLUX.2 community / commercial license | 2026 SOTA — sharper prompt adherence, real text rendering, near-photoreal quality. | ComfyUIForgeSwarmUIDraw Things |
| FLUX.1 [dev] / [schnell]Black Forest Labs · 12B | 12–24 GB | FLUX.1 non-commercial / Apache | Still the most-downloaded open checkpoint of 2025 — huge LoRA library. | ComfyUIForgeSwarmUIDraw Things |
| Stable Diffusion 3.5Stability AI · Medium · Large | 8–16 GB | Stability Community License | Best supported ecosystem, huge LoRA catalog. | ComfyUIA1111InvokeAIForge |
| SDXL / SDXL TurboStability AI · 3.5B | 6–12 GB | OpenRAIL++ | The workhorse. Huge community, countless fine-tunes. | ComfyUIA1111InvokeAIDraw Things |
| SD 1.5 & fine-tunesRealistic Vision, DreamShaper… | 4–6 GB | OpenRAIL / CreativeML | Still unbeaten for specific artistic styles via fine-tunes. | ComfyUIA1111Forge |
| Playground v3 / KolorsCommunity flagships | 10–16 GB | Custom / Apache | Distinctive aesthetics, great for commercial art. | ComfyUI |
| ControlNet · IP-Adapter · LoRAsNot models — add-ons | ~100 MB each | Mostly permissive | Pose control, style transfer, subject consistency, identity. | ComfyUIA1111InvokeAI |

Models that turn audio into text (STT / ASR) and text back into natural-sounding voices (TTS). Local speech is now at or above cloud quality.
| Model | Type | License | Best at | Runs on |
|---|---|---|---|---|
| Whisper v3 · distil-whisperOpenAI · tiny → large-v3 | STT | MIT | 99-language transcription, the de-facto standard. | whisper.cppfaster-whisperLocalAI |
| Parakeet / CanaryNVIDIA NeMo | STT | CC-BY-4.0 | Ultra-fast English transcription. | NeMofaster-whisper |
| Piperrhasspy · dozens of voices | TTS | MIT | Lightning-fast, CPU-friendly TTS. Great for assistants. | PiperLocalAIHome Assistant |
| Coqui XTTS v2Coqui · voice cloning | TTS | CPML (non-commercial) | Clones any voice from 6 seconds of audio. | Coqui TTSLocalAI |
| F5-TTS / StyleTTS 2Natural prosody | TTS | MIT / CC-BY-NC | Extremely natural, expressive synthesis. | Native PythonComfyUI nodes |
| KokoroCompact all-in-one TTS | TTS | Apache 2.0 | Tiny, fast, surprisingly good quality. | Native PythonLocalAI |

From background loops to full songs with vocals — local music models are maturing fast.
| Model | Type | License | Best at | Runs on |
|---|---|---|---|---|
| Stable Audio OpenStability AI | Music / SFX | Stability Community | 47-second clips, sound effects, loops. | ComfyUINative Python |
| MusicGen / AudioGenMeta | Music / SFX | CC-BY-NC | Text-to-music, melody-conditioned generation. | AudioCraftComfyUI |
| YuE / OpenMusicOpen song generators | Full songs | Apache 2.0 | Multi-minute songs with vocals and structure. | Native Python |
| Barksuno-ai | Voice / SFX | MIT | Expressive voice acting, laughs, music snippets. | Native PythonComfyUI |

The newest — and most hardware-hungry — local modality. Expect seconds of video in exchange for minutes of GPU time.
| Model | VRAM | License | Best at | Runs on |
|---|---|---|---|---|
| HunyuanVideo 1.5Tencent · 13B+ | 12–80 GB | Custom (open) | 2026 update — quantized GGUFs now run on consumer 24 GB GPUs with cinematic quality. | ComfyUI |
| Wan 2.5 / CogVideoX-2Community video models | 12–24 GB | Apache 2.0 | Runs on a single 4090, great prompt adherence and motion consistency. | ComfyUI |
| LTX-VideoLightricks | 8–16 GB | OpenRAIL | Fast, near-real-time short video generation. | ComfyUI |
| Mochi 1 · AnimateDiffImage-to-video, animation | 8–24 GB | Apache 2.0 | Animating stills, consistent motion, loops. | ComfyUIA1111 ext. |
Not chat models — they turn text into vectors for semantic search, RAG and memory. Essential for any agent that needs to "remember".
/v1/embeddings endpoint. LocalAI and AnythingLLM wire them in for you.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| nomic-embed-textNomic · 137M | ~275 MB | Apache 2.0 | Great default, 8k context, multilingual variant. | OllamaOpen WebUILocalAI |
| BGE-M3 / BGE-largeBAAI · multi-function | ~1.3 GB | MIT | Top of MTEB; dense + sparse + ColBERT in one model. | Ollamallama.cppLocalAI |
| mxbai-embed-largeMixedbread | ~670 MB | Apache 2.0 | High quality for its size, great for English RAG. | Ollamallama.cpp |
| Jina Embeddings v3Jina AI | ~1.1 GB | CC-BY-NC | Long-context, task-LoRA-switchable. | Native PythonOllama (Q) |
| bge-reranker-v2-m3BAAI · cross-encoder | ~570 MB | Apache 2.0 | Reranker — huge quality boost on top of any embedder. | LocalAINative Python |
Sub-4B-parameter models that run well on CPUs, phones, Raspberry Pis, and modest laptops. Surprisingly capable — and perfect for always-on assistants.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Gemma 4 2BGoogle · vision-capable | 2B | Gemma license | 2026 — punches above 9B-class quality on a phone-sized footprint. | Ollamallama.cpp |
| Llama 3.2 1B / 3BMeta | 1B–3B | Llama license | Best all-round tiny chat model in the wild. | OllamaOpen WebUIllama.cppMLC-LLM |
| Qwen 3 0.5B / 1.5B / 3BAlibaba | 0.5B–3B | Apache 2.0 | Astonishing quality per parameter, full tool-use support. | OllamaOpen WebUIllama.cpp |
| Phi-4-miniMicrosoft · 3.8B | 3.8B | MIT | Reasoning in a small package — runs comfortably on CPU. | Ollamallama.cpp |
| SmolLM 2Hugging Face · 135M · 360M · 1.7B | 0.1B–1.7B | Apache 2.0 | Microscopic assistants, runs on anything. | Ollamallama.cppExecuTorch |
| TinyLlama / MobileLLMMobile-first | 125M–1.1B | Apache 2.0 | On-device draft-models & smart-reply. | llama.cppMLC-LLMExecuTorch |
A rough rule of thumb for Q4-quantized GGUF models. Real-world numbers vary by quantization, context length and loader.
Up to ~3B chat (Llama 3.2 3B, Qwen 3 3B, Gemma 4 2B), tiny code models, embeddings, Whisper small, Piper TTS, SD 1.5 on CPU (slow).
9B chat at good speed (Gemma 4 9B, Qwen 3.5 7B), 7B code, 7B vision, FLUX.1 / SDXL image gen, full Whisper large-v3, light ComfyUI video.
27B–70B chat at interactive speeds, FLUX.2, HunyuanVideo 1.5 GGUF, DeepSeek-R1-Distill 32B, gpt-oss-20b, full agent stacks.
DeepSeek V4 Pro 1.6T, Llama 4 Maverick 400B+, gpt-oss-120b, Mistral Large 3 675B, HunyuanVideo full — anything the open ecosystem ships in 2026.
Bookmark this page and check the comparison table for the right loader to pair with each model.