Every modality, every size class, every license — mapped to the open-source loader that actually runs it. Bookmark this page: it's the only one you'll need when deciding what to download next.
The bread-and-butter category. These are instruction-tuned language models you'd use for writing, brainstorming, Q&A, summarisation and as the reasoning core of most agents.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Llama 3.1 / 3.2 / 3.3 (Meta) | 1B–70B | Llama license (free for most use) | The default local assistant. Great balance of quality and speed. | Ollama · Open WebUI · llama.cpp · vLLM · MLX |
| Qwen 2.5 / Qwen 3 (Alibaba) | 0.5B–72B | Apache 2.0 (most sizes) | Strong multilingual, very strong at tool-use & JSON output. | Ollama · Open WebUI · llama.cpp · vLLM · MLX |
| Mistral / Mixtral (Mistral AI · 7B, 8×7B, 8×22B) | 7B–141B | Apache 2.0 | MoE efficiency — fast responses with large-model quality. | Ollama · Open WebUI · llama.cpp · vLLM |
| Gemma 2 / Gemma 3 (Google) | 2B–27B | Gemma license (permissive) | Punches above its weight, especially the 9B for consumer GPUs. | Ollama · Open WebUI · llama.cpp · MLX |
| Phi-3.5 / Phi-4 (Microsoft) | 3.8B–14B | MIT | Tiny models that behave like big ones. Great on CPU. | Ollama · Open WebUI · llama.cpp |
| Hermes 3 / Nous Hermes (Nous Research) | 8B–405B | Llama license | Best-in-class tool-calling, structured output & agent work. | Ollama · Open WebUI · llama.cpp · vLLM |
| Command R / R+ (Cohere) | 32B–104B | CC BY-NC (non-commercial) | Long-context RAG & citations. | Ollama · llama.cpp · vLLM |
| Yi 1.5 · GLM-4 · DeepSeek V3 (frontier open-weight releases) | 6B–671B | Mixed (mostly permissive) | For users with serious hardware — approaching GPT-4-class quality. | llama.cpp · vLLM (workstation class) |
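All of the loaders in the table speak (or proxy) the same OpenAI-style chat API, so switching models rarely means changing code. A minimal sketch of the request shape, assuming an Ollama instance on its default port (11434) with the `llama3.1:8b` tag pulled:

```python
import json

# Ollama's OpenAI-compatible chat endpoint (default port assumed).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str,
                       system: str = "You are a helpful assistant.") -> dict:
    """Build the JSON body that OpenAI-compatible loaders accept."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
    }

body = build_chat_request("llama3.1:8b", "Summarise RAG in one sentence.")

# To actually send it (requires a running Ollama instance):
#   import urllib.request
#   req = urllib.request.Request(OLLAMA_URL, json.dumps(body).encode(),
#                                {"Content-Type": "application/json"})
#   reply = json.loads(urllib.request.urlopen(req).read())
#   print(reply["choices"][0]["message"]["content"])
print(json.dumps(body, indent=2))
```

The same body works against vLLM or LocalAI by pointing the URL at their servers instead.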
Models fine-tuned on source code. Use them as local Copilot autocompletes, repo-aware refactorers, or the backend of tools like Continue, Aider and Open Interpreter.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Qwen 2.5 Coder (Alibaba) | 0.5B–32B | Apache 2.0 | Current SOTA open code model. 32B rivals closed Copilot quality. | Ollama · Open WebUI · llama.cpp · vLLM |
| DeepSeek Coder V2 (DeepSeek) | 16B–236B MoE | Custom (permissive) | Huge context, excellent at large-repo reasoning. | Ollama · llama.cpp · vLLM |
| Codestral (Mistral AI) | 22B | MNPL (non-prod free) | Multi-language code completion & generation. | Ollama · Open WebUI · llama.cpp |
| StarCoder 2 (BigCode) | 3B–15B | BigCode OpenRAIL-M | FIM autocomplete, 600+ languages. | Ollama · llama.cpp · vLLM |
| CodeLlama (Meta) | 7B–70B | Llama license | Mature, well-supported, lots of specialist variants. | Ollama · Open WebUI · llama.cpp |
| Granite Code (IBM) | 3B–34B | Apache 2.0 | Enterprise-friendly license, solid quality. | Ollama · llama.cpp · vLLM |
| StableCode / Replit Code (Stability · Replit) | 1.3B–3B | Permissive | Tiny autocomplete models — perfect for laptops. | Ollama · llama.cpp |
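Autocomplete frontends like Continue drive these models with fill-in-the-middle (FIM) prompts rather than chat: the model completes code at the cursor given both the text before it and after it. A sketch of the sentinel-token layout published for StarCoder-family models; other code models use different sentinels, so check the model card before reusing this:

```python
# FIM sentinel tokens as documented for StarCoder-family models.
# Other families (CodeLlama, Codestral, ...) use different tokens.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Lay out prefix and suffix so the model fills in the middle."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt(
    prefix="def area(radius):\n    return ",
    suffix="\n\nprint(area(2.0))\n",
)
print(prompt)
```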
A newer class of "think-before-you-speak" models that produce an internal chain-of-thought before an answer. They're slower but crush math, logic, planning and agent decisions.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| DeepSeek-R1 / R1-Distill (DeepSeek) | 1.5B–671B | MIT | Open reasoning breakthrough — distills down to a 7B that beats GPT-4o on math. | Ollama · Open WebUI · llama.cpp · vLLM |
| QwQ / Qwen-Reasoning (Alibaba) | 32B | Apache 2.0 | Deep step-by-step analysis, excellent for agents. | Ollama · llama.cpp · vLLM |
| Marco-o1 / OpenThinker (community reasoning tunes) | 7B–32B | Apache 2.0 | Research-grade thinkers you can actually run on a consumer GPU. | Ollama · llama.cpp |
| Mathstral / DeepSeek-Math (specialist math models) | 7B | Apache / Custom | Symbolic maths, proofs, formal reasoning. | Ollama · llama.cpp |
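DeepSeek-R1-style models emit their chain-of-thought inside `<think>...</think>` tags before the final answer. An agent usually wants to log the reasoning but show only the answer; a minimal parser (the sample response text is invented for illustration):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from an R1-style reply."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not match:
        # Not every reply contains a think block; pass it through.
        return "", response.strip()
    thought = match.group(1).strip()
    answer = response[match.end():].strip()
    return thought, answer

sample = "<think>17 is only divisible by 1 and 17.</think>Yes, 17 is prime."
thought, answer = split_reasoning(sample)
print(answer)  # → Yes, 17 is prime.
```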
Models that accept images (and sometimes video) alongside text. Essential for computer-use agents like OpenClaw, document understanding, OCR and visual Q&A.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Llama 3.2 Vision (Meta) | 11B–90B | Llama license | General-purpose image understanding & VQA. | Ollama · Open WebUI · llama.cpp · vLLM |
| Qwen2-VL / Qwen2.5-VL (Alibaba) | 2B–72B | Apache 2.0 / Qwen license | UI screenshots, OCR, charts, video frames. Top pick for agents. | Ollama · Open WebUI · vLLM |
| LLaVA / LLaVA-NeXT (community) | 7B–34B | Apache 2.0 (weights vary) | The classic open vision-language model family. | Ollama · Open WebUI · llama.cpp |
| Pixtral (Mistral AI) | 12B | Apache 2.0 | High-quality image reasoning, strong at documents. | Ollama · vLLM |
| InternVL 2.5 (Shanghai AI Lab) | 1B–78B | MIT (weights) | Current open SOTA on most vision benchmarks. | vLLM · llama.cpp |
| MiniCPM-V / Florence-2 (edge vision models) | 0.2B–8B | Apache 2.0 | Tiny vision models for phones and IoT. | Ollama · llama.cpp |
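Vision-capable loaders accept images through the same OpenAI-style chat format: the user message carries a list of content parts, with images passed as base64 data URIs. A sketch using a 1×1 PNG as a stand-in; a computer-use agent would substitute real screenshot bytes:

```python
import base64

# A valid 1x1 transparent PNG, used here only as placeholder bytes.
png_bytes = base64.b64decode(
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJ"
    "AAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg=="
)

def build_vision_message(question: str, image_bytes: bytes) -> dict:
    """Build a multimodal user message in OpenAI-compatible format."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

msg = build_vision_message("What button should I click next?", png_bytes)
print(msg["content"][0]["text"])
```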
Diffusion and flow-matching models that create images from text prompts. Completely different plumbing from LLMs — different loaders, different file formats.
Image models ship as .safetensors checkpoints plus optional LoRAs, VAEs and ControlNets.
| Model | VRAM needed | License | Best at | Runs on |
|---|---|---|---|---|
| FLUX.1 [dev] / [schnell] (Black Forest Labs · 12B) | 12–24 GB | FLUX.1 non-commercial / Apache | Current open SOTA. Stunning prompt adherence & text rendering. | ComfyUI · Forge · SwarmUI · Draw Things |
| Stable Diffusion 3.5 (Stability AI · Medium / Large) | 8–16 GB | Stability Community License | Best supported ecosystem, huge LoRA catalog. | ComfyUI · A1111 · InvokeAI · Forge |
| SDXL / SDXL Turbo (Stability AI · 3.5B) | 6–12 GB | OpenRAIL++ | The workhorse. Huge community, countless fine-tunes. | ComfyUI · A1111 · InvokeAI · Draw Things |
| SD 1.5 & fine-tunes (Realistic Vision, DreamShaper…) | 4–6 GB | OpenRAIL / CreativeML | Still unbeaten for specific artistic styles via fine-tunes. | ComfyUI · A1111 · Forge |
| Playground v3 / Kolors (community flagships) | 10–16 GB | Custom / Apache | Distinctive aesthetics, great for commercial art. | ComfyUI |
| ControlNet · IP-Adapter · LoRAs (not models — add-ons) | ~100 MB each | Mostly permissive | Pose control, style transfer, subject consistency, identity. | ComfyUI · A1111 · InvokeAI |

Models that turn audio into text (STT / ASR) and text back into natural-sounding voices (TTS). Local speech is now at or above cloud quality.
| Model | Type | License | Best at | Runs on |
|---|---|---|---|---|
| Whisper v3 · distil-whisper (OpenAI · tiny–large-v3) | STT | MIT | 99-language transcription, the de facto standard. | whisper.cpp · faster-whisper · LocalAI |
| Parakeet / Canary (NVIDIA NeMo) | STT | CC-BY-4.0 | Ultra-fast English transcription. | NeMo · faster-whisper |
| Piper (rhasspy · dozens of voices) | TTS | MIT | Lightning-fast, CPU-friendly TTS. Great for assistants. | Piper · LocalAI · Home Assistant |
| Coqui XTTS v2 (Coqui · voice cloning) | TTS | CPML (non-commercial) | Clones any voice from 6 seconds of audio. | Coqui TTS · LocalAI |
| F5-TTS / StyleTTS 2 (natural prosody) | TTS | MIT / CC-BY-NC | Extremely natural, expressive synthesis. | Native Python · ComfyUI nodes |
| Kokoro (compact all-in-one TTS) | TTS | Apache 2.0 | Tiny, fast, surprisingly good quality. | Native Python · LocalAI |

From background loops to full songs with vocals — local music models are maturing fast.
| Model | Type | License | Best at | Runs on |
|---|---|---|---|---|
| Stable Audio Open (Stability AI) | Music / SFX | Stability Community | 47-second clips, sound effects, loops. | ComfyUI · Native Python |
| MusicGen / AudioGen (Meta) | Music / SFX | CC-BY-NC | Text-to-music, melody-conditioned generation. | AudioCraft · ComfyUI |
| YuE / OpenMusic (open song generators) | Full songs | Apache 2.0 | Multi-minute songs with vocals and structure. | Native Python |
| Bark (suno-ai) | Voice / SFX | MIT | Expressive voice acting, laughs, music snippets. | Native Python · ComfyUI |

The newest — and most hardware-hungry — local modality. Expect seconds of video in exchange for minutes of GPU time.
| Model | VRAM | License | Best at | Runs on |
|---|---|---|---|---|
| HunyuanVideo (Tencent · 13B) | 24–80 GB | Custom (open) | Current open SOTA — highly cinematic results. | ComfyUI |
| Wan 2.1 / CogVideoX (community video models) | 12–24 GB | Apache 2.0 | Runs on a single 4090, great prompt adherence. | ComfyUI |
| LTX-Video (Lightricks) | 8–16 GB | OpenRAIL | Fast, near-real-time short video generation. | ComfyUI |
| Mochi 1 · AnimateDiff (image-to-video, animation) | 8–24 GB | Apache 2.0 | Animating stills, consistent motion, loops. | ComfyUI · A1111 ext. |
Not chat models — they turn text into vectors for semantic search, RAG and memory. Essential for any agent that needs to "remember".
Most loaders expose them through an OpenAI-compatible /v1/embeddings endpoint; LocalAI and AnythingLLM wire them in for you.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| nomic-embed-text (Nomic · 137M) | ~275 MB | Apache 2.0 | Great default: 8k context, multilingual variant. | Ollama · Open WebUI · LocalAI |
| BGE-M3 / BGE-large (BAAI · multi-function) | ~1.3 GB | MIT | Top of MTEB; dense + sparse + ColBERT in one model. | Ollama · llama.cpp · LocalAI |
| mxbai-embed-large (Mixedbread) | ~670 MB | Apache 2.0 | High quality for its size, great for English RAG. | Ollama · llama.cpp |
| Jina Embeddings v3 (Jina AI) | ~1.1 GB | CC-BY-NC | Long-context, task-LoRA-switchable. | Native Python · Ollama (quantized) |
| bge-reranker-v2-m3 (BAAI · cross-encoder) | ~570 MB | Apache 2.0 | Reranker — a huge quality boost on top of any embedder. | LocalAI · Native Python |
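Under the hood, an embedding-backed memory does one thing: embed the query and every document, then rank by cosine similarity. The vectors below are toy 3-dimensional stand-ins; a real setup would fetch them from one of the embedding models above:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; a real store would hold model-generated vectors.
docs = {
    "gpu setup":   [0.9, 0.1, 0.0],
    "cake recipe": [0.0, 0.2, 0.9],
    "vram guide":  [0.8, 0.3, 0.1],
}
query = [0.9, 0.2, 0.0]  # pretend embedding of "how much VRAM do I need?"

# Rank documents by similarity to the query, best match first.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked[0])  # → gpu setup
```

A cross-encoder reranker like bge-reranker-v2-m3 would then rescore the top few hits for a further quality boost.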
Sub-4B-parameter models that run well on CPUs, phones, Raspberry Pis, and modest laptops. Surprisingly capable — and perfect for always-on assistants.
| Model | Size | License | Best at | Runs on |
|---|---|---|---|---|
| Llama 3.2 1B / 3B (Meta) | 1B–3B | Llama license | Best all-round tiny chat model. | Ollama · Open WebUI · llama.cpp · MLC-LLM |
| Qwen 2.5 0.5B / 1.5B / 3B (Alibaba) | 0.5B–3B | Apache 2.0 | Astonishing quality per parameter. | Ollama · Open WebUI · llama.cpp |
| Phi-3.5-mini (Microsoft · 3.8B) | 3.8B | MIT | Reasoning in a small package. | Ollama · llama.cpp |
| Gemma 2 2B (Google) | 2B | Gemma license | Punchy on CPU, strong safety tuning. | Ollama · llama.cpp |
| SmolLM 2 (Hugging Face · 135M–1.7B) | 0.1B–1.7B | Apache 2.0 | Microscopic assistants, runs on anything. | Ollama · llama.cpp · ExecuTorch |
| TinyLlama / MobileLLM (mobile-first) | 125M–1.1B | Apache 2.0 | On-device draft models & smart reply. | llama.cpp · MLC-LLM · ExecuTorch |
A rough rule of thumb for Q4-quantized GGUF models, from least to most capable hardware. Real-world numbers vary by quantization, context length and loader.

- Up to ~3B chat (Llama 3.2 3B, Qwen 2.5 3B), tiny code models, embeddings, Whisper small, Piper TTS, SD 1.5 on CPU (slow).
- 8B chat at good speed, 7B code, 7B vision, SDXL image gen, full Whisper large-v3, light ComfyUI workflows.
- 30B–70B chat at interactive speeds, FLUX.1, LTX-Video, DeepSeek-R1 Distill 32B, full agent stacks.
- Llama 3.1 405B, DeepSeek V3/R1 full, HunyuanVideo, vLLM serving a whole team — anything the open ecosystem ships.
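The tiers can be sanity-checked with a back-of-envelope formula: Q4 quantization stores roughly 4.5 bits per weight, and you need headroom for the KV cache and loader overhead. The constants below are rough assumptions, not measurements, so treat the output as a starting point rather than a guarantee:

```python
def q4_gguf_gb(params_billions: float, overhead_gb: float = 1.5) -> float:
    """Approximate RAM/VRAM in GB to load a Q4-quantized model.

    Assumes ~4.5 bits per parameter (typical of Q4 GGUF variants)
    plus a flat allowance for KV cache and loader overhead.
    """
    weights_gb = params_billions * 4.5 / 8  # bits -> bytes -> GB per B params
    return round(weights_gb + overhead_gb, 1)

for size in (3, 8, 32, 70):
    print(f"{size:>3}B → ~{q4_gguf_gb(size)} GB")
```

Longer contexts grow the KV cache well past the flat allowance used here, which is one reason long-context RAG wants more memory than the parameter count alone suggests.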
Bookmark this page and check the comparison table for the right loader to pair with each model.