100% offline · 100% open source · zero cloud dependencies

Run powerful AI entirely on your own machine.

runoffline.ai is the international hub for everyone exploring local LLMs — 100% open-source runtimes like Ollama, Open WebUI and Jan, and a new wave of personal agent platforms like OpenClaw, Hermes Agent, ZeroClaw and OpenCode. No clouds. No telemetry. No proprietary lock-in.

12+
Runtimes covered
10
Personal agent platforms
0
Bytes of data that leave your device
Models you can run
Just released

Fresh open-weight drops

New models ship faster than ever. Here's what just hit Hugging Face and can run on your own hardware — updated as releases happen.

Apr 24, 2026 JUST DROPPED

DeepSeek V4 Pro & V4 Flash

DeepSeek · 1.6T MoE / 49B active · MIT · 1M ctx

The release of the year. V4-Pro (1.6T total / 49B active) and V4-Flash (284B / 13B active) ship under MIT with a 1M-token context, and need just ~10% of V3's KV cache. Open weights are already on Hugging Face.

Ollama · llama.cpp · vLLM · SGLang
Hugging Face →
Apr 16, 2026 NEW

Qwen 3.6 35B-A3B

Alibaba · MoE · 35B total / 3B active · Apache 2.0

The Qwen team's freshest MoE. Only 3B parameters are active per token, so it generates quickly for its size (all 35B weights still need to fit in memory), while topping the open leaderboards on tool-use and multilingual tasks. Drops straight into Ollama and vLLM.

Ollama · vLLM · SGLang · MLX
Hugging Face →
Mar 2026

Gemma 4

Google · 2B · 9B · 27B · multimodal · open

The Gemma family's biggest leap yet — frontier-level reasoning at every size, native vision, and 27B is the new sweet spot for a single 24 GB consumer GPU.

Ollama · llama.cpp · MLX
Hugging Face →
Feb 2026

Llama 4 Scout & Maverick

Meta · MoE · natively multimodal · 17B–400B+

Meta's first natively multimodal MoE. Scout (17B active, 109B total) fits a single H100 with Int4 quantization; Maverick (17B active, 400B total) needs a multi-GPU host and still leads open-weight MMLU at 85.5%.

Ollama · llama.cpp · vLLM · MLX
Hugging Face →
Feb 2026

GLM-5.1

Zhipu AI · open weights · agentic-tuned

Zhipu's flagship is now an open-weight contender. GLM-5.1 is purpose-tuned for agent loops, with strong function-calling and one of the cleanest tool-use schemas in the open ecosystem.

Ollama · vLLM · SGLang
Hugging Face →
Dec 2025

Mistral Large 3

Mistral AI · 675B / 41B active · Apache 2.0 · multimodal

Mistral's flagship returns to fully permissive Apache 2.0 weights — multimodal, 80+ languages, and the strongest open European-language model on the market.

Ollama · llama.cpp · vLLM
Hugging Face →
Nov 2025

FLUX.2 [dev]

Black Forest Labs · diffusion · production-grade

The successor to FLUX.1 takes open image gen from "experimental" to true production-grade — sharper prompt adherence, real text rendering, and the new go-to checkpoint for ComfyUI, Forge and SwarmUI.

ComfyUI · Forge · SwarmUI · Draw Things
Hugging Face →
Aug 2025

gpt-oss-120b & Hermes 4

OpenAI · Nous Research · Apache 2.0

OpenAI's first open-weight language model since GPT-2 (120B MoE / 5.1B active, runs on a single 80 GB H100) and Nous Research's Hermes 4, the first local model trained mostly on real agent traces, are now baseline picks for any local agent stack.

Ollama · vLLM · llama.cpp
Hugging Face →
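
Sizing claims like "27B on a single 24 GB GPU" come down to simple arithmetic: weight memory is roughly parameters times bits per weight, divided by 8. The sketch below is a rough estimate under stated assumptions, not a benchmark. `weights_mb` and `fits_in_vram` are hypothetical helper names, sizes are decimal MB and GB, and KV cache and runtime overhead are ignored. For MoE models, pass the total parameter count, since all experts generally have to be loaded even though only a few are active per token.

```shell
#!/bin/sh
# Rough size of the weights alone, in MB (decimal):
#   parameters (billions) x 1000 x bits per weight / 8
# Assumption: KV cache, activations and runtime overhead are NOT included,
# so leave a few GB of headroom on top of this number.
weights_mb() {
  params_b=$1   # total parameters, in billions (whole number)
  bits=$2       # bits per weight after quantization (e.g. 4, 8, 16)
  echo $(( params_b * 1000 * bits / 8 ))
}

# Does a model's weight footprint fit in a given amount of VRAM (in GB)?
fits_in_vram() {
  need=$(weights_mb "$1" "$2")
  have=$(( $3 * 1000 ))
  [ "$need" -le "$have" ]
}

weights_mb 27 4     # 27B at 4-bit
weights_mb 27 16    # 27B at 16-bit
fits_in_vram 27 4 24 && echo "27B @ 4-bit fits in 24 GB (weights only)"
fits_in_vram 27 16 24 || echo "27B @ 16-bit does not fit in 24 GB"
```

At 4 bits per weight the 27B example comes to about 13.5 GB of weights, which leaves headroom on a 24 GB card; at 16 bits it would not fit.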
Why run AI locally

Own the model. Own the data. Own the outcome.

Local AI flips the economics and ethics of intelligence. Your prompts, your files, your agents, and your reasoning — all stay on your machine, under your control.

🔒

Private by default

Every token is computed on your CPU/GPU. No prompts uploaded, no logs scraped, no vendor training on your workflow.

✈️

Works offline

A plane seat, a remote cabin, a classified network: your models keep working exactly the same, with no round trip to a data center.

💸

Zero per-token cost

Run millions of tokens a day without a per-token bill. The only ongoing cost is the hardware you already own and the electricity it draws.

🧠

Full customization

Swap models, quantizations, system prompts, tools, and memory. You aren't stuck in someone else's sandbox.

⚡️

Instant responses

Apple Silicon, NVIDIA, AMD and even modest CPUs can now stream tokens fast enough for real agentic work.

🛠️

Hackable ecosystem

From GGUF models to MCP tools, the local stack is open, inspectable, and composable — the way software should be.
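
As one illustration of the "Full customization" point, Ollama lets you layer a system prompt and sampling settings on top of a base model with a Modelfile. This is a minimal sketch, assuming the `llama3.1:8b` model from the quickstart is already pulled; `launch-writer` is a hypothetical model name, and the parameter values are arbitrary examples rather than recommendations.

```shell
#!/bin/sh
# Write a Modelfile that layers a system prompt and sampling settings
# on top of a base model. "launch-writer" below is a hypothetical name.
cat > Modelfile <<'EOF'
FROM llama3.1:8b
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
SYSTEM """You are a concise copywriter. Answer in plain English."""
EOF

# Build and run the customized model only if Ollama is installed.
if command -v ollama >/dev/null 2>&1; then
  ollama create launch-writer -f Modelfile &&
    ollama run launch-writer "Draft a launch email for runoffline.ai"
else
  echo "ollama not found; Modelfile written but not built" >&2
fi
```

Swapping the `FROM` line is enough to try the same prompt and settings against a different base model or quantization.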

Local LLM runtimes

The engines that bring models to your laptop

Whether you want a one-line install or a full developer-grade toolkit, there's a runtime made for your workflow.

Ollama

CLI + API · macOS · Linux · Windows

The easiest way to pull, run and serve open models. One command, hundreds of models, instant OpenAI-compatible API.

Beginner friendly · Open source
Details on runoffline →

Open WebUI

Self-hosted web UI · BSD-3-based license with a branding clause

A ChatGPT-class web interface for your local models — multi-user, RAG, plugins, all open source.

Open source · Self-hosted
Details on runoffline →

Jan

Open-source ChatGPT alternative

A fully open, privacy-first desktop chat app that runs local and remote models and can be extended through an extension system.

Open source · Extensions
Details on runoffline →

llama.cpp

The engine under the hood

The legendary C/C++ inference engine that made local LLMs practical. Powers most tools on this page.

Advanced · Max performance
Details on runoffline →
Personal agent platforms

Your private workforce — meet the agents

A new generation of desktop-native agents that can see your screen, click your apps, read your files and get real work done — all without sending a single byte to the cloud.

OpenClaw

Open desktop agent · MIT licensed

The community answer to Claude Computer Use. Gives any local model hands — mouse, keyboard, browser and shell — with a safe action layer.

Details on runoffline →

Hermes Agent

Messenger for your workflows

A multi-agent orchestrator built around the Hermes instruction-tuned models, designed for chained reasoning, tool use and long-horizon tasks.

Details on runoffline →

ZeroClaw

Zero-config autonomous agent

Spin up an autonomous agent with a single binary. ZeroClaw plans, executes and self-reviews goals using any Ollama or llama.cpp model.

Details on runoffline →

OpenCode

Open coding agent · MIT licensed · 140K+ stars

The most adopted open-source terminal coding agent. LSP-powered edits, multi-session support, and 75+ LLM providers, including any local model.

Details on runoffline →
Free open-weight models

What to run, for whatever you're building

A complete catalog of free models — chat, coding, reasoning, vision, image generation, speech, music, video and embeddings — each mapped to the loaders that can actually run it.

30-second quickstart

Your first local model in under a minute

Terminal
# 1. Install Ollama (this script targets Linux; on macOS or Windows, use the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a strong open model
ollama pull llama3.1:8b

# 3. Chat with it — fully offline
ollama run llama3.1:8b "Draft a launch email for runoffline.ai"

# 4. Or use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"hi"}]}'
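
If you want to script step 4 rather than type it, the sketch below wraps the same request in small shell functions and checks that the server is reachable first. It assumes Ollama is listening on its default address from the quickstart; `build_chat_payload`, `server_is_up` and `ask_local` are hypothetical helper names, and the `OLLAMA_URL` variable is introduced here only for illustration.

```shell
#!/bin/sh
# Assumption: Ollama is serving on its default address from the quickstart.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for /v1/chat/completions.
# Note: the prompt is inserted as-is, so it must not contain double quotes
# or backslashes; use a JSON tool such as jq for arbitrary text.
build_chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# Return success if the local server answers its model-list endpoint.
server_is_up() {
  curl -fsS --max-time 2 "$OLLAMA_URL/api/tags" >/dev/null 2>&1
}

# Send one prompt and print the raw JSON response.
ask_local() {
  curl -fsS "$OLLAMA_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_chat_payload "$1" "$2")"
}

if server_is_up; then
  ask_local "llama3.1:8b" "hi"
else
  echo "Ollama is not reachable at $OLLAMA_URL; start it with: ollama serve" >&2
fi
```

Because the request shape matches the OpenAI chat API, existing OpenAI client code can usually be pointed at the same base URL with only the model name changed.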
A global movement · thousands of builders · one open ecosystem

The global community behind offline AI

runoffline.ai is more than a directory — it's a rallying point. See what builders are shipping on X right now, jump into the biggest local-AI subreddits, trending GitHub repos and Hacker News threads, or hop straight into a Discord full of people who run models on their own machines.

140+
countries with active local-AI builders
470K+
members in r/LocalLLaMA
3.2M+
combined GitHub stars across listed projects
24 / 7
live chat across Discord, Matrix, IRC
Live from the community
★ 98k
ollama / ollama
★ 72k
ggml-org / llama.cpp
★ 55k
open-webui / open-webui
★ 41k
comfyanonymous / ComfyUI
★ 28k
janhq / jan
▲ 612
Show HN: I run a 70B model on my laptop and it actually works
▲ 489
Why I stopped paying for OpenAI and moved my whole team to Ollama
▲ 404
Hermes Agent: self-hosted personal assistant that respects your data
▲ 287
The case for running AI on-device in 2026

Voices from the movement

A glimpse of what builders around the world are saying.

"Moved our whole R&D team to Ollama + Open WebUI last month. Zero API bill, zero data leaving our network, and latency is better than the cloud."
Kenji M.
Staff engineer · Tokyo
𝕏
"Running Qwen2.5-Coder 32B locally with Aider changed how I write software. It's genuinely my pair programmer now — and it never phones home."
Anna S.
Indie dev · Berlin
Reddit
"In places with spotty internet, offline models aren't a preference — they're the only option. runoffline.ai is the clearest map I've found."
Oluwaseun A.
ML researcher · Lagos
HN
"I teach high-schoolers Python with a local Llama 3.1 running on a €300 mini-PC. No accounts, no billing, no surveillance. Just learning."
Marta R.
Teacher · Barcelona
Mastodon
"OpenClaw + a small vision model on my MacBook replaces half my SaaS stack. This is the future I was promised."
Devin P.
Solo founder · Austin
𝕏
"Our hospital's compliance team approved local inference in one week. Cloud would have taken 18 months. Open source won."
Dr. Rohan V.
Clinical informatics · Bangalore
LinkedIn

Stop renting intelligence. Start running it.

Dive into the full catalog of runtimes and agents — carefully curated, neutrally compared, and updated as the local-AI world moves fast.

Compare everything →
Read the guides →