100% offline · 100% open source · zero cloud dependencies

Run powerful AI entirely on your own machine.

runoffline.ai is the international hub for everyone exploring local AI: 100% open-source runtimes like Ollama, Open WebUI, and Jan, plus a new wave of personal agent platforms like OpenClaw, Hermes Agent, and ZeroClaw. No cloud. No telemetry. No proprietary lock-in.

12+
Runtimes covered
9
Personal agent platforms
0
Bytes of data leaving your device
Models you can run
Just released

Fresh open-weight drops

New models ship faster than ever. Here's what just hit Hugging Face and can run on your own hardware — updated as releases happen.

Apr 24, 2026 NEW TODAY

DeepSeek V4

DeepSeek · MoE · open weights

The next generation of DeepSeek's flagship. Bigger context, stronger reasoning, and still fully open-weight — the release everyone's been waiting for since V3.

Ollama llama.cpp vLLM
Hugging Face →
Apr 2026

Qwen 3 Max

Alibaba · 235B MoE · open weights

Alibaba's top-tier Qwen 3 variant — trades blows with closed frontier models on code, math, and multilingual tasks, and still runs entirely on local hardware.

Ollama vLLM SGLang
Hugging Face →
Mar 2026

Llama 4 Scout & Maverick

Meta · MoE · native multimodal

Meta's first natively multimodal MoE. Scout fits on a single H100; Maverick is the enterprise tier. Both open-weight, both shipping on day one in llama.cpp.

Ollama llama.cpp MLX
Hugging Face →
Mar 2026

Gemma 3

Google · 1B – 27B · multimodal

Google's small-model series gets a major upgrade — longer context, vision, and 27B is the new sweet spot for a single consumer GPU.

Ollama llama.cpp MLX
Hugging Face →
Feb 2026

Mistral Large 3

Mistral AI · dense · Apache 2.0

Mistral's flagship returns to fully permissive Apache 2.0 weights. Strong function calling, excellent European-language coverage.

Ollama llama.cpp vLLM
Hugging Face →
Jan 2026

FLUX.2 [dev]

Black Forest Labs · diffusion · 12B

The successor to FLUX.1 — sharper prompt adherence and better text-in-image. Drops straight into ComfyUI, Forge, and SwarmUI.

ComfyUI Forge SwarmUI
Hugging Face →
Why run AI locally

Own the model. Own the data. Own the outcome.

Local AI flips the economics and ethics of intelligence. Your prompts, your files, your agents, and your reasoning — all stay on your machine, under your control.

🔒

Private by default

Every token is computed on your CPU/GPU. No prompts uploaded, no logs scraped, no vendor training on your workflow.

✈️

Works offline

A plane seat, a remote cabin, a classified network — your models keep working exactly the same, with no round trip to a data center.

💸

Zero per-token cost

Run millions of tokens a day without a bill. The only cost is the electricity you were already paying for.

🧠

Full customization

Swap models, quantizations, system prompts, tools, and memory. You aren't stuck in someone else's sandbox.

⚡️

Instant responses

Apple Silicon, NVIDIA, AMD and even modest CPUs can now stream tokens fast enough for real agentic work.

🛠️

Hackable ecosystem

From GGUF models to MCP tools, the local stack is open, inspectable, and composable — the way software should be.

Local LLM runtimes

The engines that bring models to your laptop

Whether you want a one-line install or a full developer-grade toolkit, there's a runtime made for your workflow.

Ollama

CLI + API · macOS · Linux · Windows

The easiest way to pull, run and serve open models. One command, hundreds of models, instant OpenAI-compatible API.

Beginner friendly · Open source
Details on runoffline →

Open WebUI

Self-hosted web UI · BSD licensed

A ChatGPT-class web interface for your local models — multi-user, RAG, plugins, all open source.

Open source · Self-hosted
Details on runoffline →

Jan

Open-source ChatGPT alternative

A fully open, privacy-first desktop chat app that runs local and remote models with an extension system.

Open source · Extensions
Details on runoffline →

llama.cpp

The engine under the hood

The legendary C/C++ inference engine that made local LLMs practical. Powers most tools on this page.

Advanced · Max performance
Details on runoffline →
Personal agent platforms

Your private workforce — meet the agents

A new generation of desktop-native agents that can see your screen, click your apps, read your files and get real work done — all without sending a single byte to the cloud.

🐾

OpenClaw

Open desktop agent · MIT licensed

The community answer to Claude Computer Use. Gives any local model hands — mouse, keyboard, browser and shell — with a safe action layer.

Details on runoffline →
🪽

Hermes Agent

Messenger for your workflows

A multi-agent orchestrator built around the Hermes instruction-tuned models, designed for chained reasoning, tool use and long-horizon tasks.

Details on runoffline →
🦅

ZeroClaw

Zero-config autonomous agent

Spin up an autonomous agent with a single binary. ZeroClaw plans, executes and self-reviews goals using any Ollama or llama.cpp model.

Details on runoffline →
Free open-weight models

What to run, for whatever you're building

A complete catalog of free models — chat, coding, reasoning, vision, image generation, speech, music, video and embeddings — each mapped to the loaders that can actually run it.

30-second quickstart

Your first local model in under a minute

Terminal
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a strong open model
ollama pull llama3.1:8b

# 3. Chat with it — fully offline
ollama run llama3.1:8b "Draft a launch email for runoffline.ai"

# 4. Or use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"hi"}]}'
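The same endpoint can be scripted from any language. Here's a minimal Python sketch using only the standard library — the `build_payload` helper is illustrative, and it assumes Ollama is serving its OpenAI-compatible API on the default port 11434 with `llama3.1:8b` already pulled:

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint


def build_payload(prompt: str, model: str = "llama3.1:8b") -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str, model: str = "llama3.1:8b") -> str:
    """POST one chat turn to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Say hello in five words."))
```

Because the wire format matches OpenAI's, any OpenAI-compatible SDK can point at the same URL by overriding its base URL — no code changes beyond that.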
A global movement · thousands of builders · one open ecosystem

The global community behind offline AI

runoffline.ai is more than a directory — it's a rallying point. See what builders are shipping on X right now, jump into the biggest local-AI subreddits, browse trending GitHub repos and Hacker News threads, or hop straight into a Discord full of people who run models on their own machines.

140+
countries with active local-AI builders
470K+
members in r/LocalLLaMA
3.2M+
combined GitHub stars across listed projects
24/7
live chat across Discord, Matrix, IRC
Live from the community
★ 98k
ollama / ollama
★ 72k
ggerganov / llama.cpp
★ 55k
open-webui / open-webui
★ 41k
comfyanonymous / ComfyUI
★ 28k
janhq / jan
▲ 612
Show HN: I run a 70B model on my laptop and it actually works
▲ 489
Why I stopped paying for OpenAI and moved my whole team to Ollama
▲ 404
Hermes Agent: self-hosted personal assistant that respects your data
▲ 287
The case for running AI on-device in 2026

Voices from the movement

A glimpse of what builders around the world are saying.

"Moved our whole R&D team to Ollama + Open WebUI last month. Zero API bill, zero data leaving our network, and latency is better than the cloud."
KM
Kenji M.
Staff engineer · Tokyo
𝕏
"Running Qwen2.5-Coder 32B locally with Aider changed how I write software. It's genuinely my pair programmer now — and it never phones home."
AS
Anna S.
Indie dev · Berlin
Reddit
"In places with spotty internet, offline models aren't a preference — they're the only option. runoffline.ai is the clearest map I've found."
OA
Oluwaseun A.
ML researcher · Lagos
HN
"I teach high-schoolers Python with a local Llama 3.1 running on a €300 mini-PC. No accounts, no billing, no surveillance. Just learning."
MR
Marta R.
Teacher · Barcelona
Mastodon
"OpenClaw + a small vision model on my MacBook replaces half my SaaS stack. This is the future I was promised."
DP
Devin P.
Solo founder · Austin
𝕏
"Our hospital's compliance team approved local inference in one week. Cloud would have taken 18 months. Open source won."
RV
Dr. Rohan V.
Clinical informatics · Bangalore
LinkedIn

Stop renting intelligence. Start running it.

Dive into the full catalog of runtimes and agents — carefully curated, neutrally compared, and updated as fast as the local-AI world moves.

Compare everything → Read the guides