DeepSeek V4
The next generation of DeepSeek's flagship. Bigger context, stronger reasoning, and still fully open-weight — the release everyone's been waiting for since V3.
Hugging Face →

runoffline.ai is the international hub for everyone exploring local LLMs — 100% open-source runtimes like Ollama, Open WebUI and Jan, and a new wave of personal agent platforms like OpenClaw, Hermes Agent and ZeroClaw. No clouds. No telemetry. No proprietary lock-in.
New models ship faster than ever. Here's what just hit Hugging Face and can run on your own hardware — updated as releases happen.
The next generation of DeepSeek's flagship. Bigger context, stronger reasoning, and still fully open-weight — the release everyone's been waiting for since V3.
Hugging Face →

Alibaba's top-tier Qwen 3 variant — trades blows with closed frontier models on code, math, and multilingual tasks, all still shippable locally.

Hugging Face →

Meta's first natively multimodal MoE. Scout fits on a single H100; Maverick is the enterprise tier. Both open-weight, both shipping on day one in llama.cpp.

Hugging Face →

Google's small-model series gets a major upgrade — longer context, vision, and 27B is the new sweet spot for a single consumer GPU.

Hugging Face →

Mistral's flagship returns to fully permissive Apache 2.0 weights. Strong function calling, excellent European-language coverage.

Hugging Face →

The successor to FLUX.1 — sharper prompt adherence and better text-in-image rendering. Drops straight into ComfyUI, Forge, and SwarmUI.

Hugging Face →

Local AI flips the economics and ethics of intelligence. Your prompts, your files, your agents, and your reasoning — all stay on your machine, under your control.
Every token is computed on your CPU/GPU. No prompts uploaded, no logs scraped, no vendor training on your workflow.
A plane seat, a remote cabin, a classified network — your models keep working exactly the same, with no round trips to a data center.
Run millions of tokens a day without a bill. The only cost is the electricity you were already paying for.
Swap models, quantizations, system prompts, tools, and memory. You aren't stuck in someone else's sandbox.
Apple Silicon, NVIDIA, AMD and even modest CPUs can now stream tokens fast enough for real agentic work.
From GGUF models to MCP tools, the local stack is open, inspectable, and composable — the way software should be.
Whether you want a one-line install or a full developer-grade toolkit, there's a runtime made for your workflow.
The easiest way to pull, run and serve open models. One command, hundreds of models, instant OpenAI-compatible API.
Details on runoffline →

A ChatGPT-class web interface for your local models — multi-user, RAG, plugins, all open source.

Details on runoffline →

A fully open, privacy-first desktop chat app that runs local and remote models with an extension system.

Details on runoffline →

The legendary C/C++ inference engine that made local LLMs practical. Powers most tools on this page.

Details on runoffline →

A new generation of desktop-native agents that can see your screen, click your apps, read your files and get real work done — all without sending a single byte to the cloud.
The community answer to Claude Computer Use. Gives any local model hands — mouse, keyboard, browser and shell — with a safe action layer.
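OpenClaw's internals aren't documented here, but a "safe action layer" generally means policy-gating every action the model proposes before it reaches the mouse, keyboard, browser or shell. A purely illustrative sketch — the action names and policy below are hypothetical, not OpenClaw's real API:

```python
# Illustrative sketch of a safe action layer; none of these names come from
# OpenClaw itself. The idea: every action the model proposes is checked
# against an explicit policy before it touches the machine.
ALLOWED_ACTIONS = {"click", "type_text", "open_url", "read_file"}  # hypothetical allow-list
DENIED_SHELL = {"rm", "mkfs", "dd", "shutdown"}  # hypothetical deny-list for raw shell

def gate(action: str, args: dict) -> bool:
    """Return True only if the proposed action passes the policy."""
    if action == "shell":
        # Raw shell is allowed only when the command avoids the deny-list.
        cmd = args.get("cmd", "").split()
        return bool(cmd) and cmd[0] not in DENIED_SHELL
    return action in ALLOWED_ACTIONS

print(gate("click", {"x": 10, "y": 20}))   # True: on the allow-list
print(gate("shell", {"cmd": "rm -rf /"}))  # False: deny-listed command
```

The point of the pattern is that the model never executes anything directly; it only emits proposals, and the gate decides.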
Details on runoffline →

A multi-agent orchestrator built around the Hermes instruction-tuned models, designed for chained reasoning, tool use and long-horizon tasks.

Details on runoffline →

Spin up an autonomous agent with a single binary. ZeroClaw plans, executes and self-reviews goals using any Ollama or llama.cpp model.

Details on runoffline →

A complete catalog of free models — chat, coding, reasoning, vision, image generation, speech, music, video and embeddings — each mapped to the loaders that can actually run it.
Instruction-tuned LLMs for writing, Q&A and the reasoning core of most agents.
Drop-in backends for Continue, Aider and Cline — local Copilot quality, zero code leaving your box.
"Think-before-you-speak" models that outscore GPT-4-class systems on math and logic.
Different plumbing from LLMs — needs ComfyUI, Forge, InvokeAI or Draw Things. We map every model to its loader.
Let agents see images, screens and documents — the brain behind OpenClaw-style computer use.
Local transcription, voice cloning, song generation and video synthesis — all open-weight, all offline.
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a strong open model
ollama pull llama3.1:8b

# 3. Chat with it — fully offline
ollama run llama3.1:8b "Draft a launch email for runoffline.ai"

# 4. Or use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"hi"}]}'
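The OpenAI-compatible endpoint in step 4 can also be driven from code. A minimal sketch using only Python's standard library, assuming Ollama is serving on its default port (11434) with llama3.1:8b already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # Ollama's default local endpoint

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local Ollama server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = chat_request("llama3.1:8b", "Say hello in five words.")
# Sending it requires a running Ollama instance:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Any OpenAI-compatible SDK works the same way: point its base URL at http://localhost:11434/v1 and use any pulled model name.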
runoffline.ai is more than a directory — it's a rallying point. See what builders are shipping on X right now, jump into the biggest local-AI subreddits, trending GitHub repos and Hacker News threads, or hop straight into a Discord full of people who run models on their own machines.
Follow #LocalLLM, #Ollama, #OpenSourceAI for the worldwide conversation.
The central nervous system of the open-source LLM world.
Real-time help from Ollama maintainers and power users.
Where nearly every open-weight model ships first.
Open-source ChatGPT alternative, fully offline-capable.
Decentralized, privacy-first chat for local-AI hackers.
Deep-dive issues, RFCs and build tips across every project.
Demos, benchmarks and hot takes from builders worldwide.
Long-form threads dissecting every local-AI release.
New runtimes, hot models, and the best community threads — curated, zero spam.
A glimpse of what builders around the world are saying.
Dive into the full catalog of runtimes and agents — carefully curated, neutrally compared, and updated as the local-AI world moves fast.