Ollama

Most popular · MIT
The Docker of local LLMs.

Ollama is the fastest path from "I want to try Llama" to actually having it running on your laptop. A single install gives you a background daemon, an OpenAI-compatible API on localhost:11434, and a command-line interface that treats models like container images — ollama pull, ollama run, ollama serve. It quietly became the universal backend that almost every other app in this ecosystem plugs into.

  • One-line install on macOS, Linux, Windows
  • Drop-in OpenAI-compatible REST API
  • Hundreds of ready-to-pull GGUF models
  • Modelfiles for custom system prompts & merges
  • GPU acceleration (Metal, CUDA, ROCm)
  • Powers most local agents and IDE plugins
Best for: Developers, tinkerers, agent backends
Interface: CLI + HTTP API
License: MIT, open source
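
A minimal sketch of hitting the native REST API from Python, assuming the daemon is running and a model such as llama3.2 has already been pulled (the model name is only an example):

    import requests

    # Ollama's native generate endpoint; the daemon listens on 11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",   # any model you've fetched with `ollama pull`
            "prompt": "Why is the sky blue?",
            "stream": False,       # return one JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])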

Open WebUI

100% open source · Self-hosted
A ChatGPT-class web UI that runs entirely on your own hardware.

Open WebUI (formerly Ollama WebUI) is the most popular self-hosted chat interface for local LLMs. It's a full-featured web app — multi-user workspaces, RAG over your documents, web-search plugins, voice chat, image input, pipelines, function calling — all layered on top of Ollama or any OpenAI-compatible endpoint. Install it with a single Docker command and you have a private ChatGPT replacement on your LAN.

  • Full web UI with auth, roles, and team workspaces
  • RAG over docs, websites & YouTube transcripts
  • Pipelines & Python functions for custom tools
  • Works with Ollama, llama.cpp, vLLM, LocalAI
  • Voice + image chat, prompt library, model playground
  • One-line Docker install
Best for: Self-hosting a private ChatGPT for yourself or a team
Interface: Web UI
License: BSD-3-Clause — fully open

Jan

100% open source
An open-source ChatGPT that runs 100% offline.

Jan treats "private AI" as a first-class product philosophy. It's a fully open-source desktop app (AGPL) that can run local models via llama.cpp or connect to remote APIs — all behind a unified chat interface. Its extension system lets you add tools, memory, assistants and even alternative model engines, making it a community-owned alternative to closed chat apps.

  • Cross-platform desktop app, AGPL-licensed
  • Unified UI for local + remote models
  • Extension system for tools and agents
  • Per-assistant system prompts and memory
  • Export / import every conversation
  • Active non-profit-style community
Best for: Privacy-maximalists who want ChatGPT without the cloud
Interface: GUI
License: AGPLv3

GPT4All

Nomic AI
Local chat with your documents, built for everyone.

GPT4All pioneered the "local ChatGPT" category and still shines at bringing offline AI to non-technical users. The desktop app bundles quantized models with a friendly chat UI and a LocalDocs feature that turns any folder into a private RAG knowledge base. A Python SDK makes it easy to embed into your own apps.

  • Cross-platform installer — Windows, macOS, Linux
  • LocalDocs: chat with PDFs, notes, and folders
  • CPU-friendly, runs on modest hardware
  • Python SDK for app integration
  • No account, no cloud, no telemetry
  • Strong focus on education & accessibility
Best for: Non-technical users & doc-grounded chat
Interface: GUI + Python SDK
License: MIT
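
A minimal sketch of the Python SDK, assuming pip install gpt4all; the model filename is illustrative and is downloaded to the local cache on first use:

    from gpt4all import GPT4All

    # Loads (and on first run downloads) a quantized model into memory.
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

    # chat_session() keeps conversation history across generate() calls.
    with model.chat_session():
        reply = model.generate(
            "Summarize the plot of Hamlet in two sentences.",
            max_tokens=200,
        )
        print(reply)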

llama.cpp

Advanced · MIT
The C/C++ engine that started the local-LLM revolution.

llama.cpp is the quiet foundation of the entire ecosystem. Georgi Gerganov's hand-tuned C++ engine made it possible to run multi-billion-parameter transformers on a laptop through aggressive quantization and custom kernels for Metal, CUDA, Vulkan, ROCm, SYCL and plain CPU. If you want raw speed, fine-grained control, or to build your own tool on top of inference, this is where you go.

  • Pure C/C++, zero Python required
  • GGUF model format — the de facto standard
  • Metal / CUDA / Vulkan / ROCm / CPU kernels
  • Built-in HTTP server (llama-server)
  • Grammar-constrained JSON / schema output
  • Powers Ollama, Jan, GPT4All, KoboldCpp & more
Best for: Developers squeezing every token/sec
Interface: CLI, library, HTTP server
License: MIT
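
A minimal sketch of querying a running llama-server from Python, assuming it was started with something like llama-server -m model.gguf on the default port 8080:

    import requests

    # llama-server exposes a native /completion endpoint
    # alongside OpenAI-style /v1 routes.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "The three primary colors are",
            "n_predict": 64,   # cap on the number of tokens to generate
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["content"])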

vLLM

Server grade
The fastest way to serve a local model to many users.

Built at UC Berkeley, vLLM is the go-to inference server when you stop running a model for yourself and start running it for a team, an app, or a whole company. Its PagedAttention memory management and continuous batching deliver throughput an order of magnitude higher than naive serving, and it speaks the OpenAI wire protocol out of the box.

  • PagedAttention — industry-leading throughput
  • Continuous batching & speculative decoding
  • OpenAI-compatible HTTP server
  • Tensor / pipeline parallelism across GPUs
  • Supports most Hugging Face architectures
  • Heavy-duty production use at scale
Best for: Self-hosting LLMs for teams & apps
Interface: HTTP server + Python
License: Apache 2.0
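
Because vLLM speaks the OpenAI wire protocol, the stock openai client works against it unchanged; a minimal sketch, assuming a server started with something like vllm serve meta-llama/Llama-3.1-8B-Instruct on the default port 8000:

    from openai import OpenAI

    # Point the official OpenAI client at the local vLLM server.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
        messages=[{"role": "user", "content": "Give me one fact about GPUs."}],
        max_tokens=100,
    )
    print(completion.choices[0].message.content)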

Apple MLX

Apple Silicon
Native AI acceleration for M-series Macs.

MLX is Apple's open-source array framework designed from the ground up for unified memory on Apple Silicon. For local LLMs it means dramatically faster inference and fine-tuning than generic CPU/GPU paths. Projects like mlx-lm and the mlx-community organization on Hugging Face now publish thousands of pre-quantized models optimized for M1/M2/M3/M4 chips.

  • Unified-memory-aware kernels for Apple Silicon
  • Python + Swift APIs
  • LoRA & QLoRA fine-tuning on-device
  • MLX-compatible model zoo on Hugging Face
  • Integrates with Ollama & llama.cpp-based tools
Best for: Mac users who want peak speed
Interface: Python library
License: MIT
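
A minimal sketch with mlx-lm (pip install mlx-lm) on an Apple Silicon Mac; the repository name is one of the mlx-community conversions and is only an example:

    from mlx_lm import load, generate

    # Downloads a pre-quantized MLX model from Hugging Face on first use.
    model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")

    text = generate(
        model, tokenizer,
        prompt="Explain unified memory in one paragraph.",
        max_tokens=150,
    )
    print(text)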

LocalAI

Self-hostable
A drop-in OpenAI replacement you can host yourself.

LocalAI is a single Go binary (and Docker image) that exposes the full OpenAI API surface — chat, embeddings, images, audio, rerankers — backed by whatever local engines you configure. Perfect for replacing OpenAI in existing apps without rewriting a line of code.

  • Full OpenAI API coverage, locally hosted
  • Chat, embeddings, images, TTS & STT in one service
  • Backend-agnostic (llama.cpp, diffusers, whisper.cpp…)
  • Docker / Kubernetes friendly
  • Gallery of pre-configured models
Best for: Teams replacing OpenAI behind existing apps
Interface: HTTP server
License: MIT
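
Since LocalAI mirrors the OpenAI API, swapping it in is usually just a base-URL change; a minimal sketch, assuming LocalAI runs on its default port 8080 and that the model name below maps to a backend you've configured (both are illustrative):

    from openai import OpenAI

    # The same client code that talks to api.openai.com works against LocalAI.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    # Chat and embeddings go through the same drop-in endpoint.
    emb = client.embeddings.create(
        model="local-embedder",   # illustrative; use a model from your LocalAI gallery
        input="LocalAI exposes the whole OpenAI surface locally.",
    )
    print(len(emb.data[0].embedding))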

Text Generation WebUI

The Swiss-army knife for power users.

Affectionately known as "oobabooga", this web-based UI exposes just about every knob a modern inference engine has — backends, sampling parameters, LoRA stacking, training, extensions and API modes. It's the preferred environment for model enthusiasts, roleplayers, and researchers who want to push their hardware.

  • Supports llama.cpp, ExLlama, Transformers backends
  • Rich chat, notebook and instruction modes
  • Extension ecosystem (TTS, RAG, characters…)
  • Training & LoRA fine-tuning inside the UI
  • Local OpenAI-compatible API
Best for: Power users, roleplayers & researchers
Interface: Local web UI
License: AGPLv3

KoboldCpp

Writer-focused
Local AI for storytelling and long-form writing.

KoboldCpp wraps llama.cpp in a single executable tuned for creative writing and interactive fiction. It offers world info, author notes, memory, lorebooks, and an image-gen module — a full offline writing studio for novelists, game designers and roleplayers.

  • Single executable, no install required
  • Memory, world info & lorebook support
  • Image generation via Stable Diffusion
  • OpenAI & KoboldAI compatible APIs
  • Runs beautifully on modest GPUs
Best for: Writers, game masters, worldbuilders
Interface: Local web UI + API
License: AGPLv3

Cherry Studio

Open source
A cross-platform desktop client for local and open models.

Cherry Studio is an open-source desktop AI client that bundles a friendly chat UI, multi-assistant management, built-in RAG, and a plugin system — all able to run against a local Ollama or llama.cpp server with no account or cloud dependency. It's a polished answer for users who want a "real app" feel without sacrificing openness.

  • Cross-platform desktop (macOS / Windows / Linux)
  • Multi-assistant profiles & prompt library
  • RAG with local knowledge bases
  • MCP & tool-calling support
  • Works offline with Ollama / llama.cpp
Best for: Users who want a polished, open-source desktop app
Interface: Desktop GUI
License: Apache 2.0

AnythingLLM

Team-ready
Turn any documents into a private AI workspace.

AnythingLLM is a full-stack, self-hostable app that layers workspaces, permissions, agents and retrieval-augmented chat on top of any local model. It's the closest thing to "ChatGPT Enterprise, but on your own hardware" — and it can be deployed on a laptop or a server with equal ease.

  • Multi-user workspaces with role-based access
  • RAG over PDFs, web, Confluence, GitHub, YouTube…
  • Agentic tools & function calling
  • Docker / desktop / self-hosted deployments
  • Embed as a chat widget in your own products
Best for: Teams wanting private knowledge bases
Interface: Web + Desktop
License: MIT

Can't decide? Let the comparison speak.

Side-by-side the differences become obvious: who's for beginners, who's for hackers, who's for scale.

Open comparison table →
See agent platforms

Join the global local-AI community

Live posts on X, 470K+ builders in r/LocalLLaMA, active Discord & Matrix rooms, and trending GitHub repos — all gathered in one hub.