Run Your First LLM Locally

A no-jargon, step-by-step guide for total beginners — from zero to a chatbot running on your own PC.

1

Why Run AI Locally?

Cloud AI services like ChatGPT and Claude are powerful, but they come with trade-offs. Running a Large Language Model (LLM) on your own machine gives you advantages that no cloud service can match:

  • Total privacy — your prompts and data never leave your device. Perfect for medical notes, legal docs, company secrets, or anything you don't want on someone else's server.
  • Zero recurring cost — once you download a model, you can use it forever with no API fees and no subscriptions.
  • Works offline — no Wi-Fi, no problem. Great for travel, air-gapped environments, or spotty connections.
  • No rate limits — send as many messages as you like, as fast as your hardware allows.
  • Full customizability — fine-tune models, swap system prompts, chain tools together — you're in control.
  • Ethical by design — your conversations never train corporate models, build behavioral profiles, or feed advertising systems. You get capable AI without trading your data for it.

The catch? You need decent hardware (especially a GPU with enough VRAM) and the models are not quite as powerful as the very best cloud offerings — yet. For most everyday tasks, though, today's open-source models are excellent.

2

What You Need

Here's a quick checklist before you start:

Hardware

  • GPU with 8 GB+ VRAM — NVIDIA (RTX 3060 12 GB or above), AMD (RX 7600 XT+), or Apple Silicon Mac (M1/M2/M3/M4 — unified memory counts as VRAM).
  • 16 GB RAM minimum (32 GB recommended for larger models).
  • ~20 GB free disk space for the runtime + one or two models.

No GPU? You can still run small models on CPU. It will be slower (roughly 1–10 tokens per second, or tok/s, instead of 30–100+), but it works.
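To get a feel for what those speeds mean in practice, the arithmetic is simply response length divided by generation speed. The sketch below assumes a 300-token answer (a few paragraphs of text); that length and the two example speeds are illustrative picks from the ranges above, not measurements.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return how long a response of num_tokens takes at a given speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Illustrative 300-token answer (an assumed length, not a benchmark).
for label, speed in [("CPU, 5 tok/s", 5), ("GPU, 50 tok/s", 50)]:
    print(f"{label}: {seconds_to_generate(300, speed):.0f} s")
```

Under these assumptions the same answer takes about a minute on CPU and a few seconds on a GPU, which is the difference between "usable" and "feels instant".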

Software

  • Ollama — the easiest way to download and run LLMs. One install, one command. Free and open-source. Works on Windows, Mac, and Linux.
  • A chat UI (optional) — Ollama runs in a terminal, but you can connect a friendly UI like Open WebUI or Jan for a ChatGPT-like experience.

3

Find the Best Model for Your Hardware

Not every model runs well on every GPU. A 70-billion-parameter model will crawl on an 8 GB card, while a tiny model on a beefy GPU wastes potential. That's where LocalLLM Advisor comes in.
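A rough back-of-the-envelope check shows why model size matters so much. The sketch below estimates the size of a model's weights from its parameter count and the number of bits stored per weight (for example, about 4 bits for a "q4" model). The function names and the 20% overhead allowance are assumptions made for illustration; real memory use depends on the runtime and how long your conversations are.

```python
def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes."""
    # 1 billion parameters at 8 bits (1 byte) each is about 1 GB.
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, overhead: float = 0.2) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM.

    The 20% overhead (context cache, runtime buffers) is a guess for
    illustration only, not a figure from any specific runtime.
    """
    needed = weights_size_gb(params_billions, bits_per_weight) * (1 + overhead)
    return needed <= vram_gb


print(weights_size_gb(70, 4))   # 70B model at 4 bits per weight
print(weights_size_gb(8, 4))    # 8B model at 4 bits per weight
print(fits_in_vram(70, 4, 8))   # 70B on an 8 GB card
print(fits_in_vram(8, 4, 8))    # 8B on an 8 GB card
```

By this estimate a 70B model needs around 35 GB for its weights alone, several times what an 8 GB card offers, while an 8B model at about 4 GB fits with room to spare.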

How to use our “Find a Model” tool

  1. Go to Find a Model.
  2. Select your GPU (or enter specs manually).
  3. Pick a use-case — chat, coding, creative writing, etc.
  4. Click “Find Models”. You'll instantly see a ranked list of models with estimated speed, quality scores, and VRAM usage.

Write down the model name (e.g. llama3.1:8b-q4_K_M) — you'll need it in the next step.

Tip: Don't know your GPU? On Windows, open Task Manager → Performance → GPU. On Mac, click Apple menu → About This Mac and look for the chip name (e.g. “Apple M2 Pro 16 GB”).
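If you have an NVIDIA card and Python installed, the GPU name and VRAM can also be read from a script. This is a minimal sketch that assumes the NVIDIA driver's nvidia-smi tool is available; it does not cover AMD or Apple Silicon, and the helper names are made up for this example.

```python
import shutil
import subprocess


def parse_gpu_csv(text: str) -> list[tuple[str, float]]:
    """Parse 'name, memory_in_MiB' lines into (name, vram_gb) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        try:
            name, mem_mib = line.rsplit(",", 1)
            gpus.append((name.strip(), round(float(mem_mib) / 1024, 1)))
        except ValueError:
            continue  # skip lines that do not look like "name, number"
    return gpus


def detect_nvidia_gpus() -> list[tuple[str, float]]:
    """Return detected NVIDIA GPUs, or an empty list if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []  # tool not installed, or no NVIDIA driver present
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)
```

Calling print(detect_nvidia_gpus()) lists each card with its VRAM in GB; an empty list simply means no NVIDIA GPU was detected this way, so fall back to the Task Manager or About This Mac route above.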

4

Download & Install Ollama

Windows

  1. Go to ollama.com/download
  2. Download the Windows installer
  3. Run the .exe — follow the wizard
  4. Open Command Prompt or PowerShell

macOS

  1. Go to ollama.com/download
  2. Download the macOS app
  3. Drag to Applications, open it
  4. Open Terminal (Cmd + Space → “Terminal”)

Verify the install by typing:

ollama --version

If you see a version number, you're good to go.

5

Run Your First Model

With Ollama installed, running a model is a single command. Substitute the model name you found in Step 3; this example uses llama3.1:8b:

ollama run llama3.1:8b

The first time you run this, Ollama downloads the model. Depending on your internet speed and the model's size (roughly 4 GB to 40+ GB), this can take anywhere from a few minutes to an hour or more. After that, you'll see a prompt where you can start chatting directly in your terminal. Type /bye to exit the chat.

Try it: Type something like “Explain quantum computing in simple terms” and watch the response stream in.

Useful commands

  • ollama list — see all downloaded models
  • ollama pull mistral — download a model without running it
  • ollama rm llama3.1:8b — delete a model to free disk space
  • ollama serve — start the API server (port 11434) for external UIs
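
The API server on port 11434 is not only for chat UIs: once it is running, your own scripts can talk to it too. The sketch below uses only Python's standard library and Ollama's /api/generate and /api/tags endpoints. The helper names are invented for this example, and it assumes the server is on the default address and the model has already been downloaded.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; change if yours differs


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one complete JSON reply instead of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(model: str, prompt: str) -> str:
    """Send a prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]


def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def list_models() -> list[str]:
    """Ask the server which models are downloaded (similar to `ollama list`)."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=30) as response:
        return parse_model_names(response.read().decode("utf-8"))
```

With the server running, print(generate("llama3.1:8b", "Explain quantum computing in simple terms")) prints the model's full answer, and print(list_models()) shows what is installed. If nothing is listening on the port, both calls raise a connection error.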

Want a nicer chat interface?

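Ollama exposes an API on localhost:11434. Connect a chat UI to it for a ChatGPT-like experience, or use a standalone desktop app:

  • Open WebUI — full-featured web UI, runs via Docker
  • Jan — desktop app, great for beginners, no Docker needed
  • LM Studio — desktop GUI with a built-in model browser; it runs models with its own engine instead of connecting to Ollama

6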

Cloud Alternatives (When Local Isn't Enough)

Sometimes you need a bigger model than your hardware can handle, or you need massive throughput for a production workload. In that case, cloud GPU providers let you rent powerful machines by the hour:

RunPod

On-demand GPU pods, starting ~$0.20/hr

Vast.ai

GPU marketplace, cheap spot instances

Lambda

A100/H100 instances for serious workloads

AWS / GCP / Azure

Enterprise-grade, pay-per-second GPU VMs

These are great for experimenting with 70B+ models or running inference at scale. But for everyday personal use, a local setup beats them on privacy, cost, and convenience.
