Apple M2 Ultra (192GB) vs Apple M4 Max (128GB)

How these GPUs compare for running local LLMs — VRAM, bandwidth, price, and per-model fit across popular open-weights models.

54 models compared · 192 GB vs 128 GB VRAM · 800 GB/s vs 546 GB/s · $5,499 vs $3,999
GPU A (faster): Apple M2 Ultra (192GB)

VRAM: 192 GB
Bandwidth: 800 GB/s
Street price: $5,499
Vendor: Apple

GPU B: Apple M4 Max (128GB)

VRAM: 128 GB
Bandwidth: 546 GB/s
Street price: $3,999
Vendor: Apple
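
The bandwidth figures matter because single-stream token generation is usually limited by how fast the weights can be read from memory. The page does not state how its tok/s numbers were derived, so the following is only a rough sketch of the common rule of thumb: the ceiling is bandwidth divided by the size of the weights. The function name and the dense-model assumption are illustrative, and mixture-of-experts models (which read only their active parameters per token) are not covered.

```python
# Bits per weight: FP16 uses 16. GGUF Q8_0 stores 32 int8 weights plus one
# FP16 scale per block, which works out to 8.5 bits per weight.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5}


def decode_ceiling_tok_s(params_billion: float, quant: str, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s for a dense model, assuming every weight is
    read once per generated token (hypothetical helper, not the site's method)."""
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return bandwidth_gb_s / weight_gb


# An 8B model in FP16 is 16 GB of weights, so 800 GB/s gives a ceiling of 50 tok/s.
print(decode_ceiling_tok_s(8, "FP16", 800))

# A 70.6B model: FP16 at 800 GB/s versus Q8_0 at 546 GB/s.
print(decode_ceiling_tok_s(70.6, "FP16", 800))
print(decode_ceiling_tok_s(70.6, "Q8_0", 546))
```

Real throughput lands below this ceiling (the table lists 37 tok/s for an 8B FP16 model on the M2 Ultra). The sketch also suggests why the lower-bandwidth chip can come out ahead on 70B-class models: a smaller quantization means fewer bytes to read per token.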

The short answer

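Apple M2 Ultra (192GB) is more than 20% faster on 37 of the 54 models and is the only one of the two that can run 1 more. Apple M4 Max (128GB) is 20%+ faster on 5 models, all cases where it runs a Q8_0 quantization while the M2 Ultra runs FP16. On 3 models, speeds are within 20% and you won't notice the gap. The remaining 8 models are too large for either GPU. Overall, Apple M2 Ultra (192GB) gives better throughput on most models.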

1 model: A runs, B can't
37 models: A is 20%+ faster
5 models: B is 20%+ faster
3 models: equal (both run, <20% diff)
8 models: too large for either
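
The buckets above can be reproduced with a small classifier. This is a minimal sketch, not the site's code: the function name is hypothetical, `None` stands for "too large", and the 20% gap is assumed to be measured relative to the slower GPU. Integer arithmetic keeps a result of exactly 20% on the "faster" side.

```python
from typing import Optional


def classify(a_tok_s: Optional[float], b_tok_s: Optional[float],
             threshold_pct: int = 20) -> str:
    """Bucket one model given tok/s on GPU A and GPU B (None = too large)."""
    if a_tok_s is None and b_tok_s is None:
        return "too large"
    if b_tok_s is None:
        return "A (only)"
    if a_tok_s is None:
        return "B (only)"
    faster, slower = max(a_tok_s, b_tok_s), min(a_tok_s, b_tok_s)
    # "20%+ faster" means faster >= slower * 1.20; scaled by 100 to avoid
    # floating-point error on the boundary.
    if faster * 100 >= slower * (100 + threshold_pct):
        return "A (faster)" if a_tok_s > b_tok_s else "B (faster)"
    return "equal"


print(classify(10, 7))       # c4ai-command-r-v01 35B
print(classify(5, 6))        # Qwen 2.5 72B
print(classify(5, 5))        # Llama 4 Scout 17B
print(classify(5, None))     # Qwen3-235B-A22B-Instruct-2507
print(classify(None, None))  # DeepSeek R1 671B
```

The table shows rounded tok/s values, so results computed from it may differ from the site's own labels in borderline cases.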

Model-by-model fit

The Winner column shows "equal" when both GPUs run the model within 20% of each other.

| Model | Size · family | Apple M2 Ultra (192GB) | Apple M4 Max (128GB) | Winner |
|---|---|---|---|---|
| c4ai-command-r-v01 35B | 35B · command | 10 tok/s · FP16 | 7 tok/s · FP16 | A (faster) |
| Command-R+ 104B | 104B · command | 6 tok/s · Q8_0 | 4 tok/s · Q8_0 | A (faster) |
| DeepSeek R1 Distill Llama 8B | 8B · deepseek | 41 tok/s · FP16 | 28 tok/s · FP16 | A (faster) |
| DeepSeek R1 Distill Qwen 14B | 14.8B · deepseek | 23 tok/s · FP16 | 15 tok/s · FP16 | A (faster) |
| DeepSeek R1 Distill Llama 70B | 70.6B · deepseek | 5 tok/s · FP16 | 7 tok/s · Q8_0 | B (faster) |
| DeepSeek R1 671B | 671B · deepseek | Too large | Too large | |
| DeepSeek-V3 685B | 685B · deepseek | Too large | Too large | |
| DeepSeek-V3.2 685.4B | 685.4B · deepseek | Too large | Too large | |
| gemma-2-9b | 9.2B · gemma | 36 tok/s · FP16 | 25 tok/s · FP16 | A (faster) |
| gemma-2-27b | 27.2B · gemma | 12 tok/s · FP16 | 8 tok/s · FP16 | A (faster) |
| Llama 3.1 8B Compact | 8B · llama | 37 tok/s · FP16 | 25 tok/s · FP16 | A (faster) |
| CodeLlama 34B | 34B · llama | 10 tok/s · FP16 | 7 tok/s · FP16 | A (faster) |
| CodeLlama 34B | 34B · llama | 10 tok/s · FP16 | 7 tok/s · FP16 | A (faster) |
| Llama 3.3 70B | 70.6B · llama | 5 tok/s · FP16 | 7 tok/s · Q8_0 | B (faster) |
| Llama 3.1 70B | 70.6B · llama | 5 tok/s · FP16 | 7 tok/s · Q8_0 | B (faster) |
| Llama 4 Scout 17B | 109B · llama | 5 tok/s · Q8_0 | 5 tok/s · Q6_K | equal |
| Llama-4-Maverick-17B-128E | 400B · llama | Too large | Too large | |
| Llama 3.1 405B | 405B · llama | Too large | Too large | |
| Mistral 7B v0.1 | 7.25B · mistral | 45 tok/s · FP16 | 31 tok/s · FP16 | A (faster) |
| Codestral 22B | 22.2B · mistral | 15 tok/s · FP16 | 10 tok/s · FP16 | A (faster) |
| Mixtral 8x7B Instruct v0.1 | 47B · mixtral | 6 tok/s · FP16 | 4 tok/s · FP16 | A (faster) |
| Mistral Large 2 123B | 123B · mistral | 5 tok/s · Q8_0 | 5 tok/s · Q6_K | equal |
| Phi-4-mini 3.8B | 3.8B · phi | 84 tok/s · FP16 | 57 tok/s · FP16 | A (faster) |
| Phi-4 14B | 14B · phi | 21 tok/s · FP16 | 14 tok/s · FP16 | A (faster) |
| Qwen 2.5 1.5B | 1.5B · qwen | 195 tok/s · FP16 | 133 tok/s · FP16 | A (faster) |
| Qwen 2.5 3B | 3.1B · qwen | 102 tok/s · FP16 | 69 tok/s · FP16 | A (faster) |
| Qwen3.5-4B | 4.7B · qwen | 69 tok/s · FP16 | 47 tok/s · FP16 | A (faster) |
| Qwen 2.5 7B | 7.6B · qwen | 43 tok/s · FP16 | 30 tok/s · FP16 | A (faster) |
| Qwen 2.5 7B | 7.6B · qwen | 43 tok/s · FP16 | 30 tok/s · FP16 | A (faster) |
| Qwen 3 8B | 8B · qwen | 37 tok/s · FP16 | 25 tok/s · FP16 | A (faster) |
| Qwen3.5-9B | 9.7B · qwen | 34 tok/s · FP16 | 23 tok/s · FP16 | A (faster) |
| Qwen 3 32B | 32B · qwen | 9 tok/s · FP16 | 6 tok/s · FP16 | A (faster) |
| Qwen3.5-35B-A3B | 36B · qwen | 9 tok/s · FP16 | 6 tok/s · FP16 | A (faster) |
| Qwen 2.5 72B | 72.7B · qwen | 5 tok/s · FP16 | 6 tok/s · Q8_0 | B (faster) |
| Qwen 2.5 72B | 72.7B · qwen | 5 tok/s · FP16 | 6 tok/s · Q8_0 | B (faster) |
| Llama 3.2 1B | 1.24B · llama | 238 tok/s · FP16 | 163 tok/s · FP16 | A (faster) |
| Llama 4 Scout 17B | 109B · llama | 5 tok/s · Q8_0 | 5 tok/s · Q6_K | equal |
| DeepSeek R1 671B | 671B · deepseek | Too large | Too large | |
| Gemma 3 27B | 27B · gemma | 11 tok/s · FP16 | 7 tok/s · FP16 | A (faster) |
| Qwen 3 8B | 8B · qwen | 37 tok/s · FP16 | 25 tok/s · FP16 | A (faster) |
| Qwen 3 32B | 32B · qwen | 9 tok/s · FP16 | 6 tok/s · FP16 | A (faster) |
| Llama 3.1 8B Compact | 8B · llama | 37 tok/s · FP16 | 25 tok/s · FP16 | A (faster) |
| Mixtral 8x7B Instruct v0.1 | 47B · mixtral | 6 tok/s · FP16 | 4 tok/s · FP16 | A (faster) |
| Mistral Small 3.2 24B | 24B · mistral | 15 tok/s · FP16 | 10 tok/s · FP16 | A (faster) |
| Command A 111B | 111B · command | 6 tok/s · Q8_0 | 4 tok/s · Q8_0 | A (faster) |
| DeepSeek R1 0528 | 685B · deepseek | Too large | Too large | |
| DeepSeek-V3-0324 | 684.5B · deepseek | Too large | Too large | |
| DeepSeek-R1-0528-Qwen3-8B | 8.2B · qwen | 42 tok/s · FP16 | 29 tok/s · FP16 | A (faster) |
| Qwen3-235B-A22B-Instruct-2507 | 235B · qwen | 5 tok/s · Q4_K_M | Too large | A (only) |
| Qwen3-30B-A3B-Instruct-2507 | 30B · qwen | 23 tok/s · Q8_0 | 16 tok/s · Q8_0 | A (faster) |
| Qwen3-4B-Instruct-2507 | 4B · qwen | 87 tok/s · FP16 | 59 tok/s · FP16 | A (faster) |
| gemma-4-E4B-it | 8B · gemma | 44 tok/s · FP16 | 30 tok/s · FP16 | A (faster) |
| gemma-4-26B-A4B-it | 26.5B · gemma | 26 tok/s · Q8_0 | 18 tok/s · Q8_0 | A (faster) |
| gemma-4-31B-it | 32.7B · gemma | 21 tok/s · Q8_0 | 15 tok/s · Q8_0 | A (faster) |
