Comparison: Ollama Vulkan vs Ollama ROCm

Update: Added context length scaling (2k-32k) and a CPU baseline (Ryzen 9 7945HX, 64 GB RAM)

num_ctx   Vulkan     ROCm       Native (CPU)
2048      56.8 t/s   49.7 t/s   24.6 t/s
8192      52.6 t/s   49.1 t/s   24.8 t/s
16384     46.1 t/s   46.5 t/s   24.7 t/s
32768     40.3 t/s   43.4 t/s   24.5 t/s
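
For anyone who wants to reproduce a sweep like this, here is a rough sketch of how the t/s numbers could be collected through Ollama's HTTP API. It relies on the `eval_count` and `eval_duration` (nanoseconds) fields that `/api/generate` returns, and sets `num_ctx` through `options`. The model name and prompt are placeholders, and this is not necessarily how the numbers above were measured; in particular, the post does not say how long the prompt was at each context size.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def run_once(model: str, prompt: str, num_ctx: int) -> dict:
    """Send one non-streaming generate request with the given context size."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())


def sweep(model: str, prompt: str, contexts=(2048, 8192, 16384, 32768)) -> None:
    """Print one t/s figure per context size."""
    for num_ctx in contexts:
        result = run_once(model, prompt, num_ctx)
        print(f"{num_ctx:>6}  {tokens_per_second(result):.1f} t/s")
```

With an Ollama server running, `sweep("your-model", "your prompt")` prints one line per context size. Averaging several runs per size would give steadier numbers.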

Context scaling (2k → 32k performance loss):

  • Vulkan: -29%
  • ROCm: -13%
  • Native: -0.2%
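
The percentages above are just the relative change between the 2k and 32k rows. A small sketch of that arithmetic, using the rounded table values (so results can differ slightly from percentages computed on the raw measurements, most visibly for the nearly flat Native row):

```python
# Throughput in t/s, copied from the table (rounded to one decimal).
RESULTS = {
    "Vulkan": {2048: 56.8, 8192: 52.6, 16384: 46.1, 32768: 40.3},
    "ROCm":   {2048: 49.7, 8192: 49.1, 16384: 46.5, 32768: 43.4},
    "Native": {2048: 24.6, 8192: 24.8, 16384: 24.7, 32768: 24.5},
}


def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, in percent (negative means slower)."""
    return (end - start) / start * 100.0


def scaling_loss(results: dict, low: int = 2048, high: int = 32768) -> dict:
    """Percent change in t/s between the smallest and largest context."""
    return {
        backend: percent_change(speeds[low], speeds[high])
        for backend, speeds in results.items()
    }


for backend, loss in scaling_loss(RESULTS).items():
    print(f"{backend}: {loss:+.1f}%")
```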

Power consumption (efficiency in t/s per watt, written t/W):

  • Vulkan: ~65 W (0.7-0.9 t/W)
  • ROCm: ~150 W (0.3 t/W)
  • Native: ~11 W (2.2 t/W)
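
The efficiency figures are throughput divided by power draw. The sketch below redoes that division from the table, assuming the approximate wattage stays constant across context sizes, which the post does not confirm, so the low end of a range may not match the quoted one exactly.

```python
# Approximate power draw in watts, as quoted above.
POWER_W = {"Vulkan": 65.0, "ROCm": 150.0, "Native": 11.0}

# Throughput in t/s at 2k, 8k, 16k and 32k context, from the table.
SPEED_TPS = {
    "Vulkan": [56.8, 52.6, 46.1, 40.3],
    "ROCm":   [49.7, 49.1, 46.5, 43.4],
    "Native": [24.6, 24.8, 24.7, 24.5],
}


def efficiency(tokens_per_second: float, watts: float) -> float:
    """Tokens per second per watt (equivalently, tokens per joule)."""
    return tokens_per_second / watts


def efficiency_range(speeds: list, watts: float) -> tuple:
    """Lowest and highest efficiency over all measured context sizes."""
    values = [efficiency(speed, watts) for speed in speeds]
    return min(values), max(values)


for backend, speeds in SPEED_TPS.items():
    low, high = efficiency_range(speeds, POWER_W[backend])
    print(f"{backend}: {low:.2f}-{high:.2f} t/s per W")
```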

Takeaways:

  • Small contexts (≤8k): Vulkan wins on both speed and efficiency
  • Large contexts (≥16k): ROCm matches Vulkan at 16k and leads by about 8% at 32k, but draws more than 2x the power
  • CPU throughput stays flat across context sizes, but it is roughly 1.6-2.3x slower than the GPU
  • If power/heat matters more than speed: CPU is surprisingly viable at ~25 t/s