Dev.to · 1 min read

MoE Beat Dense 27B by 2.4x on 8GB VRAM — The...

Start with the benchmarks

In a previous article, I compared three Qwen3.5 models on the same hardware. Here are the MoE-relevant numbers.

Test environment: RTX 4060 8GB / Ryzen 7 / 32GB DDR5 / llama.cpp / Q4_K_M

| Model | Speed (t/s) | VRAM | GPU% | CPU% | RAM | ngl |
|---|---|---|---|---|---|---|
| Qwen3.5-9B | 33.0 | 7.1GB | 91% | 32% | 22.6GB | 99 (all layers GPU) |
| Qwen3.5-27B | 3.57 | 7.7GB | 60% | 74% | 28.3GB | 24 (24/58 layers GPU) |
| Qwen3.5-35B-A3B | 8.61 | 7.6GB | 95% | 65% | 30.8GB | 99 (all layers GPU) |

All three models consume nearly the same VRAM (7.1-7.7GB). Yet speed
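
The 2.4x figure in the title can be checked directly against the benchmark figures above. The following is a minimal sketch of that arithmetic; the `speeds` dictionary and the `speedup` helper are illustrative names, not part of the original article, and the numbers are copied from the benchmark as reported.

```python
# Decode speeds in tokens/s, copied from the benchmark figures above.
speeds = {
    "Qwen3.5-9B": 33.0,
    "Qwen3.5-27B": 3.57,
    "Qwen3.5-35B-A3B": 8.61,
}


def speedup(model: str, baseline: str) -> float:
    """Return how many times faster `model` decodes than `baseline`."""
    return speeds[model] / speeds[baseline]


# MoE (35B-A3B) relative to the partially offloaded dense 27B model.
moe_vs_dense = speedup("Qwen3.5-35B-A3B", "Qwen3.5-27B")
print(f"MoE vs dense 27B: {moe_vs_dense:.1f}x")

# Fraction of the dense 27B model's layers that were placed on the GPU (ngl 24 of 58).
gpu_layer_fraction = 24 / 58
print(f"27B layers on GPU: {gpu_layer_fraction:.0%}")
```

The ratio only compares the reported throughput; it says nothing by itself about why the MoE model is faster.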