Dev.to · 6d ago · 1 min read

Claude Feels Slow. But Is Moving a Team to...

TL;DR

Claude has a real speed problem for our team, but mostly in TTFT (time to first token), not in raw decoding speed. I measured our actual usage and found this:

- TTFT p50: 4.2s–6.8s
- TTFT p90: 14.5s–28.1s
- Claude Sonnet decode p50: 176 tok/s

That explains the feeling: Claude often isn't that slow once it starts, but sometimes it takes so long to begin that the whole thing feels like it's crawling.

That naturally raises the next question: Should we move the team to self-hosted open-weight models? At first glance, th
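
To make the TTFT versus decode-speed distinction concrete, here is a minimal sketch of how the two numbers could be derived from per-request timestamps. It is not the author's measurement script: the `RequestTiming` record, its field names, the nearest-rank percentile helper, and the 500-token response length are all assumptions for illustration. Only the 14.5 s TTFT and the 176 tok/s decode rate are taken from the figures above.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request log record (field names are assumptions)."""
    sent_at: float         # seconds: request sent
    first_token_at: float  # seconds: first streamed token received
    done_at: float         # seconds: last token received
    output_tokens: int     # tokens generated in the response


def ttft(t: RequestTiming) -> float:
    """Time to first token: waiting time before any output appears."""
    return t.first_token_at - t.sent_at


def decode_rate(t: RequestTiming) -> float:
    """Output tokens per second, counted only after the first token arrives."""
    decode_seconds = t.done_at - t.first_token_at
    return t.output_tokens / decode_seconds if decode_seconds > 0 else float("nan")


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def ttft_share(t: RequestTiming) -> float:
    """Fraction of total wall-clock time spent waiting for the first token."""
    return ttft(t) / (t.done_at - t.sent_at)


# Assumed example: a 500-token response at 176 tok/s after a 14.5 s TTFT.
slow_start = RequestTiming(
    sent_at=0.0,
    first_token_at=14.5,
    done_at=14.5 + 500 / 176,
    output_tokens=500,
)

if __name__ == "__main__":
    print(f"TTFT: {ttft(slow_start):.1f} s")
    print(f"decode rate: {decode_rate(slow_start):.0f} tok/s")
    print(f"share of time spent before first token: {ttft_share(slow_start):.0%}")
```

Under these assumptions, roughly 84% of the wall-clock time passes before the first token appears, which is consistent with the observation that decoding itself is not the bottleneck. Feeding a list of `ttft(...)` values into `percentile(values, 50)` and `percentile(values, 90)` would give p50 and p90 figures of the kind quoted above.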
