Monolithic 637K-parameter GRU out. Five tiny specialist heads in. Counting tripled. Physics doubled. No more cliffs. If you've read Parts 1 through 4, you already know the pattern: when a piece of OLT-1 isn't working, we don't make it bigger. We sandbox-test the alternatives, pick the one that actually wins, and keep what works. This is the post where that pattern hit the decoder. The decoder was the loudest part of OLT-1 - literally. It's the component that turns concept activations into langua
Comment
Sign in to join the discussion.
Loading comments…