Dev.to · 5d ago · 1 min read

The fastest non-VLM parser that preserves...

🚀 We saw room to improve latency, so we profiled. We initially expected the sorting algorithm (XY-Cut++) to be the bottleneck, but it turned out to account for less than **1%** of the total time. The real cost was hiding in content filtering (55%) and preprocessing (25%).

🖇️ 3 fixes applied
💥 Page-level parallel processing
💥 Hidden text detection → opt-in
💥 Text-only fast path

💢 Output is byte-for-byte identical before and after optimization. Only the speed changed; the results stay the same.

🖇️Op
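
To make the three fixes concrete, here is a minimal Python sketch of how they could fit together. It is not the parser's actual code: `Page`, `filter_hidden`, `parse_page`, `parse_document`, and the `detect_hidden_text` flag are all hypothetical names, and the "hidden text" rule is a placeholder. The point it illustrates is that pages can be parsed in parallel while the output stays identical, because `Executor.map` returns results in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for one parsed PDF page."""
    number: int
    text_runs: tuple
    has_images: bool = False


def filter_hidden(runs):
    # Placeholder rule (assumption): a run starting with "\x00" is "hidden".
    return tuple(r for r in runs if not r.startswith("\x00"))


def parse_page(page, detect_hidden_text=False):
    runs = page.text_runs
    # Hidden text detection is opt-in, so the default path skips this work.
    if detect_hidden_text:
        runs = filter_hidden(runs)
    body = f"[page {page.number}] " + " ".join(runs)
    # Text-only fast path: pages without images return early.
    if not page.has_images:
        return body
    return body + " <image>"


def parse_document(pages, workers=4, detect_hidden_text=False):
    # Page-level parallelism: each page is parsed independently.
    # pool.map yields results in input order, so output order is stable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: parse_page(p, detect_hidden_text), pages))
```

Calling `parse_document(pages, workers=1)` and `parse_document(pages, workers=4)` on the same input should return equal lists, which is the same property the post describes as byte-for-byte identical output. For CPU-bound work in CPython, a process pool would likely be needed to see a real speedup; threads are used here only to keep the sketch short.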
