Dev.to

Building a Tokenizer from Scratch [part 2]

From FSM to PDA: Q/A with Claude Opus

In part 1, we built a working FSM that recognizes <div>text</div> using just 7 primitives mapped 1:1 to assembly opcodes. But FSMs have a hard limit: they can't handle nested structures like <div><div>hello</div></div>. In this post, we climb the Chomsky hierarchy from finite state machines to pushdown automata, build a PDA that recognizes nested <div> tags, and then turn it into a transducer that emits tokens. In other words, we are building the core of a le
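To make the FSM-to-PDA step concrete, here is a minimal Python sketch of the idea, not the article's own implementation (which is built from primitives mapped to assembly opcodes). The function name `tokenize`, the token labels `OPEN`, `TEXT`, and `CLOSE`, and the rule that text must sit inside a <div> are all assumptions made for illustration. The stack is the only thing added over a plain FSM: each <div> pushes, each </div> pops, and the input is accepted only if the stack is empty at the end.

```python
def tokenize(src: str) -> list[tuple[str, str]]:
    """Recognize nested <div> markup and emit (kind, text) tokens.

    Raises ValueError on unbalanced or unexpected input.
    """
    OPEN, CLOSE = "<div>", "</div>"
    tokens: list[tuple[str, str]] = []
    stack: list[str] = []  # the PDA's stack: one entry per unclosed <div>
    i = 0
    while i < len(src):
        if src.startswith(CLOSE, i):
            if not stack:
                raise ValueError(f"unmatched </div> at offset {i}")
            stack.pop()  # closing tag consumes the most recent opener
            tokens.append(("CLOSE", CLOSE))
            i += len(CLOSE)
        elif src.startswith(OPEN, i):
            stack.append("div")  # remember that one more tag is open
            tokens.append(("OPEN", OPEN))
            i += len(OPEN)
        elif src[i] == "<":
            raise ValueError(f"unexpected '<' at offset {i}")
        else:
            if not stack:
                # Assumption: text is only valid inside a <div>.
                raise ValueError(f"text outside <div> at offset {i}")
            j = i
            while j < len(src) and src[j] != "<":
                j += 1
            tokens.append(("TEXT", src[i:j]))
            i = j
    if stack:
        raise ValueError(f"{len(stack)} unclosed <div> tag(s)")
    return tokens


if __name__ == "__main__":
    for token in tokenize("<div><div>hello</div></div>"):
        print(token)
```

Because the stack only ever holds one kind of symbol, a simple depth counter would recognize the same inputs. An explicit stack is used here because it extends naturally once more than one tag name has to be matched.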