Background

I did some research online and found a nice course that teaches how to build an LLM from scratch. The course is shared publicly online, and all the assignment resources are here: https://cs336.stanford.edu/

In this series, I will post my summaries and notes, starting from lesson 1.

Tokenization

Tokenization is the very first step of an LLM: it turns raw text into a sequence of integer tokens before the model sees anything. There are many different tokenization algorithms, such as character-based tokenization, byte-based tokenization, word-based tokenization and B
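
To make the first three schemes concrete, here is a minimal Python sketch of character-based, byte-based, and word-based tokenization. The function names are my own and are not taken from the course materials, and the word splitter is just a naive regex chosen for illustration. Real word-based tokenizers use more careful rules.

```python
import re


def char_tokenize(text: str) -> list[int]:
    # Character-based: one token per Unicode code point.
    # The vocabulary can grow as large as the Unicode code space.
    return [ord(ch) for ch in text]


def byte_tokenize(text: str) -> list[int]:
    # Byte-based: one token per UTF-8 byte.
    # The vocabulary is fixed at 256 entries, but sequences get longer
    # because non-ASCII characters take several bytes each.
    return list(text.encode("utf-8"))


def byte_detokenize(ids: list[int]) -> str:
    # Inverse of byte_tokenize: reassemble the bytes and decode them.
    return bytes(ids).decode("utf-8")


def word_tokenize(text: str) -> list[str]:
    # Word-based (naive, illustrative): runs of word characters become
    # tokens, and each punctuation mark becomes its own token.
    # Whitespace is dropped, so this split is not reversible.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    sample = "Hello, world!"
    print(char_tokenize(sample))
    print(byte_tokenize(sample))
    print(word_tokenize(sample))
```

For plain ASCII text the character and byte versions give the same IDs. They only differ once the text contains characters that need more than one byte in UTF-8.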