Transformer explanations: a collection
The Transformer is a powerful architecture that can be difficult to understand. There are many great explanations on the web, each approaching the subject in a different way. Here I link the explanations I liked and mention who I believe the target audience is for each one.
The goal is to give you a collection of links to choose from, but reading all of them is still worthwhile: it lets you engage with the concept from different perspectives & cement your knowledge.
Transformers from scratch
Target audience: people with a Machine Learning background
Comment/opinion: An all-around outstanding explanation that includes clear code & excellent illustrations. A personal favorite.
The transformer … “explained”?
Target audience: people with a general Computer Science background
Comment/opinion: Excellent overview, motivation, and intuition. No pictures. No math. Short.
Formal Algorithms for Transformers
Target audience: mathematically-minded ML people
Comment/opinion: Very nice & clear formalism. No pictures.
The Illustrated Transformer
Target audience: ML people who know what an embedding is
Comment/opinion: Great illustrations. Explanation of self-attention was not intuitively clear to me.
Transformer - Illustration and code
Target audience: ML people who know what an embedding is & find reading code helpful
Description by the author: “This notebook combines the excellent illustration of the transformer by Jay Alammar and the code annotation by harvardnlp lab.”
Comment/opinion: Reading the code was very helpful for me. Math is not rendered nicely. Should be read after The Illustrated Transformer.
The Annotated Transformer
Target audience: ML people who know what an embedding is, find reading code helpful, and are interested in full details of the original Transformer paper
Comment/opinion: This is a rearranged version of the paper interleaved with the code. Extensive. Math rendered nicely. Illustrations are ok.
That’s it! If you have suggestions on what else to include, send me an email :)