An Overview of Different Transformer-based Language Models
Language models learn the distribution of words in a language and can be used in a variety of natural language processing tasks such as question answering, sentiment analysis, and translation.
The Transformer is an encoder-decoder architecture that learns these distributions by selectively weighting different parts of the input text, a mechanism known as attention.
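To make the idea of attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside the Transformer. The function and variable names (Q, K, V) and the toy input are illustrative, not taken from the post.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return a weighted sum of value vectors for each query.

    Q, K, V are (seq_len, d_k) arrays of query, key, and value vectors.
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    # Normalize each row into a probability distribution over positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is an average of the values, weighted by attention.
    return weights @ V

# Toy example: self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In self-attention, as in the example above, queries, keys, and values all come from the same sequence, so each position can attend to every other position in the text.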
I have written a blog post as part of the Ezra Tech Blog on different language models that use the Transformer as their main component, such as USE, GPT, and BERT, and how they can be used for embedding input text. Please visit the post for details on each method, how it compares to the others, and how to use it in Python.
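The post covers the details for each model; as a rough illustration of what embedding input text looks like in Python, here is a minimal sketch using the Hugging Face transformers library with a BERT checkpoint. The checkpoint name and the mean-pooling step are assumptions for this example, not necessarily the choices made in the post.

```python
# Minimal sketch: embed a sentence with BERT via Hugging Face transformers.
# The checkpoint name and mean pooling are illustrative choices.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Transformers learn contextual word representations."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.last_hidden_state has shape (1, num_tokens, 768): one contextual
# embedding per token. Averaging over tokens gives a fixed-size sentence vector.
embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
print(embedding.shape)  # torch.Size([768])
```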