These slides are from a talk I gave on language models, prompted by OpenAI’s release of GPT-2. Notes on parts of the talk are below.
Why does GPT-2 matter?
Mastering human language has been a core task in artificial intelligence since Turing and the Dartmouth conference.
In recent years, machine learning has become useful on narrow tasks like language translation, but generating realistic text and transferring between different language tasks (translation, text generation, comprehension) has remained a challenge.
GPT-2 shows that a relatively simple language modeling approach can lead to impressive results in both generation of realistic texts and transfer between tasks.
What are GPT-2’s main innovations?
Honestly, the impressive performance comes mostly from scaling existing approaches. The model has roughly ten times as many parameters as previous state-of-the-art models. That performance would continue to improve with more compute was expected, but far from certain.
How big is GPT-2?
If you take my incredibly unprincipled approach of equating the number of synapses in a brain and parameters of a model, roughly honey-bee sized.
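The comparison, kept deliberately rough, can be written out. The parameter count comes from the GPT-2 paper; the honey-bee synapse count is a loose literature estimate, labeled as an assumption in the comments:

```python
# Back-of-envelope comparison (unprincipled, as the text admits):
# equate model parameters with brain synapses and see what animal we land on.
gpt2_parameters = 1.5e9    # GPT-2's parameter count, from the paper
honeybee_synapses = 1e9    # rough literature estimate (assumption)

ratio = gpt2_parameters / honeybee_synapses
print(f"GPT-2 parameters / honey-bee synapses ≈ {ratio:.1f}")  # ≈ 1.5
```

On this crude measure GPT-2 sits in honey-bee territory, which is the only point the comparison is meant to make.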
What are transformers and why do they appear in just about every breakthrough, from StarCraft to language models?
Transformers are an efficient neural network architecture for implementing attention mechanisms, which let a model weigh every part of its input when producing each part of its output.