[Talk] Language Models

[In progress]

These slides are from a talk I gave on language models, prompted by OpenAI’s release of GPT-2. Notes on some parts of the talk follow below.

Why does GPT-2 matter?

Mastering human language has been a core task in artificial intelligence since Turing and the Dartmouth conference.

In recent years, machine learning has become useful on narrow tasks like language translation, but generating realistic text and transferring between different language tasks (translation, text generation, comprehension) have remained a challenge.

GPT-2 shows that a relatively simple language modeling approach can lead to impressive results, both in generating realistic text and in transferring between tasks.
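To make “language modeling” concrete: the model is only ever trained to predict the next token given the previous ones, and text is generated by sampling from that prediction repeatedly. The toy character-level bigram model below is a minimal sketch of that idea; GPT-2 replaces the bigram counts with a large transformer over subword tokens, but the training signal and the sampling loop are the same in spirit.

```python
import random
from collections import Counter, defaultdict

# Toy character-level bigram language model: count how often each character
# follows each other character, then sample from those counts.
corpus = "language models predict the next token given the previous tokens. "

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev):
    """Sample the next character in proportion to how often it followed `prev`."""
    chars, freqs = zip(*counts[prev].items())
    return random.choices(chars, weights=freqs)[0]

# Generate text one character at a time, feeding each output back in as context.
text = "l"
for _ in range(60):
    text += sample_next(text[-1])
print(text)
```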

What are GPT-2’s main innovations?

Honestly, the impressive performance comes mostly from scaling existing approaches. The model has roughly ten times as many parameters as previous state-of-the-art models. That performance would keep scaling with compute was expected, but far from certain.

How big is GPT-2?

If you take my incredibly unprincipled approach of equating the number of synapses in a brain with the number of parameters in a model, GPT-2 is roughly honey-bee sized.

[Slide from the talk]
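As a rough sanity check on that comparison, here is the back-of-envelope arithmetic. The synapse counts are order-of-magnitude estimates, not precise figures.

```python
# Back-of-envelope comparison of model parameters to brain synapse counts.
# The synapse numbers below are rough order-of-magnitude estimates, not measurements.
gpt2_parameters = 1.5e9     # largest GPT-2 model: ~1.5 billion parameters
honey_bee_synapses = 1e9    # honey bee brain: on the order of a billion synapses
human_synapses = 1e14       # human brain: on the order of 100 trillion synapses

print(f"GPT-2 parameters / honey bee synapses: {gpt2_parameters / honey_bee_synapses:.1f}")
print(f"GPT-2 parameters / human synapses:     {gpt2_parameters / human_synapses:.1e}")
```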

What are transformers and why do they appear in just about every breakthrough, from StarCraft to language models?

Transformers are an efficient neural network architecture for implementing attention mechanisms. Attention lets every position in a sequence draw on every other position directly, and because these computations run in parallel rather than step by step as in recurrent networks, transformers scale well to large models and datasets.
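For intuition, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of a transformer. It leaves out everything a real implementation adds (learned projection matrices, multiple heads, masking, and layer stacking), but it shows the core operation: each output is a weighted average of value vectors, weighted by how well the query matches each key.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the rows of V, with weights
    given by how well the corresponding query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # attention weights, each row sums to 1
    return weights @ V                   # (n_queries, d_v) attended values

# Toy example: 4 tokens with 8-dimensional query/key/value vectors.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```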
