Maybe these papers:
"Attention Is All You Need" (Transformer paper, 2017)
"Improving Language Understanding by Generative Pre-Training" (GPT-1, 2018)
"Language Models are Unsupervised Multitask Learners" (GPT-2, 2019)