Generative pre-trained transformer
Generative pre-trained transformers (GPT) are a family of large language models (LLMs) introduced in 2018 by the American artificial intelligence organization OpenAI. Like most LLMs, GPT models are artificial neural networks based on the transformer architecture, pre-trained in an unsupervised manner on large datasets of unlabelled text, and able to generate novel human-like text.
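As a concrete illustration of this next-token style of generation, the sketch below uses the openly released GPT-2 weights through the Hugging Face transformers library (an assumption for demonstration purposes, not an OpenAI product or API): the model repeatedly predicts the next token given everything generated so far.

```python
# A minimal sketch of autoregressive text generation with a GPT-family model,
# assuming the publicly released GPT-2 weights and the Hugging Face
# "transformers" library; prompt and sampling settings are illustrative.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Generative pre-trained transformers are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample up to 40 new tokens, one at a time, each conditioned on all previous tokens.
output_ids = model.generate(
    input_ids,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```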

Between 2018 and 2023, OpenAI released four major numbered GPT models, each significantly more capable than its predecessor owing to increased size (measured in the number of trainable parameters) and more extensive training. The largest GPT-3 models, released in 2020, have 175 billion parameters and were trained on 400 billion tokens of text. OpenAI declined to publish the size or training details of its most recent model, GPT-4, citing "the competitive landscape and the safety implications of large-scale models".[1] OpenAI has used these foundational GPT-n models as the basis for various other products and technologies, including models fine-tuned for instruction following, which in turn power the ChatGPT chatbot service.
The term "GPT" is also used in the names of some generative LLMs developed by others, such as a series of GPT-3-inspired models created by EleutherAI[2] and, most recently, a series of seven models created by Cerebras.[3] Major companies in other industries (e.g. sales and finance) also use the term "GPT" in the names of services built on or incorporating GPT technology.[4][5]
History
On June 11, 2018, OpenAI published a paper entitled "Improving Language Understanding by Generative Pre-Training", in which it introduced the first GPT system.[6] Up to that point, the best-performing neural NLP (natural language processing) models mostly relied on supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well annotated, and also made training extremely large language models prohibitively expensive and time-consuming.[6]
The particular "semi-supervised" approach OpenAI employed, and was the first to use with a transformer model, involved two stages: an unsupervised generative "pre-training" stage that sets initial parameters using a language modeling objective, and a supervised discriminative "fine-tuning" stage that adapts those parameters to a target task.[6]
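The sketch below illustrates this two-stage recipe in miniature: a small decoder-only transformer is first trained with a next-token (language modeling) loss on unlabeled token sequences, and the same backbone is then fine-tuned with a classification head on a labeled task. The model size, random stand-in data, and hyperparameters are illustrative assumptions, not OpenAI's implementation.

```python
# A minimal sketch of generative pre-training followed by discriminative fine-tuning,
# using PyTorch; all sizes and the random stand-in data are illustrative assumptions.
import torch
import torch.nn as nn

class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)  # acts as a decoder via a causal mask
        self.lm_head = nn.Linear(d_model, vocab_size)         # linear layer; softmax lives in the loss

    def features(self, ids):
        T = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.blocks(x, mask=causal_mask)

    def forward(self, ids):
        return self.lm_head(self.features(ids))

model = TinyCausalLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# Stage 1: generative pre-training -- predict token t+1 from tokens <= t on unlabeled text.
unlabelled = torch.randint(0, 1000, (8, 32))        # stand-in for a tokenized text corpus
logits = model(unlabelled[:, :-1])
lm_loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), unlabelled[:, 1:].reshape(-1))
lm_loss.backward()
opt.step()
opt.zero_grad()

# Stage 2: discriminative fine-tuning -- reuse the pre-trained backbone on a labeled task.
clf_head = nn.Linear(64, 2)                         # e.g. a hypothetical 2-class target task
labelled_ids = torch.randint(0, 1000, (8, 32))
labels = torch.randint(0, 2, (8,))
opt_ft = torch.optim.Adam(list(model.parameters()) + list(clf_head.parameters()), lr=1e-4)
clf_logits = clf_head(model.features(labelled_ids)[:, -1])   # classify from the last-token state
ft_loss = nn.functional.cross_entropy(clf_logits, labels)
ft_loss.backward()
opt_ft.step()
```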
Foundational GPT models
| Model | Architecture | Parameter count | Training data | Release date |
| --- | --- | --- | --- | --- |
| Original GPT (GPT-1) | 12-level, 12-headed Transformer decoder (no encoder), followed by a linear-softmax output layer (see the parameter-count sketch below the table) | 117 million | BookCorpus:[7] 4.5 GB of text from 7,000 unpublished books of various genres | June 11, 2018[8] |
| GPT-2 | GPT-1, but with modified normalization | 1.5 billion | WebText: 40 GB of text (8 million documents) from 45 million webpages linked in upvoted Reddit posts | February 14, 2019 |
| GPT-3 | GPT-2, but with modifications to allow larger scaling | 175 billion | 570 GB of plaintext (about 0.4 trillion tokens), mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2) | June 11, 2020[9] (then March 15, 2022 for a revision ultimately termed GPT-3.5) |
| GPT-4 | Also trained with both text prediction and RLHF; accepts both text and images as input. Further details are not public.[1] | Undisclosed | Undisclosed | March 14, 2023 |
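For orientation, the sketch below recomputes an approximate parameter count for the GPT-1 row of the table from its 12-layer, 12-head, decoder-only architecture. The hidden size (768), feed-forward width (3072), context length (512), and BPE vocabulary (about 40,000) are not given in the table and are assumed from the GPT-1 paper; the rough total lands near the reported 117 million.

```python
# Rough parameter-count check for the GPT-1 row above; hidden size, feed-forward width,
# context length, and vocabulary size are assumptions taken from the GPT-1 paper.
d_model, n_layers, d_ff, vocab, ctx = 768, 12, 3072, 40_000, 512

embeddings = vocab * d_model + ctx * d_model       # token + learned position embeddings
attention = 4 * (d_model * d_model + d_model)      # Q, K, V and output projections (+ biases)
feedforward = 2 * d_model * d_ff + d_ff + d_model  # two linear layers (+ biases)
layer_norms = 2 * 2 * d_model                      # two LayerNorms per block (scale + shift)
per_layer = attention + feedforward + layer_norms

total = embeddings + n_layers * per_layer          # output softmax shares the token embedding
print(f"~{total / 1e6:.0f}M parameters")           # ~116M, close to the reported 117 million
```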
Related models and products
In January 2022, OpenAI introduced InstructGPT, a series of models which were fine-tuned to follow instructions using a combination of supervised training and reinforcement learning from human feedback (RLHF) on base GPT-3 language models.[10][11]
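The reward-modeling half of RLHF can be summarized in a short sketch: a reward model is trained on human preference pairs so that responses humans preferred score higher than rejected ones, and its scalar output then supplies the reward signal for a subsequent policy-optimization (e.g. PPO) stage applied to the language model itself. The tiny model and random stand-in data below are illustrative assumptions, not OpenAI's implementation.

```python
# A minimal sketch of the reward-modeling step used in RLHF; the bag-of-embeddings
# reward model and the random stand-in preference data are illustrative assumptions.
import torch
import torch.nn as nn

vocab, d = 1000, 32

class TinyRewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.score = nn.Linear(d, 1)

    def forward(self, ids):  # ids: (batch, seq_len) of token indices
        # Mean-pool token embeddings and map to a scalar reward per sequence.
        return self.score(self.emb(ids).mean(dim=1)).squeeze(-1)

rm = TinyRewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

# Stand-ins for tokenized (prompt + response) pairs where humans preferred the first one.
chosen = torch.randint(0, vocab, (16, 24))
rejected = torch.randint(0, vocab, (16, 24))

# Pairwise preference loss: push r(chosen) above r(rejected) via log sigmoid of the margin.
loss = -nn.functional.logsigmoid(rm(chosen) - rm(rejected)).mean()
loss.backward()
opt.step()
```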
In November 2022, OpenAI launched ChatGPT, an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT.[12]
References
- OpenAI (2023). "GPT-4 Technical Report" (PDF). Archived (PDF) from the original on 2023-03-14. Retrieved 2023-03-16.
- "EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J".
- "News" (Press release).
- "Salesforce's EinsteinGPT May Be the Most Meaningful Application of AI Chatbots Yet". Fast Company. https://www.fastcompany.com/90862354/salesforces-einsteingpt-may-be-the-most-meaningful-application-of-ai-chatbots-yet
- Sheikh, Jamiel (5 April 2023). "The ChatGPT of Finance Is Here: Bloomberg Is Combining AI and Fintech". Forbes. https://www.forbes.com/sites/jamielsheikh/2023/04/05/the-chatgpt-of-finance-is-here-bloomberg-is-combining-ai-and-fintech/?sh=43b4385e3081
- Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
- Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. IEEE International Conference on Computer Vision (ICCV) 2015. pp. 19–27. arXiv:1506.06724. Archived from the original on 2023-02-05. Retrieved 2023-02-07.
- "Improving language understanding with unsupervised learning". openai.com. Archived from the original on 2023-03-18. Retrieved 2023-03-18.
- "Language models are few-shot learners". openai.com. Archived from the original on 2023-03-21. Retrieved 2023-03-21.
- "Aligning language models to follow instructions". openai.com. Archived from the original on 23 March 2023. Retrieved 23 March 2023.
- Ouyang, Long; Wu, Jeff; Jiang, Xu; et al. (4 March 2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
- "Introducing ChatGPT". openai.com. Archived from the original on 2023-03-16. Retrieved 2023-03-16.