Generative pre-trained transformer
Generative pre-trained transformers (GPT) are a family of large language models (LLMs) introduced in 2018 by the American artificial intelligence organization OpenAI. Like most LLMs, GPT models are artificial neural networks based on the transformer architecture, pre-trained in an unsupervised manner on large datasets of unlabelled text, and able to generate novel human-like text.
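As a concrete illustration of this next-token style of generation, the sketch below uses the openly released GPT-2 weights through the Hugging Face transformers library (an assumption for demonstration purposes, not an OpenAI product or API): the model repeatedly predicts the next token given everything generated so far.

```python
# A minimal sketch of autoregressive text generation with a GPT-family model,
# assuming the publicly released GPT-2 weights and the Hugging Face
# "transformers" library; prompt and sampling settings are illustrative.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Generative pre-trained transformers are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample up to 40 new tokens, one at a time, each conditioned on all previous tokens.
output_ids = model.generate(
    input_ids,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```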

Between 2018 and 2023, OpenAI released four major numbered GPT models, each significantly more capable than its predecessor owing to increased size (measured in the number of trainable parameters) and more extensive training. The largest GPT-3 models, released in 2020, have 175 billion parameters and were trained on 400 billion tokens of text. OpenAI declined to publish the size or training details of its most recent model, GPT-4, citing "the competitive landscape and the safety implications of large-scale models".[1] OpenAI has used these foundational GPT-n models as the basis for various other products and technologies, including models fine-tuned for instruction following, which in turn power the ChatGPT chatbot service.
The term "GPT" is also used in the names of some generative LLMs developed by others, such as a series of GPT-3-inspired models created by EleutherAI[2] and, most recently, a series of seven models created by Cerebras.[3] Major companies in other industries (e.g. sales and finance) also use the term "GPT" in the names of services built on or incorporating GPT technology.[4][5]
History
On June 11, 2018, OpenAI published a paper entitled "Improving Language Understanding by Generative Pre-Training", in which it introduced the first GPT system.[6] Up to that point, the best-performing neural NLP (natural language processing) models mostly relied on supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well annotated, and also made training extremely large language models prohibitively expensive and time-consuming.[6]
The particular "semi-supervised" approach OpenAI employed, and was the first to use with a transformer model, involved two stages: an unsupervised generative "pre-training" stage that sets initial parameters using a language modeling objective, and a supervised discriminative "fine-tuning" stage that adapts those parameters to a target task.[6]
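The sketch below illustrates this two-stage recipe in miniature: a small decoder-only transformer is first trained with a next-token (language modeling) loss on unlabeled token sequences, and the same backbone is then fine-tuned with a classification head on a labeled task. The model size, random stand-in data, and hyperparameters are illustrative assumptions, not OpenAI's implementation.

```python
# A minimal sketch of generative pre-training followed by discriminative fine-tuning,
# using PyTorch; all sizes and the random stand-in data are illustrative assumptions.
import torch
import torch.nn as nn

class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)  # acts as a decoder via a causal mask
        self.lm_head = nn.Linear(d_model, vocab_size)         # linear layer; softmax lives in the loss

    def features(self, ids):
        T = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.blocks(x, mask=causal_mask)

    def forward(self, ids):
        return self.lm_head(self.features(ids))

model = TinyCausalLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# Stage 1: generative pre-training -- predict token t+1 from tokens <= t on unlabeled text.
unlabelled = torch.randint(0, 1000, (8, 32))        # stand-in for a tokenized text corpus
logits = model(unlabelled[:, :-1])
lm_loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), unlabelled[:, 1:].reshape(-1))
lm_loss.backward()
opt.step()
opt.zero_grad()

# Stage 2: discriminative fine-tuning -- reuse the pre-trained backbone on a labeled task.
clf_head = nn.Linear(64, 2)                         # e.g. a hypothetical 2-class target task
labelled_ids = torch.randint(0, 1000, (8, 32))
labels = torch.randint(0, 2, (8,))
opt_ft = torch.optim.Adam(list(model.parameters()) + list(clf_head.parameters()), lr=1e-4)
clf_logits = clf_head(model.features(labelled_ids)[:, -1])   # classify from the last-token state
ft_loss = nn.functional.cross_entropy(clf_logits, labels)
ft_loss.backward()
opt_ft.step()
```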
Foundational GPT models
| Model | Architecture | Parameter count | Training data | Release date |
| --- | --- | --- | --- | --- |
| Original GPT (GPT-1) | 12-level, 12-headed Transformer decoder (no encoder), followed by a linear-softmax output layer (see the parameter-count sketch below the table) | 117 million | BookCorpus:[7] 4.5 GB of text from 7,000 unpublished books of various genres | June 11, 2018[8] |
| GPT-2 | GPT-1, but with modified normalization | 1.5 billion | WebText: 40 GB of text (8 million documents) from 45 million webpages linked in upvoted Reddit posts | February 14, 2019 |
| GPT-3 | GPT-2, but with modifications to allow larger scaling | 175 billion | 570 GB of plaintext (about 0.4 trillion tokens), mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2) | June 11, 2020[9] (then March 15, 2022 for a revision ultimately termed GPT-3.5) |
| GPT-4 | Also trained with both text prediction and RLHF; accepts both text and images as input. Further details are not public.[1] | Undisclosed | Undisclosed | March 14, 2023 |
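For orientation, the sketch below recomputes an approximate parameter count for the GPT-1 row of the table from its 12-layer, 12-head, decoder-only architecture. The hidden size (768), feed-forward width (3072), context length (512), and BPE vocabulary (about 40,000) are not given in the table and are assumed from the GPT-1 paper; the rough total lands near the reported 117 million.

```python
# Rough parameter-count check for the GPT-1 row above; hidden size, feed-forward width,
# context length, and vocabulary size are assumptions taken from the GPT-1 paper.
d_model, n_layers, d_ff, vocab, ctx = 768, 12, 3072, 40_000, 512

embeddings = vocab * d_model + ctx * d_model       # token + learned position embeddings
attention = 4 * (d_model * d_model + d_model)      # Q, K, V and output projections (+ biases)
feedforward = 2 * d_model * d_ff + d_ff + d_model  # two linear layers (+ biases)
layer_norms = 2 * 2 * d_model                      # two LayerNorms per block (scale + shift)
per_layer = attention + feedforward + layer_norms

total = embeddings + n_layers * per_layer          # output softmax shares the token embedding
print(f"~{total / 1e6:.0f}M parameters")           # ~116M, close to the reported 117 million
```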
Related models and products
In January 2022, OpenAI introduced InstructGPT, a series of models which were fine-tuned to follow instructions using a combination of supervised training and reinforcement learning from human feedback (RLHF) on base GPT-3 language models.[10][11]
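The reward-modeling half of RLHF can be summarized in a short sketch: a reward model is trained on human preference pairs so that responses humans preferred score higher than rejected ones, and its scalar output then supplies the reward signal for a subsequent policy-optimization (e.g. PPO) stage applied to the language model itself. The tiny model and random stand-in data below are illustrative assumptions, not OpenAI's implementation.

```python
# A minimal sketch of the reward-modeling step used in RLHF; the bag-of-embeddings
# reward model and the random stand-in preference data are illustrative assumptions.
import torch
import torch.nn as nn

vocab, d = 1000, 32

class TinyRewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.score = nn.Linear(d, 1)

    def forward(self, ids):  # ids: (batch, seq_len) of token indices
        # Mean-pool token embeddings and map to a scalar reward per sequence.
        return self.score(self.emb(ids).mean(dim=1)).squeeze(-1)

rm = TinyRewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

# Stand-ins for tokenized (prompt + response) pairs where humans preferred the first one.
chosen = torch.randint(0, vocab, (16, 24))
rejected = torch.randint(0, vocab, (16, 24))

# Pairwise preference loss: push r(chosen) above r(rejected) via log sigmoid of the margin.
loss = -nn.functional.logsigmoid(rm(chosen) - rm(rejected)).mean()
loss.backward()
opt.step()
```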
In November 2022, OpenAI launched ChatGPT, an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT.[12]
References
- OpenAI (2023). "GPT-4 Technical Report" (PDF). Archived (PDF) from the original on 2023-03-14. Retrieved 2023-03-16.
- "EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J".
- "News" (Press release).
- "Salesforce's EinsteinGPT May Be the Most Meaningful Application of AI Chatbots Yet". Fast Company. https://www.fastcompany.com/90862354/salesforces-einsteingpt-may-be-the-most-meaningful-application-of-ai-chatbots-yet
- Sheikh, Jamiel (5 April 2023). "The ChatGPT of Finance Is Here: Bloomberg Is Combining AI and Fintech". Forbes. https://www.forbes.com/sites/jamielsheikh/2023/04/05/the-chatgpt-of-finance-is-here-bloomberg-is-combining-ai-and-fintech/?sh=43b4385e3081
- Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
- Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. IEEE International Conference on Computer Vision (ICCV) 2015. pp. 19–27. arXiv:1506.06724. Archived from the original on 2023-02-05. Retrieved 2023-02-07.
- "Improving language understanding with unsupervised learning". openai.com. Archived from the original on 2023-03-18. Retrieved 2023-03-18.
- "Language models are few-shot learners". openai.com. Archived from the original on 2023-03-21. Retrieved 2023-03-21.
- "Aligning language models to follow instructions". openai.com. Archived from the original on 23 March 2023. Retrieved 23 March 2023.
- Ouyang, Long; Wu, Jeff; Jiang, Xu; et al. (4 March 2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
- "Introducing ChatGPT". openai.com. Archived from the original on 2023-03-16. Retrieved 2023-03-16.