The Full Story

The History of Artificial Intelligence — From 1950 to Today

The story of AI is one of the most dramatic in human history — decades of wild ambition, crushing failure, quiet progress, and then a sudden explosion that changed everything. This is the complete story, told as it actually happened.

History · Updated April 2026 · Primary sources only

It started with one question

In 1950, a British mathematician named Alan Turing published a paper called “Computing Machinery and Intelligence.” It opened with a deceptively simple question: “Can machines think?”

Turing was 38 years old. He had already helped crack the Nazi Enigma code during World War II — work that is credited with shortening the war by two years. Now he was asking something that would obsess scientists, philosophers, writers, and eventually almost every human being on earth for the next 75 years.

He didn’t try to define “thinking.” Instead, he proposed a test: put a human and a machine in separate rooms. Have a judge ask both questions through a text terminal. If the judge cannot reliably tell which is human and which is machine — the machine passes the test. He called it the Imitation Game. We now call it the Turing Test.

Why this moment matters

Turing didn’t build the first AI. He asked the first serious question about whether machines could ever be intelligent — and gave us a way to measure the answer. Every AI researcher since has been, in some sense, working on his question.

The birth of a field: 1956

Six years after Turing’s paper, a young mathematician named John McCarthy organised a summer workshop at Dartmouth College in New Hampshire. The proposal he wrote to get funding included a bold claim: that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

The summer workshop was modest — ten researchers, eight weeks, one farmhouse. But McCarthy gave their work a name that stuck: Artificial Intelligence.

For the next decade, optimism was extraordinary. Researchers genuinely believed that human-level AI was ten years away. Herbert Simon, one of the pioneers, predicted in 1965 that “machines will be capable, within twenty years, of doing any work a man can do.”

He was wrong. But not forever.

The first AI Winter: 1974–1980

The promises outran the results. The computers of the 1960s were laughably underpowered by today’s standards. The problems AI researchers were trying to solve — natural language, vision, reasoning — turned out to be far harder than anyone imagined.

Funding dried up. The field entered what became known as the “AI Winter” — a cold period where the hype collapsed and serious money disappeared. Many AI researchers quietly shifted to other fields.

It would not be the last winter.

Expert systems: the 1980s comeback

AI returned in the 1980s, but in a more modest form. Instead of general intelligence, researchers focused on narrow expertise. Expert systems encoded the knowledge of human specialists — doctors, engineers, lawyers — into rules that computers could follow.

MYCIN, developed at Stanford, could diagnose blood infections as accurately as specialist physicians. R1 at Digital Equipment Corporation configured computer systems, saving the company an estimated $40 million per year. Companies invested billions. Japan announced a national “Fifth Generation” computing programme to build AI machines.

Then the second winter arrived. The expert systems were brittle — they could only do the one thing they were built for, and they required armies of specialists to maintain. By the late 1980s, the market for AI hardware collapsed. More billions evaporated.

The quiet revolution: the 1990s and neural networks

While the headlines wrote off AI again, something important was happening in the background. A technique called neural networks — inspired by the structure of the human brain — was being quietly developed and improved.

The key breakthrough was backpropagation: an algorithm that allowed neural networks to learn from their mistakes, adjusting their internal weights to improve accuracy. Yann LeCun demonstrated that this technique could teach a computer to recognise handwritten digits with remarkable accuracy — work that would eventually form the foundation of modern computer vision.
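
The weight-adjustment idea fits in a few lines of code. The sketch below is an invented toy, not the 1986 paper's experiments or LeCun's digit recogniser: a one-hidden-layer network learns XOR, the function Minsky and Papert famously showed a single-layer perceptron cannot represent, by propagating its error backwards and nudging its internal weights.

```python
import numpy as np

# Toy backpropagation sketch: a one-hidden-layer network learning XOR.
# Illustrative only; all sizes and learning rates are arbitrary choices.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward():
    h = sigmoid(X @ W1 + b1)        # hidden activations
    return h, sigmoid(h @ W2 + b2)  # network prediction

losses = []
for _ in range(5000):
    h, out = forward()
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: the chain rule applied layer by layer.
    d_out = (out - y) * out * (1 - out)  # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)   # error propagated back to hidden

    # Adjust the internal weights to reduce the error.
    lr = 0.5
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(f"mean squared error: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The "learning from mistakes" in the prose is exactly the backward pass: each weight is moved in the direction that reduces the error it contributed to.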

In 1997, IBM’s Deep Blue defeated world chess champion Garry Kasparov — not through human-style understanding of chess, but through brute-force calculation and clever search algorithms. It was a milestone, but it was specialised. Deep Blue could not do anything else.

The deep learning revolution: 2012

The moment that changed everything arrived at a computer vision competition in 2012. The ImageNet Large Scale Visual Recognition Challenge asked competing teams to correctly identify objects in photographs. The best previous result had an error rate of around 26%. A team from the University of Toronto — Geoffrey Hinton and his students Alex Krizhevsky and Ilya Sutskever — entered a deep neural network called AlexNet. Their error rate: 15.3%, nearly eleven percentage points ahead of the runner-up and a margin the competition had never seen.

The deep learning era had begun. Neural networks, given enough data and enough computing power, could learn to see, hear, translate, and eventually converse — not because humans programmed them with rules, but because they discovered patterns themselves.

The researchers who had been quietly working on neural networks during the second winter — Hinton, LeCun, Yoshua Bengio — would later share the Turing Award (the Nobel Prize of computing) for their foundational work.

The transformer changes everything: 2017

The next revolution came from Google Brain. In June 2017, eight researchers published a paper with an understated title: “Attention Is All You Need.” It described a new neural network architecture called the transformer — one that handled sequences of text in a fundamentally different way from everything before.

Previous approaches processed text word by word, in order. The transformer could consider all words simultaneously, weighing how each word related to every other word in the sequence. This “self-attention” mechanism made transformers dramatically more efficient to train and dramatically better at understanding language.
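
Stripped of the learned projections, multiple heads, and masking of the full architecture, the core self-attention operation is short. The sketch below is a simplified illustration of that one step, not the paper's complete model: every position scores its similarity to every other position, the scores are turned into weights, and each position becomes a weighted mix of the whole sequence.

```python
import numpy as np

# Minimal self-attention sketch: all positions attend to all others
# at once. Real transformers add learned query/key/value projections
# and multiple heads; this shows only the core computation.
def self_attention(X):
    """X: (seq_len, d) matrix of token vectors. Queries, keys, and
    values are all X itself here (no learned projections)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # all-pairs similarity, scaled
    # Softmax over each row: how strongly each word attends to the rest.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = w / w.sum(axis=-1, keepdims=True)
    return weights @ X             # each output mixes every position

X = np.random.default_rng(0).normal(size=(5, 16))  # 5 "words", dim 16
out = self_attention(X)
print(out.shape)  # (5, 16): one context-mixed vector per word
```

Because the whole computation is matrix multiplication over the full sequence, it parallelises far better on GPUs than word-by-word recurrent processing, which is the efficiency gain the paragraph above describes.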

Within five years, transformer-based models would underlie essentially all of the major AI products the world would come to know.

GPT-3 and the pre-ChatGPT moment: 2020

In May 2020, OpenAI released GPT-3 — a language model with 175 billion parameters, trained on a dataset of 570 gigabytes of text. Access was initially restricted to researchers and developers via an API. The responses it produced were startling. It could write coherent essays, answer questions, translate, summarise, and generate code — often indistinguishably from human writing.

But the general public barely noticed. GPT-3 was accessible only through a technical interface. It required knowing how to use an API. The extraordinary capability sat behind a wall that most people couldn’t see.

The night that changed the world: 30 November 2022

On 30 November 2022, OpenAI released ChatGPT. It was free. It was simple. It was a chat interface — you typed, it replied. No technical knowledge required.

One million users signed up in five days. One hundred million in two months — the fastest consumer application to reach that milestone in history, according to a UBS analysis. For comparison, Instagram took 2.5 years to reach 100 million users. TikTok took nine months.

The world noticed.

Within weeks, every major technology company was in emergency meetings. Google declared a “code red.” Microsoft invested $10 billion in OpenAI and integrated ChatGPT into Bing, Office, and Windows. Google rushed Bard (later Gemini) to market. Meta open-sourced its Llama models. A French startup called Mistral AI raised funding at extraordinary speed. Anthropic — founded by former OpenAI researchers including Dario Amodei and his sister Daniela — had been working on their AI assistant Claude since 2021 and accelerated its release.

The AI race was on in a way no previous moment had achieved.

Where we are now: April 2026

Three and a half years after ChatGPT’s launch, the landscape looks like this:

  • AI assistants are used by hundreds of millions of people daily
  • Models can now process not just text but images, audio, video, and code simultaneously
  • Context windows — the amount of text a model can hold in “memory” at once — have expanded from 4,000 tokens to over one million
  • Open-source models from Meta and Mistral run on consumer hardware
  • AI is embedded in search engines, office suites, smartphones, cars, and medical diagnostic tools
  • Questions about AI safety, regulation, labour displacement, and the long-term future of the technology are being debated at every level of government and society

Turing’s question — “can machines think?” — has not been definitively answered. But the machines have become genuinely useful in ways that would have seemed extraordinary even ten years ago.

The story is not finished. It has barely started.

1950
Turing Test proposed

Alan Turing publishes “Computing Machinery and Intelligence.” Proposes the Imitation Game as a test of machine intelligence. Source: Mind, Vol. LIX, No. 236

1956
Dartmouth Conference — AI is named

John McCarthy coins the term “Artificial Intelligence” at the Dartmouth Workshop. The field is formally founded.

1974–1980
First AI Winter

Overpromising leads to underfunding. The Lighthill Report (1973) is critical of AI progress. UK and US cut AI research funding significantly.

1980s
Expert systems boom

Rule-based AI systems such as MYCIN and R1 (later renamed XCON) generate commercial value. The AI market reaches $1 billion. Japan launches its Fifth Generation Computing programme (1982).

1987–1993
Second AI Winter

LISP machine market collapses. Expert systems prove brittle. Strategic Computing Initiative cancelled. $500M AI industry nearly disappears.

1997
Deep Blue defeats Kasparov

IBM’s Deep Blue defeats world chess champion Garry Kasparov in a six-game match. First time a computer defeats a reigning world chess champion under standard tournament conditions.

2012
AlexNet — the deep learning moment

AlexNet wins ImageNet with 15.3% error rate vs 26% for previous best. Deep learning becomes the dominant paradigm. Source: Krizhevsky et al., NIPS 2012.

2017
“Attention Is All You Need” — transformer architecture

Vaswani et al. at Google Brain publish the transformer paper. Foundational architecture for GPT, BERT, Claude, Gemini, and all modern LLMs. Source: arxiv.org/abs/1706.03762

2020
GPT-3 — 175 billion parameters

OpenAI releases GPT-3. 175B parameters, 570GB training data. Demonstrates few-shot learning across a wide range of tasks. Source: Brown et al., arxiv.org/abs/2005.14165

30 Nov 2022
ChatGPT launches

OpenAI launches ChatGPT. 1 million users in 5 days. 100 million users in 2 months — fastest consumer application to that milestone in history (UBS analysis, Feb 2023).

2023
The AI race begins

GPT-4 (OpenAI, March), Claude (Anthropic, March), Bard/Gemini (Google, March), Llama (Meta, February). Microsoft invests $10B in OpenAI. Every major tech company announces AI products.

2024–2026
Multimodal, agentic, and integrated AI

GPT-4o (multimodal), Gemini 1.5/2.0 (1M+ token context), Claude 3.5/4 series (extended reasoning), Llama 3 open weights, AI embedded in operating systems, browsers, and professional tools.

Primary sources — the papers that built AI

1950 — The Turing Test

Turing, A.M. (1950). “Computing Machinery and Intelligence.” Mind, 59(236), 433–460. academic.oup.com/mind

Introduced the Imitation Game as an operational definition of machine intelligence. Anticipated objections including the mathematical, theological, consciousness, and novel-behaviour arguments — most of which remain active debates today.

1986 — Backpropagation

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). “Learning representations by back-propagating errors.” Nature, 323, 533–536. nature.com

Demonstrated that the backpropagation algorithm could train multi-layer neural networks effectively, overcoming the limitations of single-layer perceptrons identified by Minsky and Papert (1969). This paper enabled deep neural networks as a practical approach.

2012 — AlexNet and deep learning

Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). “ImageNet Classification with Deep Convolutional Neural Networks.” NIPS 2012. NeurIPS proceedings

AlexNet achieved 15.3% top-5 error rate on ImageNet, vs 26.2% for the second-place entry. Demonstrated that GPUs could train deep CNNs at scale. The paper that triggered the current deep learning era.

2017 — Transformer architecture

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). “Attention Is All You Need.” NIPS 2017. arxiv.org/abs/1706.03762

Introduced the transformer architecture, replacing recurrent and convolutional architectures for sequence modelling tasks. The self-attention mechanism computes relationships between all positions simultaneously, enabling parallelisation during training. All major LLMs (GPT, BERT, T5, Claude, Gemini) are based on this architecture.

2018 — BERT (Google)

Devlin, J., Chang, M.W., Lee, K., & Toutanova, K. (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arxiv.org/abs/1810.04805

BERT demonstrated bidirectional pre-training for language understanding — considering context from both left and right simultaneously. Achieved state-of-the-art on 11 NLP benchmarks. Foundation for Google Search improvements and subsequent encoder-based models.

2020 — GPT-3 and few-shot learning

Brown, T., Mann, B., Ryder, N., et al. (2020). “Language Models are Few-Shot Learners.” OpenAI. arxiv.org/abs/2005.14165

GPT-3 demonstrated that large language models could perform new tasks from just a few examples in the prompt (few-shot learning) — without task-specific fine-tuning. This in-context learning capability was emergent at scale and not explicitly trained for.
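
A few-shot prompt is nothing more than text: worked examples followed by a new input, with the model left to infer the task from the pattern. The sketch below builds one in the style of the paper's English-to-French illustration; the exact word pairs here are chosen for illustration, not quoted from the paper.

```python
# Few-shot prompting sketch: the "training" is entirely in the prompt.
# No fine-tuning happens; the model completes the final line by
# recognising the pattern set by the examples.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
    ("peppermint", "menthe poivrée"),
]
prompt = "Translate English to French:\n"
prompt += "".join(f"{en} => {fr}\n" for en, fr in examples)
prompt += "plush giraffe =>"  # the model would complete this line
print(prompt)
```

This is what "in-context learning" means in practice: the examples live in the context window, and nothing about the model's weights changes.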

2022 — InstructGPT and RLHF

Ouyang, L., Wu, J., Jiang, X., et al. (2022). “Training language models to follow instructions with human feedback.” OpenAI. arxiv.org/abs/2203.02155

Introduced RLHF as the alignment technique that turned GPT-3 into a useful assistant. Human labellers ranked model outputs; a reward model was trained on these rankings; the LLM was fine-tuned with PPO to maximise the reward. InstructGPT was the direct precursor to ChatGPT.
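
The reward-model step of this pipeline trains on those human rankings with a pairwise ranking loss, -log sigmoid(r_chosen - r_rejected). The sketch below optimises that loss by plain gradient descent; the linear scorer over random feature vectors is an invented stand-in for the actual LLM-based reward model, and only the loss itself comes from the paper.

```python
import numpy as np

# Reward-model sketch for RLHF: given pairs where labellers preferred
# one output over the other, learn a scorer so that
# reward(chosen) > reward(rejected). The linear scorer and random
# feature vectors are toy stand-ins for the real LLM reward model.
rng = np.random.default_rng(0)
w = np.zeros(8)  # reward-model parameters
pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(50)]

def loss(w):
    # Mean of -log sigmoid(r_chosen - r_rejected) over all pairs.
    return float(np.mean([np.log1p(np.exp(-(w @ c - w @ r)))
                          for c, r in pairs]))

for _ in range(200):  # plain gradient descent on the ranking loss
    grad = np.zeros_like(w)
    for chosen, rejected in pairs:
        p = 1.0 / (1.0 + np.exp(w @ chosen - w @ rejected))
        grad += -p * (chosen - rejected)  # gradient of -log sigmoid(diff)
    w -= 0.05 * grad / len(pairs)

print(f"ranking loss: {loss(np.zeros(8)):.3f} -> {loss(w):.3f}")
```

In the full pipeline this trained scorer then supplies the reward signal that PPO maximises when fine-tuning the language model.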