The story of AI is one of the most dramatic in human history: decades of wild ambition, crushing failure, quiet progress, and then a sudden explosion that changed everything. This is that story, told as it happened.
In 1950, a British mathematician named Alan Turing published a paper called “Computing Machinery and Intelligence.” It opened with a deceptively simple question: “Can machines think?”
Turing was 38 years old. He had already helped crack the Nazi Enigma code during World War II — work that is credited with shortening the war by two years. Now he was asking something that would obsess scientists, philosophers, writers, and eventually almost every human being on earth for the next 75 years.
He didn’t try to define “thinking.” Instead, he proposed a test: put a human and a machine in separate rooms. Have a judge ask both questions through a text terminal. If the judge cannot reliably tell which is human and which is machine — the machine passes the test. He called it the Imitation Game. We now call it the Turing Test.
Turing didn’t build the first AI. He asked the first serious question about whether machines could ever be intelligent — and gave us a way to measure the answer. Every AI researcher since has been, in some sense, working on his question.
Six years after Turing’s paper, a young mathematician named John McCarthy organised a summer workshop at Dartmouth College in New Hampshire. The proposal he wrote to get funding included a bold claim: that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
The summer workshop itself was modest, roughly ten researchers spending eight weeks together on the Dartmouth campus. But McCarthy gave their work a name that stuck: Artificial Intelligence.
For the next decade, optimism was extraordinary. Researchers genuinely believed that human-level AI was ten years away. Herbert Simon, one of the pioneers, predicted in 1965 that “machines will be capable, within twenty years, of doing any work a man can do.”
He was wrong. But not forever.
The promises outran the results. The computers of the 1960s were laughably underpowered by today’s standards. The problems AI researchers were trying to solve — natural language, vision, reasoning — turned out to be far harder than anyone imagined.
Funding dried up. The field entered what became known as the “AI Winter” — a cold period where the hype collapsed and serious money disappeared. Many AI researchers quietly shifted to other fields.
It would not be the last winter.
AI returned in the 1980s, but in a more modest form. Instead of general intelligence, researchers focused on narrow expertise. Expert systems encoded the knowledge of human specialists — doctors, engineers, lawyers — into rules that computers could follow.
MYCIN, developed at Stanford, could diagnose blood infections as accurately as specialist physicians. R1 (later renamed XCON), built for Digital Equipment Corporation, configured orders for the company's computer systems, saving an estimated $40 million per year. Companies invested billions. Japan announced a national “Fifth Generation” computing programme to build AI machines.
Then the second winter arrived. The expert systems were brittle — they could only do the one thing they were built for, and they required armies of specialists to maintain. By the late 1980s, the market for AI hardware collapsed. More billions evaporated.
While the headlines wrote off AI again, something important was happening in the background. A technique called neural networks — inspired by the structure of the human brain — was being quietly developed and improved.
The key breakthrough was backpropagation: an algorithm that allowed neural networks to learn from their mistakes, adjusting their internal weights to improve accuracy. Yann LeCun demonstrated that this technique could teach a computer to recognise handwritten digits with remarkable accuracy — work that would eventually form the foundation of modern computer vision.
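To see the mechanics, here is a minimal sketch of backpropagation in Python with NumPy: a two-layer network learning XOR from hand-derived gradients. The task, network size, and learning rate are arbitrary choices for illustration; this is the textbook algorithm, not LeCun's digit-recognition system.

```python
import numpy as np

# Toy training set: XOR, a function no single-layer perceptron can learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(10000):
    # Forward pass: compute predictions with the current weights.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: push the output error back through the network,
    # layer by layer, via the chain rule.
    d_out = (out - y) * out * (1 - out)      # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)       # error signal at the hidden layer

    # Adjust every weight a small step against its gradient.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```

The “learning from mistakes” in the prose is exactly the backward pass: the network's error is converted into a gradient for each weight, and every weight moves a little in the direction that reduces the error.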
In 1997, IBM’s Deep Blue defeated world chess champion Garry Kasparov — not through human-style understanding of chess, but through brute-force calculation and clever search algorithms. It was a milestone, but it was specialised. Deep Blue could not do anything else.
The moment that changed everything arrived at a computer vision competition in 2012. The ImageNet Large Scale Visual Recognition Challenge asked competing teams to correctly identify objects in photographs. The best previous results had error rates around 26%. A team from the University of Toronto, Geoffrey Hinton and his students Alex Krizhevsky and Ilya Sutskever, entered a deep neural network called AlexNet. Their error rate: 15.3%. No previous improvement had come close to a jump of that size.
The deep learning era had begun. Neural networks, given enough data and enough computing power, could learn to see, hear, translate, and eventually converse — not because humans programmed them with rules, but because they discovered patterns themselves.
The researchers who had been quietly working on neural networks during the second winter — Hinton, LeCun, Yoshua Bengio — would later share the Turing Award (the Nobel Prize of computing) for their foundational work.
The next revolution came from Google Brain. In June 2017, eight researchers published a paper with an understated title: “Attention Is All You Need.” It described a new neural network architecture called the transformer — one that handled sequences of text in a fundamentally different way from everything before.
Previous approaches processed text word by word, in order. The transformer could consider all words simultaneously, weighing how each word related to every other word in the sequence. This “self-attention” mechanism made transformers dramatically more efficient to train and dramatically better at understanding language.
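A rough sketch of that computation in NumPy follows. It shows scaled dot-product self-attention over a toy sequence; the multiple attention heads, positional encodings, and masking of a real transformer are omitted, and all shapes and values here are invented for illustration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Score every position against every other position at once.
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    # Each output position is a weighted mix of every value vector.
    return weights @ V                                 # (seq_len, d_k)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
X = rng.normal(size=(seq_len, d_model))                # five toy "tokens"
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4): one context-mixed vector per input position
```

Because the score matrix is computed in one shot, every pair of positions is compared in parallel rather than one step at a time, which is what made transformers so much faster to train than recurrent models.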
Within five years, transformer-based models would underlie essentially all of the major AI products the world would come to know.
In May 2020, OpenAI released GPT-3, a language model with 175 billion parameters trained on a dataset of 570 gigabytes of text. Access was initially restricted to researchers and developers via an API. The responses it produced were startling: it could write coherent essays, answer questions, translate, summarise, and generate code, with output often hard to distinguish from human writing.
But the general public barely noticed. GPT-3 was accessible only through a technical interface. It required knowing how to use an API. The extraordinary capability sat behind a wall that most people couldn’t see.
On 30 November 2022, OpenAI released ChatGPT. It was free. It was simple. It was a chat interface — you typed, it replied. No technical knowledge required.
One million users signed up in five days. One hundred million in two months — the fastest consumer application to reach that milestone in history, according to a UBS analysis. For comparison, Instagram took 2.5 years to reach 100 million users. TikTok took nine months.
The world noticed.
Within weeks, every major technology company was in emergency meetings. Google declared a “code red.” Microsoft invested $10 billion in OpenAI and began integrating its models into Bing, Office, and Windows. Google rushed Bard (later Gemini) to market. Meta open-sourced its Llama models. A French startup called Mistral AI raised a record seed round within weeks of being founded. Anthropic, founded by former OpenAI researchers including Dario Amodei and his sister Daniela, had been building its AI assistant Claude since 2021 and accelerated its release.
The AI race was on, at a pace and scale no previous moment in the field's history had matched.
Three and a half years after ChatGPT's launch, the landscape looks like this: Turing's question, “can machines think?”, has still not been definitively answered. But the machines have become genuinely useful in ways that would have seemed extraordinary even ten years ago.
The story is not finished. It has barely started.
1950: Alan Turing publishes “Computing Machinery and Intelligence.” Proposes the Imitation Game as a test of machine intelligence. Source: Mind, Vol. LIX, No. 236.
1956: John McCarthy coins the term “Artificial Intelligence” at the Dartmouth Workshop. The field is formally founded.
1966–1974: First AI winter. Overpromising leads to underfunding. The Lighthill Report (1973) is critical of AI progress. UK and US cut AI research funding significantly.
1980–1987: Expert systems boom. Rule-based AI systems (MYCIN, R1/XCON) generate commercial value. The AI market reaches $1 billion. Japan launches the Fifth Generation Computing programme (1982).
1987–1993: Second AI winter. The LISP machine market collapses. Expert systems prove brittle. The Strategic Computing Initiative is cancelled. A $500M AI industry nearly disappears.
1997: IBM's Deep Blue defeats world chess champion Garry Kasparov in a six-game match. First time a computer defeats a reigning world chess champion under standard tournament conditions.
2012: AlexNet wins ImageNet with a 15.3% error rate vs 26.2% for the second-place entry. Deep learning becomes the dominant paradigm. Source: Krizhevsky et al., NIPS 2012.
2017: Vaswani et al. at Google Brain publish the transformer paper. Foundational architecture for GPT, BERT, Claude, Gemini, and all modern LLMs. Source: arxiv.org/abs/1706.03762
2020: OpenAI releases GPT-3. 175B parameters, 570GB of training data. Demonstrates few-shot learning across a wide range of tasks. Source: Brown et al., arxiv.org/abs/2005.14165
2022: OpenAI launches ChatGPT. 1 million users in 5 days; 100 million in 2 months, the fastest consumer application to that milestone in history (UBS analysis, Feb 2023).
2023: GPT-4 (OpenAI, March), Claude (Anthropic, March), Bard/Gemini (Google, March), Llama (Meta, February). Microsoft invests $10B in OpenAI. Every major tech company announces AI products.
2024–2025: GPT-4o (multimodal), Gemini 1.5/2.0 (1M+ token context), Claude 3.5/4 series (extended reasoning), Llama 3 open weights, AI embedded in operating systems, browsers, and professional tools.
Turing, A.M. (1950). “Computing Machinery and Intelligence.” Mind, 59(236), 433–460. academic.oup.com/mind
Introduced the Imitation Game as an operational definition of machine intelligence. Anticipated objections including the mathematical, theological, consciousness, and novel-behaviour arguments — most of which remain active debates today.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). “Learning representations by back-propagating errors.” Nature, 323, 533–536. nature.com
Demonstrated that the backpropagation algorithm could train multi-layer neural networks effectively, overcoming the limitations of single-layer perceptrons identified by Minsky and Papert (1969). This paper established multi-layer neural networks as a practical approach.
Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). “ImageNet Classification with Deep Convolutional Neural Networks.” NIPS 2012. proceedings.neurips.cc
AlexNet achieved 15.3% top-5 error rate on ImageNet, vs 26.2% for the second-place entry. Demonstrated that GPUs could train deep CNNs at scale. The paper that triggered the current deep learning era.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). “Attention Is All You Need.” NIPS 2017. arxiv.org/abs/1706.03762
Introduced the transformer architecture, replacing recurrent and convolutional architectures for sequence modelling tasks. The self-attention mechanism computes relationships between all positions simultaneously, enabling parallelisation during training. All major LLMs (GPT, BERT, T5, Claude, Gemini) are based on this architecture.
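In the paper's notation, for query, key, and value matrices Q, K, V with key dimension d_k, the scaled dot-product attention at the core of the architecture is:

```latex
\mathrm{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```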
Devlin, J., Chang, M.W., Lee, K., & Toutanova, K. (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arxiv.org/abs/1810.04805
BERT demonstrated bidirectional pre-training for language understanding — considering context from both left and right simultaneously. Achieved state-of-the-art on 11 NLP benchmarks. Foundation for Google Search improvements and subsequent encoder-based models.
Brown, T., Mann, B., Ryder, N., et al. (2020). “Language Models are Few-Shot Learners.” OpenAI. arxiv.org/abs/2005.14165
GPT-3 demonstrated that large language models could perform new tasks from just a few examples in the prompt (few-shot learning) — without task-specific fine-tuning. This in-context learning capability was emergent at scale and not explicitly trained for.
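The paper's English-to-French example shows what this looks like in practice: the “training examples” live entirely in the prompt, and the model simply continues the pattern. A minimal rendering, with the expected completion noted as a comment:

```python
# Few-shot prompting in the style of the GPT-3 paper: the task is
# specified by example, inside the prompt itself. No weights change.
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

# Sent to a sufficiently large model, the expected completion is
# " fromage": the task was inferred from two in-context examples.
```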
Ouyang, L., Wu, J., Jiang, X., et al. (2022). “Training language models to follow instructions with human feedback.” OpenAI. arxiv.org/abs/2203.02155
Introduced RLHF as the alignment technique that turned GPT-3 into a useful assistant. Human labellers ranked model outputs; a reward model was trained on these rankings; the LLM was fine-tuned with PPO to maximise the reward. InstructGPT was the direct precursor to ChatGPT.
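A minimal sketch of the ranking objective used in that middle step, with hypothetical scalar scores standing in for a trained reward model (the real pipeline learns these scores with a neural network over many comparison pairs):

```python
import math

def ranking_loss(r_preferred: float, r_rejected: float) -> float:
    # -log(sigmoid(r_w - r_l)), the pairwise loss from Ouyang et al.:
    # small when the reward model scores the human-preferred output
    # higher than the rejected one, large when it gets the order wrong.
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_rejected))))

# Hypothetical reward scores for two candidate answers to one prompt.
print(ranking_loss(1.8, 0.3))  # ≈ 0.20: ranking agrees with the labeller
print(ranking_loss(0.3, 1.8))  # ≈ 1.70: ranking disagrees, so loss is high
```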