The AI image generator that changed what was possible with visual art. From a small research lab to one of the most artistically distinctive image AIs in the world. History, how to write prompts that work, every major parameter explained, 20 ready-to-use prompts, and full technical depth. Three reading levels. Official sources only.
Midjourney is an AI that creates images from text descriptions. You type what you want to see — “a woman reading a book in a sunlit café in Paris, watercolour style” — and Midjourney generates a beautiful, original image in seconds.
It is not a photo editor. It is not a filter. It creates entirely new images that have never existed before, based solely on your words.
Consider a parent whose daughter’s birthday party has a unicorn theme. She wants a personalised banner but cannot afford a designer. She types: “a magical unicorn with rainbow mane in a garden full of flowers, soft pastel colours, birthday celebration, children’s illustration style.” Midjourney generates four beautiful options. She picks one, downloads it, and takes it to a print shop. Total cost: the price of a print. Time: ten minutes.
Midjourney was founded by David Holz in San Francisco in 2021. Before Midjourney, Holz co-founded Leap Motion — a company that made hand-tracking hardware. Midjourney is unusual among major AI companies: it is independently funded, has never taken venture capital, and has been profitable since its early days — a remarkable achievement in a sector known for enormous losses.
The team is small by the standards of the AI industry — around 40 people as of 2024. This leanness has contributed to the company’s profitability and its focus on the quality of the product itself rather than rapid expansion.
David Holz began working on Midjourney in 2021 with a specific philosophy: AI image generation should be an artistic tool, not just a technical demonstration. Where other image AI researchers focused on photorealism and benchmark metrics, Holz was interested in aesthetic quality, creative exploration, and what he called “expanding the imaginative powers of the human species.”
Midjourney launched its open beta in July 2022 via Discord — an unusual choice. Rather than building a standalone website, Midjourney set up a Discord server where users typed commands to generate images. Discord became the interface. The community became part of the product.
The results were immediately distinctive. Where other image AI tools of the time (DALL-E 2, Stable Diffusion) produced images that looked impressive but often artificial, Midjourney’s images had a different quality — more painterly, more evocative, more intentionally artistic. They looked like something a talented human artist might create, not like a photograph with glitches.
Midjourney v4 was a leap: better coherence, better anatomy, better understanding of complex prompts. It arrived at the same moment as ChatGPT — November 2022 — and the two reshaped the cultural conversation about AI simultaneously. While ChatGPT showed that AI could write, Midjourney showed that AI could create art people genuinely wanted to look at and own.
Midjourney v5, released in March 2023, set a new benchmark for photorealism and prompt adherence. For the first time, AI-generated images became genuinely difficult to distinguish from professional photography in many contexts. The version also came with controversy: users discovered it could generate hyper-realistic faces, leading to concerns about fake images of real people.
Midjourney v6, released in December 2023, added the ability to include accurate text within images — a notoriously difficult problem for image AI (earlier models produced garbled text). It also significantly improved prompt understanding, allowing longer and more nuanced descriptions to be followed precisely.
After two years of operating solely through Discord, Midjourney launched a web interface at midjourney.com in 2024. Users could now generate, organise, and edit images through a browser rather than typing commands in a chat server. This dramatically lowered the barrier to entry.
Midjourney v7 and subsequent updates continued improving coherence, speed, and control. Editor tools allowed precise inpainting (changing specific parts of an image), outpainting (extending an image beyond its borders), and image-to-image generation (starting from a reference image). The platform expanded to include video generation in beta.
200 image generations per month; general commercial use.
Unlimited relaxed generations; 15h of fast GPU time; best for regular use.
30h of fast GPU time; stealth mode (private generations); maximum concurrency.
Source: midjourney.com/account — April 2026
Midjourney responds to description, not commands. The better you describe what you want to see — the subject, the setting, the mood, the style, the lighting, the composition — the better the result. Think like a director briefing a cinematographer, not like someone typing a search query.
Subject — What is the main thing in the image?
Setting / environment — Where is it? What surrounds it?
Mood / atmosphere — What feeling should it evoke?
Style — Photorealistic? Oil painting? Watercolour? Anime? Cinematic?
Lighting — Golden hour? Studio lighting? Candlelight? Dramatic shadows?
Technical parameters — Aspect ratio, version, quality
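The descriptive components above can be sketched as a tiny prompt builder. This is a hypothetical helper for illustration only: Midjourney has no Python API, and simply receives the final string.

```python
# Hypothetical prompt builder illustrating the descriptive components.
# Midjourney itself just takes the assembled string as input.

def build_prompt(subject, setting, mood, style, lighting):
    """Join the five descriptive components into a comma-separated prompt."""
    return ", ".join([subject, setting, mood, style, lighting])

prompt = build_prompt(
    subject="a woman reading a book",
    setting="a sunlit café in Paris",
    mood="calm, contemplative",
    style="watercolour style",
    lighting="soft morning light",
)
print(prompt)
# → a woman reading a book, a sunlit café in Paris, calm, contemplative, watercolour style, soft morning light
```

The point is the habit, not the helper: a strong prompt names each component explicitly rather than relying on the model to guess the missing ones.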
--ar 16:9 — Aspect ratio (16:9 for widescreen, 1:1 for square, 9:16 for portrait/mobile)
--v 7 — Version (always use the latest for best results)
--style raw — Less artistic interpretation, closer to the literal prompt
--no hands — Exclude specific elements (hands were historically problematic)
--stylize 250 — How much artistic style to apply (0–1000, default 100)
--chaos 20 — Variation between the four generations (0–100)
--seed 12345 — Fix the random seed to reproduce a result

Midjourney uses a diffusion model architecture — specifically a latent diffusion model (LDM) in the same family as Stable Diffusion, though trained on Midjourney’s proprietary dataset and with architecture choices the company has not fully disclosed.
A diffusion model learns to generate images by learning to reverse a noise process. During training: real images are progressively corrupted by adding Gaussian noise over many steps until the image is pure noise. The model learns to predict and remove that noise at each step — learning what a “real image” looks like at every level of noise.
During inference (generation): starting from pure random noise, the model iteratively denoises — guided by the text prompt — producing a coherent image after many denoising steps (typically 20–50).
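The two phases can be made concrete with a toy numerical sketch. This is the generic denoising-diffusion recipe, not Midjourney's undisclosed model, and the "oracle" noise predictor stands in for the trained network so the mechanics are visible end to end:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch of the diffusion idea (Midjourney's actual model is undisclosed).
T = 50                               # number of noise steps
betas = np.linspace(1e-4, 0.2, T)    # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative signal retention per step

x0 = rng.standard_normal(16)         # stand-in for a real "image"

# Forward process: corrupt x0 directly to step t (closed form).
def noised(x0, t, eps):
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps

eps = rng.standard_normal(16)
x_T = noised(x0, T - 1, eps)         # nearly pure noise: alpha_bar[T-1] is tiny

# Reverse process: a trained model would predict the noise at each step; here
# an "oracle" that knows the true eps shows the mechanics of the loop.
def denoise_step(x_t, t, eps_pred):
    # Estimate x0 from the noise prediction, then re-noise to step t-1.
    x0_hat = (x_t - np.sqrt(1 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])
    if t == 0:
        return x0_hat
    return noised(x0_hat, t - 1, eps_pred)

x = x_T
for t in reversed(range(T)):
    x = denoise_step(x, t, eps)

print(np.allclose(x, x0))            # → True: a perfect noise predictor recovers x0
```

In a real model the noise predictor is a large neural network and its prediction is imperfect, which is why generation takes many small steps rather than one jump.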
The text prompt is converted to a numerical embedding using a CLIP (Contrastive Language-Image Pre-Training) text encoder or similar vision-language model. This embedding conditions the denoising process — the model denoises in the direction that produces an image consistent with the text embedding. The quality of the text encoder significantly influences how well complex prompts are understood.
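Models in this family usually strengthen the prompt's influence with classifier-free guidance: the network predicts the noise once with the text embedding and once without, and the two predictions are blended. Whether Midjourney uses exactly this scheme is not disclosed; the sketch below shows the standard blend:

```python
import numpy as np

# Classifier-free guidance blend, standard in this class of diffusion models.
# Midjourney's exact conditioning scheme is not public.
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    # Push the prediction further in the direction the text embedding suggests;
    # scale = 1.0 is plain conditional generation, larger values follow the
    # prompt more literally at some cost to diversity.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.array([0.1, 0.2])   # noise predicted with an empty prompt
eps_c = np.array([0.3, 0.0])   # noise predicted with the text embedding
print(guided_noise(eps_u, eps_c, 7.5))
```

Parameters like --stylize arguably play an analogous role at the user level: they trade literal prompt adherence against the model's own aesthetic priors.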
Rather than performing diffusion in pixel space (computationally expensive), latent diffusion models work in the compressed latent space of a variational autoencoder (VAE). Images are encoded into a lower-dimensional latent representation before diffusion; the generated latent is decoded back to pixels by the VAE decoder. This dramatically reduces the computational cost of generation.
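A back-of-envelope comparison shows the saving. The dimensions below are the Stable-Diffusion-style VAE (factor-8 downsampling, 4 latent channels), used as a stand-in since Midjourney's internals are not public:

```python
# Back-of-envelope cost comparison: pixel-space vs latent-space diffusion.
# Stable-Diffusion-style VAE dimensions assumed; Midjourney's are not public.

pixel_shape = (3, 512, 512)          # RGB image: channels x height x width
latent_shape = (4, 64, 64)           # 8x smaller per side, 4 latent channels

pixel_elems = 3 * 512 * 512          # 786,432 values to denoise per step
latent_elems = 4 * 64 * 64           # 16,384 values to denoise per step

print(pixel_elems // latent_elems)   # → 48: each step touches ~48x less data
```

Because the denoising network runs 20–50 times per image, that per-step reduction compounds into the difference between seconds and minutes of GPU time.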
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). “High-Resolution Image Synthesis with Latent Diffusion Models.” arxiv.org/abs/2112.10752
Radford, A., et al. (2021). “Learning Transferable Visual Models From Natural Language Supervision.” (CLIP) arxiv.org/abs/2103.00020
Midjourney does not publish its model architecture or training data details. However, its distinctive aesthetic quality is widely credited to careful curation of the training data and fine-tuning toward human aesthetic preferences, informed by community signals such as which images users rate highly and choose to upscale.
Midjourney documentation and research: docs.midjourney.com
Midjourney does not publish technical papers about its model architecture. The foundational diffusion model research referenced above represents the general class of models Midjourney is built upon.