
Descript — The Complete Guide

Descript lets you edit video and podcast recordings by editing the text transcript: delete the words and the matching footage is deleted with them. Studio Sound cleans up audio in one click. Filler word removal strips out 'um' and 'uh' automatically. Used by over 6 million creators. Free tier available. Creator plan: $24 per month when billed annually.

AI Video & Audio Editor · Text-based editing · Free tier available · Creator: $24/month · Last reviewed: April 2026

What is Descript?

Descript is a video and audio editor with a completely different approach to editing: you edit the transcript, not a timeline. When you import a recording into Descript, it transcribes everything automatically. The text appears next to your footage. To cut a section, you delete the words from the transcript. To rearrange the order, you cut and paste paragraphs. The video follows whatever you do to the text.

For anyone who edits spoken-word content — podcasts, interviews, tutorials, webinars, talking-head videos — this is dramatically faster than traditional timeline editing. Scrubbing through footage looking for the right moment to cut, then precision-trimming frame by frame, is replaced by reading a document and deleting the parts you do not want. Descript estimates this reduces editing time for spoken-word content by 60–70%.

Beyond text editing, Descript includes over 30 AI tools: Studio Sound (one-click professional audio), filler word removal (removes all "ums" and "uhs" automatically), Eye Contact correction (makes it look like you were looking at the camera when you were reading from a script), Green Screen without a green screen, AI voice cloning (Overdub), translation and dubbing in 30+ languages, and Underlord — an AI co-editor that can execute editing tasks from natural language instructions.

Descript is used by over 6 million creators including podcasters, YouTubers, course creators, corporate video teams, and marketers. It runs on macOS and Windows with a web version available.

What Descript actually does

Text-based editing — Transcribes your recording, shows the text alongside the video. Delete words to delete footage. Rearrange paragraphs to rearrange video. Move a sentence from minute 12 to minute 3 in seconds. This is the core feature everything else builds on.

Studio Sound — One click removes background noise, echo and room sound from any recording. Turns a recording made in a noisy home office into something that sounds studio-recorded. No microphone upgrade required.

Filler word removal — Detects and removes every "um", "uh", "like", "you know" and long pause from your recording in one click. For a 30-minute interview this alone saves 30–60 minutes of manual editing.

Eye Contact correction — AI subtly adjusts your gaze so it appears you were looking directly at the camera, even if you were reading from a script, teleprompter or second screen.

Overdub / AI voice cloning — Train Descript on 10 minutes of your voice. Then regenerate any misspoken word or deleted section by typing the corrected text. Descript generates your voice saying it. Useful for fixing mistakes without re-recording.

Underlord AI co-editor — An AI that accepts plain language instructions and executes editing tasks. "Create a 60-second highlight reel from the best moments." "Generate chapter markers." "Write social media captions for this video." Available on Creator and above.

Who Descript is for

Descript is best for creators whose content is primarily spoken: podcasters, YouTubers doing talking-head or interview content, course creators, marketers producing video tutorials, and corporate teams making webinars and training videos.

It is not designed for cinematic video production, music video editing, or anything that requires precise frame-level visual editing. A film editor or professional video producer working with complex multi-camera footage will find traditional tools like Adobe Premiere or DaVinci Resolve better suited. Descript fills the gap for the much larger population of people who produce spoken-word content and currently spend hours on tasks that should take minutes.

Why not just use traditional video editing software?

Traditional video editors like Premiere Pro, iMovie and DaVinci Resolve show you a timeline: a horizontal track where you drag and drop clips, scrub to find the right frame, and cut precisely. Learning this is genuinely difficult. Editing a one-hour recording takes hours even for experienced users.

Descript asks: if the content is mostly talking, why not just edit the words? The transcript is a more intuitive representation of spoken content than a waveform. Anyone who can edit a Word document can edit a recording in Descript.

Is Descript free?

Yes. The Free plan includes 60 minutes of media per month, 100 one-time AI credits, and 720p export with a watermark. This is enough to test the text-based editing workflow with real recordings. The Hobbyist plan is $16 per month billed annually ($24 monthly) for 10 hours of media and 400 AI credits with 1080p watermark-free export. Creator is $24 per month annually ($35 monthly) for 30 hours, 800 AI credits, 4K export and full Underlord AI access. Business is $50 per month annually for teams.
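
To make the annual versus monthly billing difference concrete, here is a small sketch that multiplies out the prices quoted above over twelve months. The prices come from this guide; the names `PLAN_PRICES` and `yearly_cost` are made up for illustration.

```python
# Prices in USD per month, taken from the plan descriptions above.
# Each entry is (price when billed annually, price when billed monthly).
PLAN_PRICES = {
    "Hobbyist": (16, 24),
    "Creator": (24, 35),
}

def yearly_cost(plan: str) -> dict:
    """Compare twelve months of a plan under annual and monthly billing."""
    annual_rate, monthly_rate = PLAN_PRICES[plan]
    annual_total = annual_rate * 12
    monthly_total = monthly_rate * 12
    return {
        "annual_billing": annual_total,
        "monthly_billing": monthly_total,
        "saving": monthly_total - annual_total,
    }

for name in PLAN_PRICES:
    print(name, yearly_cost(name))
```

On these figures, paying annually saves $96 a year on Hobbyist and $132 a year on Creator.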

Getting started with Descript

Step 1 — Download Descript or use the web version

Go to descript.com and download the desktop app (macOS or Windows) or use the web editor. Create a free account. The free plan gives you 60 minutes of media to test with — enough to try the workflow on a real recording.

Step 2 — Import your recording

Create a new project and drag in your audio or video file. Descript transcribes it automatically; 25 languages are supported. This takes roughly 1 minute per 10 minutes of content. While you wait, you can watch the transcript appear in real time.
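
If you are planning a batch of imports, the "roughly 1 minute per 10 minutes of content" figure can be turned into a quick estimate. This is only a sketch of that rule of thumb: the function name and the sample recordings are hypothetical, and real transcription times will vary.

```python
def estimated_transcription_minutes(recording_minutes: float) -> float:
    """Rough wait time, using the guide's figure of 1 minute per 10 minutes of media."""
    return recording_minutes / 10

# Hypothetical batch of recordings, lengths in minutes.
recordings = {"interview": 30, "webinar": 90}
for name, length in recordings.items():
    print(f"{name}: about {estimated_transcription_minutes(length):.0f} min to transcribe")
```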

Step 3 — Edit by editing the text

When the transcript is ready, read through it as you would a document. Select and delete any section you want to cut, and the corresponding video and audio are removed at the same time. Use Cmd+Z (Ctrl+Z on Windows) to undo anything. Try this with a 2-minute ramble at the start of a recording: highlight the text, delete it, and watch the video jump straight to where you want it to start.
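
The idea behind text-based editing can be sketched in a few lines: if every transcribed word carries a start and end time, deleting words is the same as deciding which time ranges of the recording to keep. The sketch below illustrates that general idea only. It is not Descript's implementation, and the `Word` type, `keep_segments` function and sample timings are all invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds into the recording
    end: float    # seconds into the recording

def keep_segments(words, deleted):
    """Return the (start, end) time ranges left after deleting words by index."""
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if segments and (i - 1) not in deleted:
            # The previous word was kept too, so extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or first word after a cut: start a new range.
            segments.append((word.start, word.end))
    return segments

# Hypothetical transcript with word-level timings.
words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("welcome", 0.8, 1.4),
    Word("back", 1.4, 1.9),
]

# Deleting the word at index 1 ("um") leaves two ranges to play back to back.
print(keep_segments(words, deleted={1}))
```

Here the result is two ranges, 0.0 to 0.3 seconds and 0.8 to 1.9 seconds, which an editor would then join into the final cut.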

Step 4 — Remove filler words

In the top menu go to Actions → Remove filler words. Descript scans the transcript and highlights every "um", "uh", "like", "you know" and long pause. Review the highlights if you want — or just click Remove all. For a 30-minute podcast this step alone transforms the pacing of the audio without any manual work.
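
To see why a quick review of the highlights can be worthwhile, here is a deliberately naive, text-only sketch of filler detection. It is not how Descript detects fillers; the word lists and the `flag_fillers` function are assumptions made for illustration. It separates words that are almost always filler from phrases that are sometimes real content.

```python
import re

# Assumed word lists for the sketch: "um" and "uh" are nearly always filler,
# while "like" and "you know" are often part of a real sentence.
ALWAYS_FILLER = {"um", "uh"}
NEEDS_REVIEW = {"like", "you know"}

FILLER_PATTERN = re.compile(r"\b(um|uh|you know|like)\b", re.IGNORECASE)

def flag_fillers(transcript: str):
    """Return two lists of (character offset, phrase): safe removals and review candidates."""
    safe, review = [], []
    for match in FILLER_PATTERN.finditer(transcript):
        phrase = match.group(1).lower()
        target = safe if phrase in ALWAYS_FILLER else review
        target.append((match.start(), phrase))
    return safe, review

safe, review = flag_fillers("Um, I like this, you know, uh, a lot.")
print("remove:", safe)
print("check first:", review)
```

In the sample sentence "like" is a verb, not a filler, which is exactly the kind of case a human glance catches.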

Step 5 — Apply Studio Sound

Select all your audio (Cmd+A, or Ctrl+A on Windows), then go to Effects → Studio Sound. One click. Listen to the result. If you recorded somewhere less than ideal, this single step often makes the audio acceptable. If you recorded in a good environment, it makes it excellent.

Step 6 — Export

Go to File → Export. Choose your format (MP4, MP3, WAV), quality setting and destination. For social media, use the Publish to YouTube or Publish to social options which export and upload directly. The watermark appears only on the Free plan — remove it by upgrading to Hobbyist or above.

14 things to do with Descript

Podcasting and audio

Edit a podcast interview
Import the raw recording. Remove filler words in one click. Delete the first 3 minutes where you were setting up (just highlight and delete the transcript text). Rearrange the best 2-3 story sections to come earlier in the episode — cut and paste the text paragraphs. Apply Studio Sound. Export as MP3. What would have taken 3 hours of timeline editing takes 20 minutes in Descript.
Create podcast chapter markers
After editing, open Underlord and type: 'Create chapter markers for this episode with timestamps and a 1-sentence description of each chapter. Format as: [timestamp] - [chapter title] - [description].' Descript reads the full transcript and generates accurate chapter markers you can paste into your podcast host.
Generate show notes
In Underlord, type: 'Write show notes for this episode. Include: a 3-sentence episode summary, 5 key takeaways as bullet points, links mentioned in the episode, and a quote from the guest. Format for posting on a podcast website.' Export the result and edit as needed.
Transcription for accessibility
Descript transcribes in 25 languages. Export the transcript as a .txt or .docx file for your website or show notes. For YouTube, export as an SRT file and upload as subtitles — this improves both accessibility and search ranking. The transcript is available for any exported project.
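
An SRT file is plain text: each cue has a sequence number, a timing line in the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more lines of caption text, with a blank line between cues. If you want to sanity-check an exported file before uploading it, a minimal parser might look like the sketch below. The sample cues are invented, and the parser assumes a well-formed file rather than handling every variation found in the wild.

```python
import re
from datetime import timedelta

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def parse_srt(text: str):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete cue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue  # timing line is malformed, skip the cue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        cues.append((start.total_seconds(), end.total_seconds(), " ".join(lines[2:])))
    return cues

# Invented two-cue sample in SRT layout.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,500
Welcome back to the show.

2
00:00:02,500 --> 00:00:05,000
Today we are talking about editing.
"""

for start, end, caption in parse_srt(SAMPLE_SRT):
    print(f"{start:6.1f}s to {end:6.1f}s  {caption}")
```

The same cue list can be joined into a plain-text transcript with `" ".join(caption for _, _, caption in cues)`.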

YouTube and social video

Edit a talking-head YouTube video
Import your recording. Remove filler words. Read through the transcript and delete the first take of anything you said better a second time (just delete the first attempt in the transcript), long pauses between points, and off-topic tangents. For a 30-minute raw recording, cutting down to a 12-minute final video typically takes 15-20 minutes of transcript editing in Descript.
Create social clips from a long video
In Underlord, type: 'Find the 5 most quotable or shareable moments in this video. For each one, identify the start and end text in the transcript, the speaker, and why it would work well as a standalone clip.' Use the identified sections to create separate exports for each clip. Add captions (Descript adds animated captions automatically) and export square for Instagram or vertical for Reels/TikTok.
Add animated captions
Select the text you want captioned or use Cmd+A for the whole video. Go to Captions → Add captions. Descript generates animated captions that sync to the speech automatically. Customise the font, colour and position. Export with captions burned in. This replaces a manual captioning step that could take as long as the edit itself.
Fix eye contact for teleprompter reading
If you read your script or notes on a second screen rather than looking at the camera: select your clips, go to Effects → Eye Contact, and apply. Descript subtly adjusts your gaze in post so you appear to be making eye contact with the viewer. Useful for tutorial presenters who want to read detailed technical content without memorising it.

Professional and corporate video

Edit a webinar recording
Import the raw webinar recording. Remove filler words. Identify the sections where the presenter lost track or repeated themselves (easy to spot in the transcript). Cut them. Remove the first 5 minutes of 'we'll give everyone another minute to join'. Export at 1080p. A 90-minute raw webinar becomes a sharp 60-minute on-demand resource in under an hour.
Create training video content
Record yourself walking through a process on screen. Import to Descript. Use the transcript to cut any stumbles or restarts (just delete the first attempt at each step). Add Studio Sound to clean the audio. Use Underlord to generate a chapter list for the training video. Export and upload to your LMS or internal wiki.
Translate and dub a video
Import your English video. In the Translation panel, select the target language. Descript transcribes, translates and generates a dubbed audio track using a voice trained to sound like a native speaker in that language. Available in 14 languages with custom-trained AI voices. Review the translated script for accuracy, then export the dubbed version.
Multi-track interview editing
For a podcast recorded with separate tracks for host and guest (recommended for quality): import both tracks as a multi-track session. Descript shows both transcripts simultaneously. Edit across both tracks by editing the combined transcript. Speaker labels are assigned automatically. Export as a mixed stereo file or keep tracks separate.
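
Chapter lists requested from Underlord in the `[timestamp] - [chapter title] - [description]` layout used earlier in this guide are plain text, so they are easy to turn into structured data for an LMS, a wiki page or a podcast host. The sketch below assumes each line looks like `04:30 - Early career - How the guest got started.` with no literal brackets. That layout is an assumption about the output, the sample lines are invented, and a title that itself contains " - " would need different handling.

```python
def parse_chapters(text: str):
    """Parse 'timestamp - title - description' lines into dictionaries."""
    chapters = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(" - ", 2)]
        if len(parts) != 3:
            continue  # not a chapter line
        stamp, title, description = parts
        fields = stamp.split(":")
        if not all(field.isdigit() for field in fields):
            continue  # timestamp is not in MM:SS or HH:MM:SS form
        seconds = 0
        for field in fields:
            seconds = seconds * 60 + int(field)
        chapters.append({"start": seconds, "title": title, "description": description})
    return chapters

# Invented sample output.
SAMPLE_CHAPTERS = """00:00 - Introduction - The host introduces the guest.
04:30 - Early career - How the guest got started.
1:02:15 - Closing - Final advice."""

for chapter in parse_chapters(SAMPLE_CHAPTERS):
    print(chapter["start"], chapter["title"])
```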

AI voice and regeneration

Fix a recording mistake without re-recording
If you said the wrong number, mispronounced a name, or want to change a phrase after recording, click on the word in the transcript and type the corrected word or phrase. If you have trained your Overdub voice, Descript generates your voice saying the corrected text and replaces the original seamlessly. Available on Creator and Business plans.
Remove background noise from old recordings
Import any old recording — even one with significant background noise. Select all (Cmd+A). Apply Studio Sound. Descript's AI separates speech from background noise and enhances the voice. For recordings that were previously unusable (noisy cafe, HVAC hum, echo in a hard-floored room), this often rescues content you thought was lost.

Tips for getting the most from Descript

Always review filler word removal before accepting. Descript's filler word detection is accurate but not perfect. Occasionally it flags words that are intentional emphasis rather than filler. Review the highlighted words before clicking Remove all — especially for non-native English speakers or regional speech patterns.

Edit rough first, polish second. Do all your big structural cuts first — removing whole sections, rearranging the order. Then do filler word removal and Studio Sound. Then do precise line-level edits. This order is faster because you are not polishing sections that might get cut anyway.

Use Compositions for social clips. Descript's Composition feature lets you create separate edit versions from the same source media. Use the main project for your full-length version and create Compositions for social clips, trailers and highlights — all from the same source, without duplicating files.

Export to Premiere or Final Cut if you need advanced visual effects. Descript exports full timelines to Adobe Premiere Pro, Final Cut Pro and DaVinci Resolve. Use Descript for the text editing and rough cut, then finish in your professional editor of choice if the project needs complex visual effects, colour grading or multi-camera editing.

Technical background

Descript was founded in 2017 by Andrew Mason, co-founder of Groupon, and is headquartered in San Francisco. It is used by over 6 million creators and is SOC 2 Type II certified, per Descript's official website. The platform combines proprietary transcription technology, licensed AI models for audio enhancement, and computer vision for Eye Contact correction and Green Screen removal.

Transcription accuracy

Descript's automatic transcription supports 25 languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, Hindi and others, per the official Descript pricing page. Transcription accuracy is high for clear speech with minimal background noise. Technical vocabulary, proper nouns and heavy accents reduce accuracy. The editor makes corrections easy — clicking any word in the transcript and typing the correction takes seconds.

Studio Sound technology

Studio Sound uses deep learning models trained on paired clean and noisy audio recordings. It separates the speech signal from background noise, echo and room reverb, then applies voice enhancement. According to independent testing, it performs comparably to dedicated audio cleanup tools for typical podcast and video recording environments.

AI credits system

Descript introduced an AI credits system in 2025 that applies to AI-powered features: Underlord AI co-editor, Eye Contact, Green Screen, Studio Sound, and AI voice features. The Free plan includes 100 one-time credits. Hobbyist includes 400 per month. Creator includes 800 per month. Complex AI operations consume more credits than simple ones. Per Descript's official pricing page, media hours and AI credits are tracked separately.

Pricing (verified April 2026)

  • Free: 60 media minutes/month, 100 one-time AI credits, 720p with watermark
  • Hobbyist: $16/month (annual) or $24/month — 10 hours media, 400 AI credits, 1080p watermark-free
  • Creator: $24/month (annual) or $35/month — 30 hours media, 800 AI credits, 4K, full Underlord
  • Business: $50/month (annual) — 40 hours media, team collaboration, Brand Studio, priority support
  • Enterprise: Custom — advanced security, custom invoicing, dedicated support
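
As a rough way to read the list above, the sketch below picks the lowest-priced plan that covers a given monthly workload. It uses only the figures listed here at annual billing. Business and Enterprise are left out because this guide does not state their AI credit allowances, and the Free plan's 100 credits are treated as zero recurring credits because they are one-time. The `Plan` type and `cheapest_plan` function are hypothetical names.

```python
from typing import NamedTuple, Optional

class Plan(NamedTuple):
    name: str
    annual_price: int      # USD per month when billed annually
    media_hours: float     # media hours per month
    monthly_credits: int   # recurring AI credits per month

# Ordered from cheapest to most expensive.
PLANS = [
    Plan("Free", 0, 1, 0),          # 60 media minutes; its 100 credits are one-time
    Plan("Hobbyist", 16, 10, 400),
    Plan("Creator", 24, 30, 800),
]

def cheapest_plan(hours_needed: float, credits_needed: int) -> Optional[Plan]:
    """Return the first (cheapest) plan that covers both needs, or None."""
    for plan in PLANS:
        if plan.media_hours >= hours_needed and plan.monthly_credits >= credits_needed:
            return plan
    return None

choice = cheapest_plan(hours_needed=8, credits_needed=300)
print(choice.name if choice else "Look at Business or Enterprise")
```

A workload of 8 media hours and 300 credits a month lands on Hobbyist; anything above 30 hours falls outside the three plans modelled here.
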
Primary sources cited in this guide