Descript lets you edit video and podcast recordings by editing the text transcript: delete words and the matching footage is deleted too. Studio Sound cleans up audio in one click. Filler word removal strips out 'um' and 'uh' automatically. Used by over 6 million creators. Free tier available. Creator plan: $24 per month billed annually.
Descript is a video and audio editor with a completely different approach to editing: you edit the transcript, not a timeline. When you import a recording into Descript, it transcribes everything automatically. The text appears next to your footage. To cut a section, you delete the words from the transcript. To rearrange the order, you cut and paste paragraphs. The video follows whatever you do to the text.
For anyone who edits spoken-word content — podcasts, interviews, tutorials, webinars, talking-head videos — this is dramatically faster than traditional timeline editing. Scrubbing through footage looking for the right moment to cut, then precision-trimming frame by frame, is replaced by reading a document and deleting the parts you do not want. Descript estimates this reduces editing time for spoken-word content by 60–70%.
Beyond text editing, Descript includes over 30 AI tools: Studio Sound (one-click professional audio), filler word removal (removes all "ums" and "uhs" automatically), Eye Contact correction (makes it look like you were looking at the camera when you were reading from a script), Green Screen without a green screen, AI voice cloning (Overdub), translation and dubbing in 30+ languages, and Underlord — an AI co-editor that can execute editing tasks from natural language instructions.
Descript is used by over 6 million creators including podcasters, YouTubers, course creators, corporate video teams, and marketers. It runs on macOS and Windows with a web version available.
Text-based editing — Transcribes your recording, shows the text alongside the video. Delete words to delete footage. Rearrange paragraphs to rearrange video. Move a sentence from minute 12 to minute 3 in seconds. This is the core feature everything else builds on.
Studio Sound — One click removes background noise, echo and room sound from any recording. Turns a recording made in a noisy home office into something that sounds studio-recorded. No microphone upgrade required.
Filler word removal — Detects and removes every "um", "uh", "like", "you know" and long pause from your recording in one click. For a 30-minute interview this alone saves 30–60 minutes of manual editing.
Eye Contact correction — AI subtly adjusts your gaze so it appears you were looking directly at the camera, even if you were reading from a script, teleprompter or second screen.
Overdub / AI voice cloning — Train Descript on 10 minutes of your voice. Then regenerate any misspoken word or deleted section by typing the corrected text. Descript generates your voice saying it. Useful for fixing mistakes without re-recording.
Underlord AI co-editor — An AI that accepts plain language instructions and executes editing tasks. "Create a 60-second highlight reel from the best moments." "Generate chapter markers." "Write social media captions for this video." Available on Creator and above.
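To make the filler word removal described above more concrete, here is a minimal sketch of how a filler pass over a timestamped transcript could work. It is not Descript's implementation: the `Word` type, the `FILLERS` list and the 1.5-second pause threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical filler list and pause threshold, chosen for illustration only.
FILLERS = {"um", "uh", "like", "you know"}
LONG_PAUSE_SECONDS = 1.5


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording
    end: float    # seconds into the recording


def _normalise(text):
    """Lowercase a word and strip trailing punctuation before matching."""
    return text.lower().strip(".,!?")


def flag_fillers(words):
    """Return indices of words that look like fillers, for human review."""
    flagged = []
    i = 0
    while i < len(words):
        token = _normalise(words[i].text)
        # Check two-word fillers such as "you know" before single words.
        if i + 1 < len(words):
            pair = token + " " + _normalise(words[i + 1].text)
            if pair in FILLERS:
                flagged.extend([i, i + 1])
                i += 2
                continue
        if token in FILLERS:
            flagged.append(i)
        i += 1
    return flagged


def long_pauses(words, threshold=LONG_PAUSE_SECONDS):
    """Return (start, end) gaps between consecutive words longer than threshold."""
    return [
        (a.end, b.start)
        for a, b in zip(words, words[1:])
        if b.start - a.end > threshold
    ]
```

A purely word-based match like this flags "like" in "I like this" as well as in "it was, like, huge", which is one reason to review the flagged words before removing them all.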
Descript is best for creators whose content is primarily spoken: podcasters, YouTubers doing talking-head or interview content, course creators, marketers producing video tutorials, and corporate teams making webinars and training videos.
It is not designed for cinematic video production, music video editing, or anything that requires precise frame-level visual editing. A film editor or professional video producer working with complex multi-camera footage will find traditional tools like Adobe Premiere or DaVinci Resolve better suited. Descript fills the gap for the much larger population of people who produce spoken-word content and currently spend hours on tasks that should take minutes.
Traditional video editors like Premiere Pro, iMovie and DaVinci Resolve show you a timeline: a horizontal track where you drag and drop clips, scrub to find the right frame, and cut precisely. Learning this is genuinely difficult. Editing a one-hour recording takes hours even for experienced users.
Descript asks: if the content is mostly talking, why not just edit the words? The transcript is a more intuitive representation of spoken content than a waveform. Anyone who can edit a Word document can edit a recording in Descript.
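The idea of editing words instead of a timeline can be sketched in a few lines of code. The example below assumes each transcript word carries a start and end time, and it turns a set of deleted words into the time ranges of the recording that should be kept. The `Word` type and `kept_ranges` function are hypothetical names used for illustration, not part of Descript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds into the recording
    end: float    # seconds into the recording


def kept_ranges(words, deleted):
    """Merge the time spans of the remaining words into contiguous keep-ranges.

    words:   transcript words in playback order
    deleted: set of word indices removed from the text
    """
    ranges = []
    prev_kept = None
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if prev_kept is not None and prev_kept == i - 1:
            # Directly follows the previous kept word: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first word after a deletion: start a new range.
            ranges.append((word.start, word.end))
        prev_kept = i
    return ranges


transcript = [
    Word("Hello", 0.0, 0.4),
    Word("um", 0.5, 0.8),
    Word("welcome", 0.9, 1.5),
    Word("back", 1.5, 1.9),
]

# Deleting the word at index 1 ("um") leaves two ranges to play back to back.
print(kept_ranges(transcript, deleted={1}))  # [(0.0, 0.4), (0.9, 1.9)]
```

Each deletion in the text becomes a gap between two keep-ranges, so the person editing never has to find the cut points on a waveform.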
Yes, Descript has a free plan. The Free plan includes 60 minutes of media per month, 100 one-time AI credits, and 720p export with a watermark. This is enough to test the text-based editing workflow with real recordings. The Hobbyist plan is $16 per month billed annually ($24 billed monthly) for 10 hours of media and 400 AI credits with 1080p watermark-free export. Creator is $24 per month billed annually ($35 billed monthly) for 30 hours of media, 800 AI credits, 4K export and full Underlord AI access. Business is $50 per month billed annually and is aimed at teams.
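As a rough aid for comparing the paid plans, the sketch below works out the yearly cost of annual versus monthly billing and picks the cheapest plan that covers a given workload. The prices and allowances are copied from the paragraph above and may change; the `PLANS` table and `cheapest_plan` helper are illustrative names, not anything Descript provides.

```python
# Plan figures copied from the text. Free is left out because its AI credits
# are one-time, and Business is left out because the text gives no media-hour
# or credit allowance for it.
PLANS = [
    # (name, $/month billed annually, $/month billed monthly, media hours, AI credits)
    ("Hobbyist", 16, 24, 10, 400),
    ("Creator", 24, 35, 30, 800),
]


def yearly_cost(per_month):
    """Total cost of twelve months at the given monthly price."""
    return per_month * 12


def cheapest_plan(hours_needed, credits_needed):
    """Return the first (cheapest) listed plan covering both needs, or None."""
    for name, _annual, _monthly, hours, credits in PLANS:
        if hours >= hours_needed and credits >= credits_needed:
            return name
    return None


for name, annual, monthly, _hours, _credits in PLANS:
    saving = yearly_cost(monthly) - yearly_cost(annual)
    print(f"{name}: ${yearly_cost(annual)}/yr billed annually, "
          f"${yearly_cost(monthly)}/yr billed monthly, saves ${saving}")
# Hobbyist: $192/yr billed annually, $288/yr billed monthly, saves $96
# Creator: $288/yr billed annually, $420/yr billed monthly, saves $132
```

For example, `cheapest_plan(12, 300)` returns `"Creator"`, because 12 hours of media per month exceeds the 10 hours included in Hobbyist.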
Go to descript.com and download the desktop app (macOS or Windows) or use the web editor. Create a free account. The free plan gives you 60 minutes of media to test with — enough to try the workflow on a real recording.
Create a new project and drag in your audio or video file. Descript transcribes it automatically, with support for 25 languages. This takes roughly 1 minute per 10 minutes of content. While you wait, you can watch the transcript appear in real time.
When the transcript is ready, read through it as you would a document. Select and delete any section you want to cut, and the corresponding video and audio are removed at the same time. Use Cmd+Z (Ctrl+Z on Windows) to undo anything. Try this with a 2-minute ramble at the start of a recording: highlight the text, delete it, and watch the video jump straight to where you want it to start.
In the top menu go to Actions → Remove filler words. Descript scans the transcript and highlights every "um", "uh", "like", "you know" and long pause. Review the highlights, then click Remove all. For a 30-minute podcast this step alone transforms the pacing of the audio with very little manual work.
Select all your audio (Cmd+A), then go to Effects → Studio Sound. One click. Listen to the result. If you recorded somewhere less than ideal, this single step often makes the audio acceptable. If you recorded in a good environment, it makes it excellent.
Go to File → Export. Choose your format (MP4, MP3, WAV), quality setting and destination. For social media, use the Publish to YouTube or Publish to social options which export and upload directly. The watermark appears only on the Free plan — remove it by upgrading to Hobbyist or above.
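Before running the steps above on a long recording, it can help to estimate how long transcription will take and whether the recording fits the Free plan. The sketch below uses only the two figures quoted in this guide (roughly 1 minute of transcription per 10 minutes of content, and 60 minutes of media per month on Free). Actual times will vary, and the function names are hypothetical.

```python
# Figures taken from this guide; treat both as rough, not guaranteed.
CONTENT_MINUTES_PER_TRANSCRIPTION_MINUTE = 10
FREE_PLAN_MEDIA_MINUTES = 60


def estimated_transcription_minutes(content_minutes):
    """Rough wait time for automatic transcription, in minutes."""
    return content_minutes / CONTENT_MINUTES_PER_TRANSCRIPTION_MINUTE


def fits_free_plan(recording_minutes):
    """True if the listed recordings fit within one month's Free allowance."""
    return sum(recording_minutes) <= FREE_PLAN_MEDIA_MINUTES


# A 45-minute interview should take about 4.5 minutes to transcribe.
print(estimated_transcription_minutes(45))  # 4.5

# Two recordings of 25 and 30 minutes fit in the 60-minute Free allowance.
print(fits_free_plan([25, 30]))  # True
```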
Always review filler word removal before accepting. Descript's filler word detection is accurate but not perfect. Occasionally it flags words that are intentional emphasis rather than filler. Review the highlighted words before clicking Remove all — especially for non-native English speakers or regional speech patterns.
Edit rough first, polish second. Do all your big structural cuts first — removing whole sections, rearranging the order. Then do filler word removal and Studio Sound. Then do precise line-level edits. This order is faster because you are not polishing sections that might get cut anyway.
Use Compositions for social clips. Descript's Composition feature lets you create separate edit versions from the same source media. Use the main project for your full-length version and create Compositions for social clips, trailers and highlights — all from the same source, without duplicating files.
Export to Premiere or Final Cut if you need advanced visual effects. Descript exports full timelines to Adobe Premiere Pro, Final Cut Pro and DaVinci Resolve. Use Descript for the text editing and rough cut, then finish in your professional editor of choice if the project needs complex visual effects, colour grading or multi-camera editing.
Descript was founded in 2017 by Andrew Mason, co-founder of Groupon, and is headquartered in San Francisco. It is used by over 6 million creators and is SOC 2 Type II certified, per Descript's official website. The platform combines proprietary transcription technology, licensed AI models for audio enhancement, and computer vision for Eye Contact correction and Green Screen removal.
Descript's automatic transcription supports 25 languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, Hindi and others, per the official Descript pricing page. Transcription accuracy is high for clear speech with minimal background noise. Technical vocabulary, proper nouns and heavy accents reduce accuracy. The editor makes corrections easy — clicking any word in the transcript and typing the correction takes seconds.
Studio Sound uses deep learning models trained on paired clean and noisy audio recordings. It separates the speech signal from background noise, echo and room reverb, then applies voice enhancement. According to independent testing, it performs comparably to dedicated audio cleanup tools for typical podcast and video recording environments.
Descript introduced an AI credits system in 2025 that applies to AI-powered features: Underlord AI co-editor, Eye Contact, Green Screen, Studio Sound, and AI voice features. The Free plan includes 100 one-time credits. Hobbyist includes 400 per month. Creator includes 800 per month. Complex AI operations consume more credits than simple ones. Per Descript's official pricing page, media hours and AI credits are tracked separately.