Descript lets you edit podcast and audio recordings by editing the text transcript. Studio Sound cleans up noisy audio in one click. Filler word removal strips out "um" and similar fillers automatically. Overdub lets you correct recording mistakes by typing. It is one of the most efficient podcast editing workflows available. From $16/month.
Descript is covered in the AI for Video section as a full video and podcast editor. This guide focuses on its audio-specific capabilities: Studio Sound (AI audio cleanup), filler word removal, multi-track podcast editing, Overdub voice cloning, and transcript-based audio editing. It is written for podcasters and audio producers who may not be producing video content.
The core workflow is identical to video editing in Descript: you upload your audio recording, it transcribes automatically, and you edit by editing the text. Delete a sentence from the transcript and that audio is removed. Reorder paragraphs and the audio reorders. For interview-based podcasts and spoken-word audio content, this is dramatically faster than traditional waveform editing.
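To make the idea concrete, here is a minimal sketch of how a transcript edit could map back to audio. It assumes each transcribed word carries start and end timestamps. The `Word` type and the `keep_segments` function are hypothetical names used for illustration only; this is not Descript's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def keep_segments(words, deleted):
    """Return (start, end) audio ranges for the words that survive an edit.

    `deleted` is a set of word indices removed from the transcript.
    Runs of consecutive surviving words are merged into one range, so the
    natural pauses between them are preserved. Each deletion splits the range.
    """
    segments = []
    prev_kept = None  # index of the most recent surviving word
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if segments and prev_kept == i - 1:
            # Still inside the same run of kept words: extend the last range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or the first word after a deletion: new range.
            segments.append((word.start, word.end))
        prev_kept = i
    return segments


# Illustrative data: deleting the word "um" (index 1) from the transcript.
words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.4, 0.7),
    Word("welcome", 0.8, 1.3),
    Word("back", 1.35, 1.7),
]
segments = keep_segments(words, deleted={1})
print(segments)  # [(0.0, 0.3), (0.8, 1.7)]
```

An editor working this way would export only the returned ranges, joined end to end. Reordering paragraphs would amount to reordering those ranges before joining them.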
Studio Sound: One click removes background noise, echo, and room reverb and improves voice quality. It can turn a recording made on a laptop microphone in a noisy room into something approaching studio quality. Independent reviewers rate Studio Sound as comparable to or better than Adobe Podcast Enhance, particularly for non-native English speakers and heavy accents.
Filler word removal: Detects and removes filler words such as "um", "uh", "like", and "you know", along with long pauses, in one click. For a 60-minute podcast interview, this step alone can save 30–45 minutes of manual editing.
Multi-track editing: Import separate audio tracks for host and guest (recommended for quality podcasting). Descript shows both transcripts simultaneously, and you edit across both tracks by editing the combined transcript. Speaker labels are assigned automatically.
Overdub: Train Descript on 10 minutes of your voice. You can then correct a mispronounced word, change a sentence, or fix a mistake by typing the corrected text. Descript generates your voice saying it and inserts it into the recording. No re-recording is needed for small corrections.
Transcript-based editing: The same text-editing workflow that makes Descript powerful for video applies fully to audio-only projects. Edit the words and the audio follows. Rearranging, cutting, and restructuring all happen in the transcript view.
Descript's audio tools are best suited to podcasters producing interview-format or conversation-based shows; anyone editing recorded interviews, meetings, webinar audio, or lectures for distribution; and content producers who need transcripts alongside their audio for show notes, accessibility, or repurposing.
Download Descript from descript.com or use the web version. Create a free account (60 minutes of media per month on the free tier). Create a new project and drag in your audio file. Transcription runs automatically and takes about 1 minute per 10 minutes of audio. When it completes, the transcript appears alongside the audio waveform. Begin editing by reading and modifying the text.
Edit rough first, polish second. Do all structural cuts (removing whole sections, rearranging) before applying filler word removal and Studio Sound. This way you are not polishing audio that will be cut anyway.
Always review filler word removal. Before clicking Remove all, review the highlighted words. Filler word detection is accurate for standard English but occasionally flags intentional emphasis or non-standard speech patterns. Review takes 2 minutes and prevents accidental removal of meaningful speech.
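The review-before-removal habit can be illustrated with a short sketch that flags candidates instead of deleting them. It assumes a transcript is available as `(text, start, end)` tuples. The filler lists, the `PAUSE_THRESHOLD` value, and the `flag_candidates` function are all hypothetical choices for this example and say nothing about how Descript detects fillers internally.

```python
import string

SINGLE_FILLERS = {"um", "uh", "like"}
PHRASE_FILLERS = {("you", "know")}
PAUSE_THRESHOLD = 1.5  # seconds of silence; an arbitrary illustrative value


def normalize(text):
    """Lowercase a token and strip surrounding punctuation."""
    return text.lower().strip(string.punctuation)


def flag_candidates(words):
    """Return (kind, label, start, end) tuples for a human to review.

    Nothing is removed here: the function only lists possible fillers
    and long pauses so each one can be accepted or rejected.
    """
    tokens = [normalize(text) for text, _, _ in words]
    candidates = []
    i = 0
    while i < len(words):
        _, start, end = words[i]
        # Flag a long silence between the previous word and this one.
        if i > 0 and start - words[i - 1][2] >= PAUSE_THRESHOLD:
            candidates.append(("pause", "(silence)", words[i - 1][2], start))
        # Two-word fillers are checked first so "you know" is flagged once.
        if i + 1 < len(words) and (tokens[i], tokens[i + 1]) in PHRASE_FILLERS:
            label = f"{tokens[i]} {tokens[i + 1]}"
            candidates.append(("filler", label, start, words[i + 1][2]))
            i += 2
            continue
        if tokens[i] in SINGLE_FILLERS:
            candidates.append(("filler", tokens[i], start, end))
        i += 1
    return candidates


# Illustrative transcript: the first "like" is a real verb, not a filler.
transcript = [
    ("I", 0.0, 0.2), ("really", 0.25, 0.6), ("like", 0.65, 0.9),
    ("this,", 0.95, 1.3), ("um,", 3.2, 3.5), ("you", 3.6, 3.8),
    ("know,", 3.8, 4.1), ("approach", 4.2, 4.9),
]
candidates = flag_candidates(transcript)
for kind, label, start, end in candidates:
    print(f"{kind:6} {label:10} {start:.2f}-{end:.2f}")
```

In this sample the word "like" in "I really like this" is flagged even though it carries meaning. A simple word-list approach produces exactly this kind of false positive, which is why reviewing the highlighted words before removing them is worthwhile.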
Studio Sound uses deep learning models trained on paired clean and noisy audio recordings. It separates the speech signal from background noise and room reverb, then applies voice enhancement. Per independent testing, Descript's Studio Sound performs comparably to or better than Adobe Podcast Enhance, particularly for non-native English speakers and recordings with complex noise profiles.