Building an AI Profanity Filter with Vocal Separation

Source: DEV Community
I built an online tool that automatically detects and bleeps profanity in video and audio files. Here's the high-level architecture.

The problem

Manual profanity censoring takes 45+ minutes for a 10-minute video. You have to listen through, find each word, razor the audio, drop a beep effect. For songs, it's nearly impossible without destroying the music.

The solution

AI speech recognition + neural vocal separation.

How it works

1. User uploads a file or pastes a YouTube URL
2. Audio is extracted with FFmpeg
3. AI speech-to-text transcribes the audio (AssemblyAI / Deepgram)
4. Profanity is detected using morphological analysis (lemmatization)
5. Each word is replaced with beep/silence/custom sound via FFmpeg
6. For songs: Demucs AI separates vocals from instruments first

Song mode: the hard part

Demucs by Meta AI does the heavy lifting, splitting audio into vocal and instrumental tracks. Profanity detection runs only on the vocal track, then the censored vocals are mixed back with the original instrum
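
To make the detection and replacement steps described above more concrete, here is a minimal sketch in Python. It is not the tool's actual code. The `Word` type, the `BLOCKLIST` contents, the `normalize` helper (a crude suffix-stripping stand-in for real lemmatization), and the 50 ms padding are all assumptions made for illustration. The sketch mutes flagged spans rather than overlaying a beep, using FFmpeg's `volume` filter with a timeline `enable` expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with timestamps in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


# Hypothetical blocklist of base forms; mild placeholders for illustration.
BLOCKLIST = {"darn", "heck"}


def normalize(token: str) -> str:
    """Crude stand-in for lemmatization: lowercase, drop punctuation,
    and strip a few English suffixes when the stem is a known base form."""
    base = "".join(ch for ch in token.lower() if ch.isalpha())
    for suffix in ("ing", "ed", "s"):
        if base.endswith(suffix) and base[: -len(suffix)] in BLOCKLIST:
            return base[: -len(suffix)]
    return base


def flagged_spans(words, pad=0.05):
    """Return merged (start, end) spans covering every flagged word.
    Each span is padded by `pad` seconds (an assumed value)."""
    spans = sorted(
        (max(0.0, w.start - pad), w.end + pad)
        for w in words
        if normalize(w.text) in BLOCKLIST
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mute_filter(spans):
    """Build an FFmpeg audio filter chain that silences each span."""
    return ",".join(
        f"volume=enable='between(t,{s:.3f},{e:.3f})':volume=0" for s, e in spans
    )


def ffmpeg_command(src, dst, spans):
    """Assemble the argument list; pass it to subprocess.run to execute."""
    cmd = ["ffmpeg", "-y", "-i", src]
    if spans:
        cmd += ["-af", mute_filter(spans)]
    cmd.append(dst)
    return cmd
```

Given word-level timestamps from a speech-to-text service, `ffmpeg_command("in.wav", "out.wav", flagged_spans(words))` returns an argument list that can be run with `subprocess.run`. Merging overlapping spans keeps the filter chain short when several flagged words occur back to back. In song mode, the same command would presumably be applied to the separated vocal track only.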