SilentR is a professional-grade web application designed to streamline the post-production workflow for podcasters and audio engineers. By automating the tedious process of silence removal, it creates "jump-cut" style edits across hours of audio in minutes. Unlike simple tools, SilentR offers a cinematic, non-linear editor experience right in the browser.
Core Capabilities
- Smart Silence Detection: Utilizes a custom algorithm to analyze the audio noise floor and detect silence with a user-configurable threshold (dB) and minimum duration (ms); a sketch of this kind of scan follows the list.
- Cinematic Visualizer: A high-performance waveform viewer that supports zooming down to the sample level and smooth scrolling, optimized for long recordings.
- Non-Destructive Editing: All cuts are virtual. Users can adjust "Keep" and "Remove" regions endlessly before finalizing the export.
- Batch Processing: A manifest-based system allows users to upload a JSON file containing metadata for hundreds of tracks, which are then processed by parallel workers (an example manifest shape is sketched after this list).
- Multi-Format/Multi-Track: Support for stereo, mono, and multi-channel audio files across MP3, WAV, AAC, and OGG formats.
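To make the threshold and duration settings concrete, here is a minimal sketch of the kind of windowed scan a silence-detection pass performs. The function name detectSilence, the RMS windowing, and the default values are illustrative assumptions, not SilentR's actual internals.

```typescript
interface SilenceRegion {
  start: number; // seconds
  end: number;   // seconds
}

// Scan mono PCM and return stretches that stay below a dB threshold
// for at least minDurationMs. Windowed RMS keeps the scan cheap.
function detectSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb = -40,
  minDurationMs = 500,
  windowMs = 10,
): SilenceRegion[] {
  const windowSize = Math.max(1, Math.floor((windowMs / 1000) * sampleRate));
  const minSilenceSamples = (minDurationMs / 1000) * sampleRate;
  const regions: SilenceRegion[] = [];

  let runStart = -1; // sample index where the current quiet run began (-1 = no run)
  for (let i = 0; i < samples.length; i += windowSize) {
    // RMS level of this window, converted to dBFS.
    const end = Math.min(i + windowSize, samples.length);
    let sumSquares = 0;
    for (let j = i; j < end; j++) sumSquares += samples[j] * samples[j];
    const db = 20 * Math.log10(Math.sqrt(sumSquares / (end - i)) + 1e-12); // epsilon avoids log(0)

    if (db < thresholdDb) {
      if (runStart < 0) runStart = i; // quiet run starts here
    } else {
      if (runStart >= 0 && i - runStart >= minSilenceSamples) {
        regions.push({ start: runStart / sampleRate, end: i / sampleRate });
      }
      runStart = -1;
    }
  }
  // Flush a quiet run that extends to the end of the buffer.
  if (runStart >= 0 && samples.length - runStart >= minSilenceSamples) {
    regions.push({ start: runStart / sampleRate, end: samples.length / sampleRate });
  }
  return regions;
}
```

Regions returned by a scan like this map naturally onto the "Keep"/"Remove" regions described above, which the user can then adjust before export.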
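And to show what the batch manifest might look like, here is a hypothetical shape plus a small worker pool. The field names, the silence-worker.ts module, and the pool sizing are assumptions for illustration, not SilentR's actual format.

```typescript
// Hypothetical manifest shape for batch processing; field names are illustrative.
interface BatchManifestEntry {
  file: string;          // path or URL of the source track
  title?: string;
  thresholdDb?: number;  // per-track override of the silence threshold
  minSilenceMs?: number; // per-track override of the minimum silence duration
}

interface BatchManifest {
  version: 1;
  tracks: BatchManifestEntry[];
}

// Fan the tracks out across a small pool of Web Workers.
async function runBatch(manifest: BatchManifest, poolSize = navigator.hardwareConcurrency ?? 4) {
  const queue = [...manifest.tracks];
  const workers = Array.from({ length: Math.min(poolSize, queue.length) }, () =>
    // "silence-worker.ts" is an assumed module name for the per-track worker.
    new Worker(new URL('./silence-worker.ts', import.meta.url), { type: 'module' }),
  );

  await Promise.all(
    workers.map(async (worker) => {
      while (queue.length > 0) {
        const track = queue.shift()!;
        await new Promise<void>((resolve, reject) => {
          worker.onmessage = () => resolve();
          worker.onerror = (e) => reject(e);
          worker.postMessage(track);
        });
      }
      worker.terminate();
    }),
  );
}
```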
Technical Deep Dive
SilentR pushes the boundaries of what is possible in a web browser, leveraging modern Web APIs to keep data local and secure.
Browser-Based Audio Processing
To ensure privacy and avoid massive server bandwidth costs, SilentR processes audio entirely on the client side using WebAssembly (WASM).
- FFmpeg.wasm: We compiled the legendary FFmpeg library to WASM to handle file decoding and encoding. This allows the app to support virtually any audio format without sending a single byte to a server (decode sketch below).
- AudioWorklet: The AudioWorklet API moves audio processing onto the dedicated audio rendering thread, off the main UI thread, so the interface remains responsive even during heavy computation (a minimal worklet is sketched below).
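A rough sketch of the client-side decode step, using the older createFFmpeg/fetchFile interface of @ffmpeg/ffmpeg (newer releases expose a class-based API instead); file names and conversion parameters here are illustrative.

```typescript
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

const ffmpeg = createFFmpeg({ log: false });

// Decode any supported container to mono 48 kHz WAV, entirely in the browser.
// The bytes never leave the client.
async function decodeToWav(file: File): Promise<Uint8Array> {
  if (!ffmpeg.isLoaded()) await ffmpeg.load();

  ffmpeg.FS('writeFile', file.name, await fetchFile(file));
  await ffmpeg.run('-i', file.name, '-ac', '1', '-ar', '48000', 'out.wav');
  const data = ffmpeg.FS('readFile', 'out.wav');

  // Free the in-memory virtual files once we have the result.
  ffmpeg.FS('unlink', file.name);
  ffmpeg.FS('unlink', 'out.wav');
  return data;
}
```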
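And a minimal AudioWorklet pairing to show the threading split: the processor module runs on the audio rendering thread and posts results back over its MessagePort. The processor name, file path, and the peak-metering logic are placeholder assumptions, not SilentR's actual worklet.

```typescript
// level-meter-processor.ts — loaded via audioWorklet.addModule(), runs on the audio thread.
class LevelMeterProcessor extends AudioWorkletProcessor {
  process(inputs: Float32Array[][]): boolean {
    const channel = inputs[0]?.[0];
    if (channel) {
      let peak = 0;
      for (let i = 0; i < channel.length; i++) peak = Math.max(peak, Math.abs(channel[i]));
      this.port.postMessage(peak); // report to the main thread without blocking it
    }
    return true; // keep the processor alive
  }
}
registerProcessor('level-meter', LevelMeterProcessor);

// main thread: register the module and insert the node into the graph
const ctx = new AudioContext();
await ctx.audioWorklet.addModule('/worklets/level-meter-processor.js');
const meter = new AudioWorkletNode(ctx, 'level-meter');
meter.port.onmessage = (e) => console.log('peak', e.data);
```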
Waveform Rendering Engine
Rendering a waveform for a 2-hour audio file is memory-intensive, so the viewer relies on two techniques:
- Canvas Optimization: We use an offscreen canvas technique to pre-render the waveform in chunks. As the user scrolls, we only draw the visible chunks.
- Peak Decimation: Instead of rendering every sample, we calculate min/max peaks for different zoom levels (LOD, levels of detail), reducing the dataset size by roughly 99% for zoomed-out views while maintaining visual accuracy. Both techniques are sketched below.
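A compressed sketch of both ideas: compute min/max peaks for one level of detail, then pre-render a chunk of those peaks into an OffscreenCanvas that the scroll loop can blit later. The bucket size, color, and one-pixel-per-peak mapping are illustrative choices, not SilentR's actual renderer.

```typescript
interface PeakPair { min: number; max: number }

// Build one LOD: min/max peaks for buckets of `samplesPerPeak` samples.
function buildPeaks(samples: Float32Array, samplesPerPeak: number): PeakPair[] {
  const peaks: PeakPair[] = [];
  for (let i = 0; i < samples.length; i += samplesPerPeak) {
    let min = Infinity;
    let max = -Infinity;
    const end = Math.min(i + samplesPerPeak, samples.length);
    for (let j = i; j < end; j++) {
      const s = samples[j];
      if (s < min) min = s;
      if (s > max) max = s;
    }
    peaks.push({ min, max });
  }
  return peaks;
}

// Pre-render one chunk of the waveform offscreen; one vertical line per peak pair.
function renderChunk(peaks: PeakPair[], width: number, height: number): OffscreenCanvas {
  const canvas = new OffscreenCanvas(width, height);
  const ctx = canvas.getContext('2d')!;
  ctx.strokeStyle = '#4ade80';
  ctx.beginPath();
  const mid = height / 2;
  for (let x = 0; x < Math.min(width, peaks.length); x++) {
    const { min, max } = peaks[x];
    ctx.moveTo(x + 0.5, mid - max * mid);
    ctx.lineTo(x + 0.5, mid - min * mid);
  }
  ctx.stroke();
  return canvas;
}
```

At draw time, the viewer then only needs drawImage calls for the pre-rendered chunks that intersect the visible scroll window.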
Application Architecture
- State Management: Built with Zustand to manage the complex state of the editor: cursor position, selection regions, zoom level, and the history stack (a trimmed-down store is sketched below).
- SPA Performance: The app is a Single Page Application (SPA) optimized with Vite. We use aggressive code splitting so the heavy audio-processing modules load only when a file is actually imported (see the lazy-import sketch below).
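As an illustration of the store shape (not SilentR's actual state tree), a trimmed-down Zustand slice covering cursor, zoom, regions, and a naive undo stack might look like this:

```typescript
import { create } from 'zustand';

interface Region { start: number; end: number; kind: 'keep' | 'remove' }

interface EditorState {
  cursor: number;      // playhead position in seconds
  zoom: number;        // samples per pixel
  regions: Region[];
  history: Region[][]; // simple undo stack of past region sets
  setCursor: (t: number) => void;
  setZoom: (z: number) => void;
  toggleRegion: (index: number) => void;
  undo: () => void;
}

export const useEditorStore = create<EditorState>((set) => ({
  cursor: 0,
  zoom: 512,
  regions: [],
  history: [],
  setCursor: (t) => set({ cursor: t }),
  setZoom: (z) => set({ zoom: z }),
  toggleRegion: (index) =>
    set((state) => ({
      history: [...state.history, state.regions],
      regions: state.regions.map((r, i) =>
        i === index ? { ...r, kind: r.kind === 'keep' ? 'remove' : 'keep' } : r,
      ),
    })),
  undo: () =>
    set((state) => {
      const prev = state.history[state.history.length - 1];
      return prev ? { regions: prev, history: state.history.slice(0, -1) } : state;
    }),
}));
```

Components subscribe with selectors, e.g. const cursor = useEditorStore((s) => s.cursor), so a moving playhead doesn't force the whole editor to re-render.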
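The code-splitting point boils down to a dynamic import at the file-import boundary; the module path below is an assumed one.

```typescript
// The heavy WASM decoder chunk is only fetched when a file is actually imported.
// './audio/decode' is an assumed module path; Vite splits dynamic imports into separate chunks.
async function onFileSelected(file: File) {
  const { decodeToWav } = await import('./audio/decode');
  const wavBytes = await decodeToWav(file);
  // ...hand the decoded audio to the editor
}
```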
Technology Stack
- Frontend: React, TypeScript, Vite, TailwindCSS
- Audio Core: Web Audio API, AudioWorklets, FFmpeg.wasm
- Visualization: HTML5 Canvas, React Konva (for overlay editors)
- State: Zustand, Immer
- Deployment: Vercel (Edge Network)
Challenges & Solutions
Challenge: Memory limits in the browser (Tab crash).
Solution: Browsers cap memory usage per tab (often ~2GB). Attempting to load a decoded 3-hour WAV file into memory as a Float32Array can crash the tab. We implemented a "Streaming Decode" approach where we process the file in 30-second buffers, calculate the silence regions, and then discard the raw PCM data, storing only the metadata. We only re-decode specific chunks when the user hits "Play" (a simplified version of this loop is sketched below).
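A simplified version of that loop, reusing the hypothetical detectSilence, SilenceRegion, and ffmpeg helpers from the sketches above; the -ss/-t seek flags, the 30-second window, and the bookkeeping are illustrative, not the production implementation.

```typescript
// Decode one 30-second slice to raw 32-bit float PCM, analyze it, then let it go.
async function scanForSilence(fileName: string, totalSeconds: number): Promise<SilenceRegion[]> {
  const CHUNK_SECONDS = 30;
  const SAMPLE_RATE = 48000;
  const regions: SilenceRegion[] = [];

  for (let start = 0; start < totalSeconds; start += CHUNK_SECONDS) {
    await ffmpeg.run(
      '-ss', String(start), '-t', String(CHUNK_SECONDS),
      '-i', fileName,
      '-f', 'f32le', '-ac', '1', '-ar', String(SAMPLE_RATE),
      'chunk.pcm',
    );
    const bytes = ffmpeg.FS('readFile', 'chunk.pcm');
    // Copy into a fresh, aligned buffer before viewing it as floats.
    const samples = new Float32Array(bytes.slice().buffer);

    // Keep only the metadata (offset by the chunk start), then drop the raw PCM.
    for (const r of detectSilence(samples, SAMPLE_RATE)) {
      regions.push({ start: r.start + start, end: r.end + start });
    }
    ffmpeg.FS('unlink', 'chunk.pcm');
  }
  return regions;
}
```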
Challenge: Synchronization between Visuals and Audio.
Solution: AudioContext.currentTime is accurate, but it advances in discrete steps (one render quantum at a time) and is not synchronized with the requestAnimationFrame loop, so reading it directly every frame yields a visibly stepped playhead. We created a custom hook, useAudioSync, that interpolates the playhead position between updates to deliver silky-smooth cursor movement that stays aligned with the sound, which is essential for precise editing (sketched below).
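A sketch of the interpolation idea; the real useAudioSync presumably also handles pause, seek, and playback-rate changes.

```typescript
import { useEffect, useRef } from 'react';

// Anchor AudioContext.currentTime to the wall clock each time it advances,
// then extrapolate between those updates so the playhead moves every frame.
export function useAudioSync(ctx: AudioContext, onTick: (playheadSeconds: number) => void) {
  const anchor = useRef({ audioTime: 0, perfTime: 0 });

  useEffect(() => {
    let raf = 0;
    const loop = () => {
      const audioTime = ctx.currentTime;
      if (audioTime !== anchor.current.audioTime) {
        // currentTime stepped forward: re-anchor against performance.now().
        anchor.current = { audioTime, perfTime: performance.now() };
      }
      // Extrapolate forward from the last observed currentTime.
      const interpolated =
        anchor.current.audioTime + (performance.now() - anchor.current.perfTime) / 1000;
      onTick(interpolated);
      raf = requestAnimationFrame(loop);
    };
    raf = requestAnimationFrame(loop);
    return () => cancelAnimationFrame(raf);
  }, [ctx, onTick]);
}
```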