Sounds

Procedural voxel world with biome-reactive generative music

Screenshot of Sounds showing a procedural voxel world
A procedural voxel landscape. The music playing at this moment is determined entirely by where the player is standing, what biomes surround them, and whether it's day or night.

Walk through a procedurally generated world and everything you hear is composed in real time based on where you are. Step into a temperate forest and an Aeolian lead voice picks up over jazz-inflected pad chords. Cross into the dunes and the scale shifts to Phrygian dominant with a flamenco bass line. Wade into deep ocean and everything slows to a Lydian drone. Stumble into a mushroom biome and you're in half-whole diminished territory over electronic percussion. The music isn't a soundtrack. It's a function of the terrain.

The world is built from simplex noise layered at three octaves. Temperature and moisture noise fields carve the terrain into twenty biomes arranged across five elevation bands: subaquatic (deep ocean, coral reef), lowland (mangrove, bayou, salt flat, estuary), midland (savanna, rain forest, dunes, mushroom, temperate forest, meadow), highland (steppe, volcanic peaks, plateau, alpine meadow), and extreme cold (glacial peaks, ice field, taiga, frozen tundra). Twenty block types give each biome a distinct surface: coral and sand underwater, mud and peat in wetlands, mycelium in mushroom forests, basalt on volcanic peaks. The chunks load in a ring around the player, building and tearing down geometry as you move. It looks like a voxel game, but the point isn't the world. The world is the instrument.

Seven voices

The music engine runs seven independent synthesizer voices, each on its own schedule. A lead generates melodic phrases of three to eight notes at eighth-note resolution, choosing from the current biome's scale with a bias toward the player's look direction: look up and the melody trends higher; look down and it drops. Biome-specific filter frequencies shape the timbre, from 1000 Hz in deep ocean to 4500 Hz in the dunes. A pad plays chord progressions on two-measure cycles using FM synthesis, cycling through triads, sevenths, suspended, and add9 voicings with a 2.5-second attack envelope. A bass runs at sixteenth-note resolution with one of ten biome-assigned strategies: walking, dub, ostinato, bossa nova, afrobeat, blues, flamenco, drone, spectral, or industrial. Percussion synthesizes eight drum sounds from scratch (kick, snare, closed and open hi-hat, tom, rim, cowbell, shaker) at sixteenth-note resolution and layers them into biome-specific kits. An ambient voice mixes brown and pink noise through bandpass filters, adding shimmer near water and rumble underground. An arpeggiator runs at sixteenth-note resolution with five pattern types (up, down, updown, random, converge) and filter frequencies that range from 700 Hz in ice fields to 3000 Hz in dunes. And a ping voice triggers a harmonic sine tone with 3.5-second reverb decay when you click, the only sound the player controls directly.

All seven voices are built with Tone.js. The output chain runs through a reverb (4-second decay, 0.3 wet), a feedback delay (0.3 seconds, 0.25 feedback), and a limiter at -3 dB. Each voice manages its own timing and note selection. The engine doesn't sequence pre-composed bars. It makes decisions every beat.

Twenty biomes, twenty soundscapes

Each biome defines a complete musical profile built on one of eight genre presets (ambient, jazz, rhythmic, Latin, blues, electronic, folk, spectral) and customized with a specific scale, tempo range, bass strategy, percussion kit, arp pattern, and chord progression. The scale system holds 44 scales, from pentatonic major and Dorian through slendro, iwato, and raga Bhairav. Some examples of how these combine:

  • Meadow uses pentatonic major with a folk preset, minimalist chord progressions, and an updown arp pattern.
  • Bayou uses the blues scale with a dub bass strategy, 0.55 swing, and 15% ghost note probability on the percussion kit.
  • Dunes uses Phrygian dominant with a flamenco bass (stressed first beat, stabs on 2 and 4) and a Latin genre preset.
  • Volcanic peaks uses Locrian with an industrial bass strategy (short 60-millisecond hits every two beats), an electronic genre preset, and a tritone-based chord progression.
  • Deep ocean uses Lydian with an ambient preset, a drone bass (one sustained note every 32 steps), and a spectral arp at 800 Hz filter cutoff.
  • Glacial peaks uses whole tone with a spectral genre preset, 10-second reverb decay, and 0.666-second delay.

When the player crosses a biome boundary, these targets don't snap to new values. A parameter smoother applies exponential dampening, with different time constants per parameter: 8 seconds for biome transitions, 4 seconds for elevation, 2 seconds for movement speed, and 1 second for camera pitch. The blending works by sampling biomes in a circular area around the player (radius 32 blocks, step size 8), weighting each sample by distance falloff. The result is a set of biome weights that sum to one, and the music parameters are a weighted mix. You hear the gradient as you walk through it.

Harmony

A harmony engine tracks the current chord, available tensions, avoid notes, and a numeric harmonic tension value that rises and falls over the course of a progression. The engine draws from a library of 33 chord progressions organized by style: ambient modal vamps, jazz ii-V-I cadences, blues shuffles, bossa nova, afrobeat, flamenco Phrygian cadences, post-rock builds, drones, gamelan cyclic patterns, IDM chromatic sequences, and raga-inspired pedal tones. Each biome selects a progression style, and the harmony engine feeds chord tones and tensions to the lead, pad, bass, and arp voices so they stay harmonically coherent even as the biome blend shifts beneath them.

Rhythm and percussion

The percussion voice doesn't use generic rock or electronic beats. Its rhythm library holds sixteen patterns drawn from Son Clave, Habanera, Agbadza, Bembe, Gahu, Cascara, Songo, Cinquillo, Tresillo, and others. On top of the patterns, each biome has its own percussion kit defining instrument layers, groove parameters (swing amount, ghost note probability, fill probability and density), and cycle lengths. The bayou kit runs at 0.55 swing with 15% ghost notes and 15% fill probability. The meadow kit has no swing at all. Volcanic peaks run 30% fill probability with dense kick patterns on every other beat.

Timing and velocity are humanized across all kits. Hits land within 10 milliseconds of the grid and velocities vary by up to 6 dB around the target, so the percussion has a loose, played feel rather than a rigid machine quality. Kits are held for 64 steps before the engine considers switching.

Melody

The lead voice uses a motif engine that generates phrases with one of five contour shapes: rising, falling, arch, valley, or plateau. Once a motif is established, the engine can apply nine transforms to it: repeat, transpose (up or down by two semitones), invert, retrograde, augment (double durations), diminish (halve durations), fragment, embellish, or sequence. Note durations are selected based on the current rhythm density: at high density, half the notes are quarters and a third are eighths; at low density, the distribution shifts toward whole and half notes. The lead's filter frequency is set per biome, ranging from 1000 Hz in deep ocean to 4500 Hz in the dunes, and its note range shifts with elevation (lower register near sea level, higher on peaks).

Day and night

A ten-minute day/night cycle drives a second layer of musical change. The sun follows a piecewise intensity curve: full brightness from 30% to 70% of the cycle, linear ramps during the 0.2-0.3 sunrise and 0.7-0.8 sunset windows, and 10% intensity at night. The sky shifts from blue to orange to deep purple and back, and the fog color tracks 70% horizon, 30% sky top. At night, the music engine increases the reverb wet mix from 0.3 to 0.45 (0.5 underground), darkens filter cutoffs, and reduces the volume of brighter voices. The lead takes a -4 dB penalty during daytime and the arp takes -4 dB at night, so the balance shifts across the cycle. The pad gets a +3 dB night boost. Walking through a forest at night sounds different from the same forest at noon, even though the biome profile is identical.

Sections

The engine cycles through four musical sections: ambient (16 bars), building (12 bars), full (20 bars), and breakdown (8 bars). Each section sets gain targets for the voices. In the ambient section, lead drops to 0.3, bass to 0.1, percussion to 0.1, and arp to 0.2, with pad and ambient carrying the texture. Building raises the lead to 0.7 and percussion to 0.6. Full brings everything to 1.0. Breakdown sits between, with lead at 0.5 and percussion at 0.2. The transitions ramp voice gains over two seconds, avoiding hard cuts. Walking through different biomes creates spatial variation; the section cycle creates temporal variation.

World generation

The terrain uses three simplex noise layers at scales of 0.005, 0.015, and 0.04, combined with weights of 0.6, 0.25, and 0.15. Separate noise fields at different frequencies determine temperature and moisture, which together select one of the twenty biomes. Height transitions between biomes use Hermite smoothstep interpolation on a 16-unit grid to avoid hard edges. Trees grow in six biomes (temperate forest, rain forest, taiga, mangrove, bayou, mushroom) with noise-gated placement thresholds that vary per biome: rain forest places trees above 0.55 noise for dense canopy, while bayou requires 0.80 for sparse dead wood.

The rendering uses Three.js with per-face culling: only block faces adjacent to air or water are drawn. Chunks are 16 by 16 blocks and 256 blocks tall. Three device profiles scale the experience: desktop loads six chunks at a distance with 2048-pixel shadow maps and four chunks per frame; tablet drops to five chunks with 1024-pixel shadows; mobile runs four chunks with shadows disabled and a budget of two chunks per frame.

Controls

On desktop: WASD to move, mouse to look, space to jump or swim up, shift to sprint at 1.8x speed. On mobile: a virtual joystick on the left half of the screen for movement, the right half for look, tap to jump, two-finger touch to ping, and an auto-walk button at the bottom. Click (or two-finger tap) to trigger a ping. There's no objective, no inventory, no enemies. You walk through a world and the world plays music for you. The ping is the one moment of agency in the soundscape: a harmonic sine tone that rings out over whatever the engine is doing, a way to leave a brief mark on the generative output before it moves on without you.