Creating (and testing) a "prepared voice" patch in SuperCollider

Once I settled on the “prepared voice” concept for my SoundMakers commission, I had to figure out how to make it a musical reality. I had tested a few simple delay lines, but they weren’t great: I was either getting mechanical-sounding echoes or washy reverb. I needed something that blended more completely with the acoustic component of the voice.

To fine-tune my delay patch, however, I first needed some kind of vocal synth. After all, it would be hard to tell whether I was heading in the right direction without a voice-like sound to plug into it. I could tune the patch to a certain extent using my own singing voice, but that wasn’t going to tell me whether the patch worked in a polyphonic setting: I needed either four singers at my beck and call, or a synthesizer patch with the appropriate acoustic characteristics.

In the past I’ve controlled SuperCollider from Sibelius using MIDI signals, so I set out to do the same here: I would build a vocal synth in SuperCollider, trigger it from Sibelius, and feed it into my “prepared voice” patch. That would let me hear the combined effect of polyphonic vocal phrases with delay.
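For anyone who wants to try a similar setup, here is a minimal sketch of the SuperCollider side. It is not the patch used for this piece. It assumes that Sibelius sends its playback to a MIDI port SuperCollider can see (on macOS, an IAC Driver bus is one option) and that a SynthDef with `freq`, `vel` and `gate` arguments, like the clarinet patch below, has already been added to a booted server. The names `~notes`, `\voiceOn` and `\voiceOff` are hypothetical.

```supercollider
(
// Hypothetical MIDI bridge: each note-on starts a \clarinet synth, and the
// matching note-off releases it by closing the synth's envelope gate.
~notes = Array.newClear(128);   // one slot per MIDI note number

MIDIClient.init;
MIDIIn.connectAll;              // listen on every MIDI source, including Sibelius's port

MIDIdef.noteOn(\voiceOn, { |vel, num|
    ~notes[num] !? { |old| old.set(\gate, 0) };   // release a note still sounding on this key
    ~notes[num] = Synth(\clarinet, [\freq, num.midicps, \vel, vel]);
});

MIDIdef.noteOff(\voiceOff, { |vel, num|
    ~notes[num] !? { |old| old.set(\gate, 0) };
    ~notes[num] = nil;
});
)
```

Keeping one slot per note number means a repeated note releases its predecessor cleanly instead of leaving a stuck voice.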

However, for this vocal synth to be of any use, it had to share basic timbral similarities with the voice. What I didn’t want was one of those awful choral “ooh” or “aah” patches you find on most keyboards. Those aren’t very useful for replicating vocal phrases, because they’re too washy, they don’t articulate attacks, and they’re too one-sided: all you get is that soft, textural singing you’d expect in the background of a TV Christmas special.

Solo singing is an entirely different beast, full of sharp consonants, noisy sounds, shifting vowels, and a huge range of timbral and expressive variation. Obviously I wouldn’t be able to get all of that, but I needed a sound source that captured the most important bits.

Step 1: Maybe a clarinet?

I’ve often found clarinet tones useful for simulating the voice for compositional purposes. They have a similar richness of tone, they can sustain, and they’re fairly nondescript: listening to a synth clarinet doesn’t colour the underlying musical materials as much as something with a stronger timbral identity, like a bassoon or those nasty vocal ooh patches. Synth clarinet, in other words, is sort of a musical wallflower (in a good way).

I thought to myself: you could pull out a textbook on acoustics and construct the clarinet from scratch using mathematical models—or you could just mooch off of someone else’s work. A quick Google search turned up a simple clarinet patch, and I was playing with the following model in under five minutes:

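SynthDef(\clarinet, {
  arg freq = 440, vel = 0, out_bus = 0, gate = 1;
  var signal, env, vel_scale;
  // map MIDI velocity (0-127) onto a squared 0-1 amplitude curve
  vel_scale = (vel ** 2) / (127 ** 2);
  // 1 ms attack to full level, 50 ms decay to a 0.5 sustain, 10 ms release
  env = EnvGen.kr(
    Env.adsr(0.001, 0.05, 0.5, 0.01, 1, -4),
    gate,
    doneAction: 2   // free the synth once the release has finished
  );
  // odd harmonics only (1, 3, 5 ... 19), each at 1/n amplitude
  signal = Mix.fill(10, { |i| var harmonicnumber = 2*i+1;
    SinOsc.ar(freq*harmonicnumber)/harmonicnumber}) * 0.25;
  signal = signal * env * vel_scale;
  Out.ar(out_bus, signal);
}).add;   // .add stores the def and sends it to the server
// usage: x = Synth(\clarinet, [\freq, 220, \vel, 100]); then later x.set(\gate, 0);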

If you’re not sure what you’re looking at here, don’t worry. The important details are as follows:

vel_scale = velocity scale – Converts MIDI velocity (0 to 127) into an amplitude factor. Squaring the velocity gives something closer to the loud/soft scale our ears hear than a straight-line mapping would, so that I could play dynamics.
env = envelope – The shape of the sound over time. Here it rises to full level almost instantly (1 ms), drops to half of the maximum loudness within 50 ms, and holds there until the note is released, then fades out in 10 ms. That brief spike at the start is a crude way of replicating the sound of a clarinet attack.
signal = Mix.fill(10… – This sums ten sine waves: one at the fundamental frequency, then one on every other harmonic above it, skipping the even harmonics (the acoustic clarinet’s low register is likewise dominated by odd harmonics). In other words, you get harmonics 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, each divided by its harmonic number so that the higher ones are quieter.
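To put numbers on the first and last points, the language-side sketch below (no server needed) lists the harmonics and their 1/n amplitudes, and shows what the squared velocity curve does to a few velocities. It uses 127, the top MIDI velocity, as the divisor. With a squared curve, half velocity comes out roughly 12 dB down rather than the 6 dB a linear mapping would give.

```supercollider
(
// Worked numbers for the clarinet patch (language side only)
var harmonics = Array.fill(10, { |i| 2 * i + 1 });   // odd harmonics 1, 3, 5 ... 19
var amps = harmonics.reciprocal;                       // each partial at 1/n amplitude
var velCurve = { |vel| (vel ** 2) / (127 ** 2) };      // squared velocity mapping

"harmonics: %".format(harmonics).postln;
"partial amplitudes: %".format(amps.round(0.001)).postln;
[32, 64, 96, 127].do { |v|
    var gain = velCurve.(v);
    "velocity % -> gain % (% dB)".format(v, gain.round(0.001), gain.ampdb.round(0.1)).postln;
};
)
```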

This is a pretty bare-bones model of the clarinet, looking only at frequency and doing a very rudimentary job of replicating the attack sound. I stuck a little reverb on it and this is what it sounded like:

Not exactly inspiring.

Step 2: Maybe it’s the delay line?

I thought that maybe the clarinet would be okay once I found the right delay line. So I set about tweaking the delay, which is based on a high-quality comb filter (don’t worry if you don’t know what that is). I tested out different delay and decay ranges, with the actual values chosen at random each time by the computer. The delay time is how far apart the echoes are, and I had SuperCollider pick a random number between 0.234 and 0.534 seconds. The decay time is how long the sound echoes before it disappears (technically, the time it takes the echoes to die away by 60 dB), and I set the random range between 5.5 and 7.5 seconds. (These weren’t the final values I decided on for the piece, but they seemed right at this point in time.)
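To hear what the two parameters do on their own, the sketch below feeds a short burst of noise into a comb filter whose delay and decay are drawn from the ranges above. It assumes the “high-quality” comb is `CombC`, SuperCollider’s cubic-interpolating comb. The noise burst and the 10-second spacing are illustrative choices, not part of the original patch.

```supercollider
(
// Audition one randomized comb delay on a short noise burst every 10 seconds.
{
    var delay = Rand(0.234, 0.534);   // echo spacing in seconds, picked when the synth starts
    var decay = Rand(5.5, 7.5);       // time for the echoes to fall by 60 dB
    var burst = Decay2.ar(Impulse.ar(0.1), 0.001, 0.05) * PinkNoise.ar(0.5);
    // CombC.ar(in, maxdelaytime, delaytime, decaytime): maxdelaytime must cover the longest delay
    var echoes = CombC.ar(burst, 0.6, delay, decay);
    (burst + echoes) * 0.5 ! 2        // dry + wet, duplicated to both speakers
}.play;
)
```

Because `Rand` picks its value once per synth, re-evaluating the block gives a different delay and decay each time.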

In addition, I asked SuperCollider to remember how long each decay time was, then start a new delay line halfway through, again picking random numbers. That way, there would always be two slightly different delay lines running at once, creating a kind of pitter-patter effect instead of a rhythmic pulse. The overlap would also ensure that whenever a singer sang a note, there would be a delay running to capture it, so we wouldn’t get any naked notes that don’t get echoed. Here’s what it sounded like:
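One way to sketch the overlapping scheme: each delay line is its own synth with a lifetime equal to its decay time, and a Routine launches the next one halfway through. Everything here (the names `\overlapDelay` and `~voiceBus`, the fade lengths, the bus layout) is hypothetical. It assumes the voice synths write to `~voiceBus` and sit earlier in the node order. Evaluate the first block, then the second.

```supercollider
(
~voiceBus = ~voiceBus ?? { Bus.audio(s, 1) };   // voices write here (hypothetical)

SynthDef(\overlapDelay, { |in_bus = 0, out_bus = 0, delay = 0.384, decay = 6.5, amp = 1|
    // fade in, hold, fade out over this line's lifetime (= its decay time), then free the synth
    var life = EnvGen.kr(Env.linen(0.05, decay - 1.05, 1), doneAction: 2);
    var echoes = CombC.ar(In.ar(in_bus, 1), 0.6, delay, decay);
    Out.ar(out_bus, echoes * life * amp ! 2);
}).add;
)

(
// Start a new, slightly different delay line halfway through each decay time,
// so two lines always overlap and every note lands in at least one of them.
~delayLoop = Routine {
    loop {
        var decay = rrand(5.5, 7.5);
        Synth(\overlapDelay, [
            \in_bus, ~voiceBus, \delay, rrand(0.234, 0.534), \decay, decay
        ], addAction: \addToTail);   // after the voices, so the delay hears them
        (decay / 2).wait;
    }
}.play;
)
```

`~delayLoop.stop` halts the scheduler; lines already running finish their own lifetimes and free themselves.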

Okay, well, the clarinet patch certainly didn’t get any better, but the delay concept was improving. It was still too washy, but it was a step in the right direction.

Step 3: Balancing the source with the echo

I worked some more on the delay line, on the assumption that the problem was about balance: the washiness was probably because the delay was too quiet compared to the source. If I got them to about the same volume, they would blend and overlap in performance, getting rid of the washy effect. As it turned out, that wasn’t quite right.

The clarinet patch was creating a fairly hot signal, since it was summing a bunch of sine waves all played at a fairly high volume. SuperCollider expects signal amplitudes between -1 and 1 (full scale). If you go beyond that range at the output, you get clipping, which is not always bad given the way SuperCollider handles it, but it drastically changes the timbre and makes the sound super loud. At first, I actually clipped the clarinet on purpose, because the harshness made up for some of the boring aspects of the clarinet patch. But the volume spike used in the attack portion of the sound created a strange artifact once I raised the volume on the delay line. Remember that the clarinet is only at full volume for a split second before dropping down, which is how it creates the volume spike. Most of the time, the clarinet is playing at 50% volume.

Now, because the delay line lasts for a relatively long time and is constantly getting quieter, its first echoes are loud enough to balance with the clarinet, while the later ones become a quiet wash of sound. I wanted the sounds to balance for as long as possible, so I had to raise the delay volume until it was technically louder than the clarinet at the start. Most of the time that was okay, because the spike at the start of each clarinet note would stand out against the louder delay, which was mostly echoing the sustained (non-attack) parts of the clarinet at 50% volume. This worked about 90% of the time, but sometimes a new delay would start and the clarinet would play a split second later, while the delay was still louder than the clarinet. The delay would therefore pick up the spike, and I’d get strange, random accents coming out of nowhere.
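A quick calculation shows why the tail turns into a quiet wash. According to the `CombC` documentation, the decay time is the time it takes the echoes to fall by 60 dB. Each echo is therefore quieter than the last by a fixed 60 × delay / decay dB. With the midpoints of the random ranges (an illustrative choice, 0.384 s and 6.5 s), that works out to roughly 3.5 dB per echo:

```supercollider
(
// Echo levels of a feedback comb, using the midpoints of the random ranges
var delay = 0.384, decay = 6.5;
var fb = 0.001 ** (delay / decay);   // per-echo gain implied by a 60 dB fall over `decay` seconds
var levels = (1..10).collect { |k| (fb ** k).ampdb.round(0.1) };
"per-echo gain: % (% dB)".format(fb.round(0.001), fb.ampdb.round(0.01)).postln;
"first ten echo levels (dB): %".format(levels).postln;
"echoes within the first second: %".format((1 / delay).floor).postln;
)
```

Raising the delay’s output level lifts every echo by the same number of decibels. That keeps the later echoes audible for longer, but it also pushes the first few above the dry sound, which is the trade-off described above.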

Step 4: Soft-clipping the clarinet

I went back to the clarinet to see if I could fix the problem there. I played with a bunch of hard-wired volume settings, but they weren’t very versatile: one setting would work for loud sounds, another for soft sounds. I couldn’t get one that worked all around.

On a lark, I decided to try soft-clipping the clarinet waveform (I had been hard-clipping it before). The diagram below shows what happens when you soft-clip vs. hard-clip:

[Figure: how soft clipping works]

Notice how the hard clipping (red lines) creates a sharp angular break in the otherwise smooth waveform. The sharpness of the break is what creates the harsh sound of clipping. When you soft clip instead, those harsh breaks get rounded out (orange lines) and they’re much more agreeable to the ears.
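SuperCollider’s own `softclip` operator behaves this way. It passes anything up to 0.5 through unchanged and bends larger values smoothly toward 1, following (|x| − 0.25) / x, while `clip2` simply flattens everything at ±1. The sketch below prints both for a few input levels and then plots a sine driven to twice full scale through each. The plot needs a booted server; the input levels and the 220 Hz test tone are arbitrary.

```supercollider
(
// Hard clipping vs. SuperCollider's softclip on the same input levels
[0.25, 0.5, 0.75, 1.0, 1.5, 2.0].do { |x|
    "in: %  hard (clip2): %  soft (softclip): %".format(
        x, x.clip2(1.0), x.softclip.round(0.001)
    ).postln;
};

// The same two shapes on a sine driven to twice full scale (left: hard, right: soft)
{ var sine = SinOsc.ar(220, 0, 2); [sine.clip2(1), sine.softclip] }.plot(0.01);
)
```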

I don’t fully understand the physics behind it, but when I switched from a fixed amplitude value (0.25), which had been producing hard clipping, to wrapping the summed sine wave generators in a soft clipper, the balance problems with the delay line disappeared. Maybe someone with more of a physics background than me can explain why this worked. In any case, all I changed was the very end of one line of code:

SinOsc.ar(freq*harmonicnumber)/harmonicnumber}) * 0.25;   // before: fixed gain

SinOsc.ar(freq*harmonicnumber)/harmonicnumber}).softclip;   // after: soft clipper
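One plausible factor, offered as a hypothesis rather than a diagnosis: the two versions do not produce the same level. Because `.softclip` is applied to the summed harmonics before the envelope is multiplied in, the attack spike itself is unchanged. What changes is the waveform’s level and shape. The language-side sketch below samples one cycle of the ten-harmonic sum and compares its peak after `* 0.25` with its peak after `.softclip`. The soft-clipped version should come out roughly three times louder, which on its own would shift the balance between source and delay.

```supercollider
(
// Sample one cycle of the odd-harmonic sum (no server needed) and compare peaks
var n = 2000;
var cycle = Array.fill(n, { |j|
    var phase = j / n * 2pi;
    Array.fill(10, { |i| var h = 2 * i + 1; sin(phase * h) / h }).sum
});
var peakRaw = cycle.abs.maxItem;
var peakScaled = (cycle * 0.25).abs.maxItem;
var peakSoft = cycle.collect { |v| v.softclip }.abs.maxItem;
"raw sum peak: %".format(peakRaw.round(0.001)).postln;
"* 0.25 peak: %  softclip peak: %".format(peakScaled.round(0.001), peakSoft.round(0.001)).postln;
"softclip / scaled: %".format((peakSoft / peakScaled).round(0.01)).postln;
)
```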

And here’s what it sounded like:

Step 5: Detour into strings land

Okay, so the random accents issue seemed to be solved. I spent some time working on musical materials with the patch, but the boringness of the clarinet sound was still bothering me. I decided to try some alternatives. I downloaded a simple strings patch from the Internet and swapped it in for the clarinet.

It was god-awful. I tried playing around with it for a while, but to no avail. I gave up and went back to the clarinet. The best I got from the strings was the following:

Step 6: Adding noise to the clarinet

I thought things might improve if I could add some noisiness to the clarinet signal, something that would make it seem a little less sterile. I tried putting a noise generator into the amplitude slot of my sine wave generators. What that means is that I was randomly varying the loudness of the sine waves, thousands of times per second. It gave them a bit of a rough edge: you’d still mostly hear the sine waves, but you’d also get a bit of fluctuation and noise.

I tried a bunch of different noise generators. Some of them were too intense, pretty much destroying the sine waves and giving me TV static. Others were too subtle to notice. Eventually I decided to go with pink noise, which has equal power per octave: its power density falls by 3 dB per octave, so it sounds more even and less hissy than white noise, which has equal power per hertz and therefore piles most of its energy into the top octaves. For whatever reason, that sounded good. The clarinet patch still sounded fake, but it had a pleasing richness to it.
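The post doesn’t include the noisy version of the code, so the following is only a sketch of the idea. It places a `PinkNoise` UGen in the `mul` (amplitude) argument of each `SinOsc`, offset by 1 so that the sines still dominate. The `noise` depth argument, its default of 0.3, and the `\clarinetNoisy` name are hypothetical. The rest follows the clarinet patch with the Step 4 soft clipper.

```supercollider
(
SynthDef(\clarinetNoisy, {
    arg freq = 440, vel = 0, out_bus = 0, gate = 1, noise = 0.3;   // noise: hypothetical depth
    var signal, env, vel_scale;
    vel_scale = (vel ** 2) / (127 ** 2);
    env = EnvGen.kr(Env.adsr(0.001, 0.05, 0.5, 0.01, 1, -4), gate, doneAction: 2);
    signal = Mix.fill(10, { |i|
        var harmonicnumber = 2 * i + 1;
        // amplitude wobbles around 1 at audio rate, with a separate noise source per harmonic
        var amp = 1 + PinkNoise.ar(noise);
        SinOsc.ar(freq * harmonicnumber, 0, amp) / harmonicnumber
    }).softclip;
    Out.ar(out_bus, signal * env * vel_scale);
}).add;
)
```

Turning `noise` toward 1 or beyond lets the noise swamp the sines (the “TV static” end of the range), while values near 0 make it inaudible.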

Step 7: Rebalancing source with delay

There was one problem with my new pink-noise-laced clarinet: the delay line didn’t sound balanced anymore. It took me a while to figure out what was going on, but I eventually tracked it down. It had to do with the limiters in the patch, which are used to prevent sounds from getting too loud.

All along, I had been using a global limiter on the entire patch. I often do this, just to make sure I don’t ruin things with some loud, random sound. However, the signal was being processed in steps: first I generated the clarinet, then I fed it into the delay (which had its own limiter), then I put it into the global limiter and out the speakers. By the time the original source sound had reached our ears, it had gone through two limiters, and something about the noise I had added to the clarinet signal made that more obvious. Perhaps it was over-limiting the clarinet or something.

I was having trouble balancing the source with the delay, so I decided to try sending the source straight to the final limiter instead of routing it through the delay. That ended up solving the problem. Again, I don’t really know why this worked, but I figured it out through trial and error.
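Here is a sketch of that final signal flow as I read it from the description. The dry voice bypasses the delay and goes straight to the global limiter, while a copy of it feeds the delay, which keeps its own limiter. Bus, group and SynthDef names, limiter levels, and the single static delay line (used instead of the overlapping scheme, for brevity) are all hypothetical. Evaluate the first block, then the second.

```supercollider
(
~voiceBus = Bus.audio(s, 1);   // every voice synth writes here
~mixBus = Bus.audio(s, 1);     // dry voice and delay meet here

// dry path: the voice skips the delay and its limiter entirely
SynthDef(\drySend, { |in_bus = 0, out_bus = 0|
    Out.ar(out_bus, In.ar(in_bus, 1));
}).add;

// wet path: one static comb delay with its own limiter
SynthDef(\wetSend, { |in_bus = 0, out_bus = 0, delay = 0.384, decay = 6.5, amp = 1|
    var wet = CombC.ar(In.ar(in_bus, 1), 0.6, delay, decay) * amp;
    Out.ar(out_bus, Limiter.ar(wet, 0.9));
}).add;

// the one global limiter, feeding both speakers
SynthDef(\master, { |in_bus = 0|
    Out.ar(0, Limiter.ar(In.ar(in_bus, 1), 0.95) ! 2);
}).add;
)

(
// node order: voices -> dry/wet sends -> master limiter
~voices = Group.new;
~fx = Group.after(~voices);
~out = Group.after(~fx);
Synth(\drySend, [\in_bus, ~voiceBus, \out_bus, ~mixBus], ~fx);
Synth(\wetSend, [\in_bus, ~voiceBus, \out_bus, ~mixBus], ~fx);
Synth(\master, [\in_bus, ~mixBus], ~out);
)
```

Voices would then be started inside `~voices` and pointed at the voice bus, e.g. `Synth(\clarinet, [\freq, 220, \vel, 100, \out_bus, ~voiceBus], ~voices)`. The dry signal passes through exactly one limiter, and the echoes pass through two.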

So there you have it. Below you’ll hear the final, delay-infused clarinet sound in all its glory. I had built a patch that provided the delay effect I wanted and created a synthesized sound that was rich enough to simulate the behavior of the voice, at least for compositional purposes. Now I only had to write the music!

This article originally appeared on the SoundMakers composer-in-residence blog.