encryptio.com

Impulse Response Extraction

One day, I got particularly interested in convolution reverb. The basic idea is that you get a recording of what a room sounds like when you play an extremely short pop in it (that is, its impulse response). Then you convolve that impulse response with the sound you want, and you get something extremely similar to what you would hear if you were actually there, playing the sound in the room.
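To make "convolve" concrete, here is a minimal sketch in plain Python. The tiny `room_ir` and `dry` lists are made-up toy data, not a real measurement, and a real implementation would use an FFT instead of this slow double loop.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Every input sample triggers its own scaled copy of the response.
            out[i + j] += s * h
    return out

# Hypothetical toy "room": the direct sound followed by two quieter echoes.
room_ir = [1.0, 0.0, 0.5, 0.0, 0.25]
dry = [1.0, -1.0]            # a very short dry sound
wet = convolve(dry, room_ir)  # the dry sound as "heard in the room"
```

Each sample of the dry sound adds one scaled, delayed copy of the room's response to the output, which is all the reverb is.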

Initial Method

Unfortunately, the mathematics around extracting good-quality impulse responses is difficult at best, so I started with a very stupid method - I played a bunch of impulses and wrote programs to average all the recordings I got. This worked, but the quality was extremely low for the time it took. The details are not worth mentioning.

Exponential Sine Sweep

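The best known method for getting an impulse response is to play a specific sound, an exponential sine sweep, record the result of playing it in the room, and "deconvolve" the recording back into an impulse response. This is done by dividing by the sweep's spectrum in the frequency domain or, roughly equivalently, by convolving (forward) with the time-reversed sine sweep that was played. (The two are not exactly the same: for an exponential sweep, the reversed sweep also needs an amplitude correction before the result has a flat frequency response.)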
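The reversed-sweep variant can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sweep parameters, the toy "room" (a pure delay of 7 samples with a gain of 0.8) and all function names are made up, the convolution is the slow direct kind, and no amplitude correction is applied to the reversed sweep, so it shows the time alignment rather than a calibrated response.

```python
import math

def exponential_sweep(f1, f2, n_samples, rate):
    """Exponential sine sweep from f1 to f2 Hz, n_samples long (closed-form phase)."""
    T = n_samples / rate
    k = math.log(f2 / f1)
    return [math.sin(2.0 * math.pi * f1 * T / k * (math.exp(k * i / n_samples) - 1.0))
            for i in range(n_samples)]

def convolve(a, b):
    """Direct convolution; fine for a toy example, far too slow for real audio."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x == 0.0:
            continue  # skip silent samples, they contribute nothing
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def deconvolve_with_reversed_sweep(recorded, sweep):
    """Convolve the recording with the time-reversed sweep.

    Sound that arrived d samples late ends up as a spike at index
    len(sweep) - 1 + d in the result.
    """
    return convolve(recorded, sweep[::-1])

def peak_index(samples):
    """Index of the sample with the largest absolute value."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))

# Hypothetical toy setup: 400-sample sweep, room = 7-sample delay at gain 0.8.
sweep = exponential_sweep(100.0, 3000.0, 400, 8000)
toy_room = [0.0] * 7 + [0.8]
recorded = convolve(sweep, toy_room)
impulse = deconvolve_with_reversed_sweep(recorded, sweep)
spike = peak_index(impulse)  # lands at len(sweep) - 1 + 7
```

The whole stretched-out sweep collapses into one spike, and the spike's position tells you how late the sound arrived.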

More intelligibly for the less signal-processing minded: you simply take each frequency and shift it in time so that it lines up with all the others. Some spectrograms:

I play this sound: (It's supposed to be a single line, but unfortunately the methods I use to create this sound are not exact.)

And in the room, I get this sound from the microphone: (This is a very noisy recording; you can probably do better without too much trouble. Note: the horizontal lines are part of the room noise, as is the continuous warmth in the lower portion. You can also see some lines at precisely integer multiples of the main frequency - this is nonlinear distortion caused by the speakers being driven slightly past the volume at which they respond well. This isn't much of an issue, as we see later.)

Then I deconvolve it, and get this sound: (Note: The main sweep has been moved into a single spike in the middle, and the nonlinear distortions have all been moved into spikes before the main impulse, so we can easily cut them away. Also note that the general level of noise has decreased, due to the requirement that we lower the volume so that the impulse doesn't clip like mad.)

Actually doing it

I wrote my own software for this - I used ininit for creating the sine sweep, and my convolution program for convolution/deconvolution.

First, I created the sine sweep - here, T = 45 for a 45-second "stretched impulse."

shell$ cat sweep.lua
f1 = 20
f2 = 20000
T = 45

b = f2/f1
a = f1
freq = makefn(100, function () return b^(gettime()/T) * a end)
sweep = osc_sine(0.25, freq)

saver(sweep, "sweep.au")
run(T)
shell$ ininit sweep.lua
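For readers without ininit, the same sweep can be sketched in plain Python. This is an approximation of what the script above describes, not a port of ininit: the function names are made up, and the sample rate of 44100 is inferred from the 1984500 samples that the 45-second sweep file contains further down.

```python
import math

def sweep_frequency(t, f1=20.0, f2=20000.0, T=45.0):
    """Instantaneous frequency in Hz at time t: f1 * (f2/f1)**(t/T)."""
    return f1 * (f2 / f1) ** (t / T)

def exponential_sweep(f1=20.0, f2=20000.0, T=45.0, rate=44100, amplitude=0.25):
    """Sine oscillator whose frequency follows sweep_frequency().

    The phase is accumulated sample by sample, the way an oscillator
    driven by a time-varying frequency input typically works.
    """
    samples = []
    phase = 0.0
    for n in range(int(T * rate)):
        samples.append(amplitude * math.sin(phase))
        phase += 2.0 * math.pi * sweep_frequency(n / rate, f1, f2, T) / rate
    return samples

# A short sweep so the example runs quickly; the article uses T = 45.
short = exponential_sweep(T=0.5)
```

The frequency rises by the same ratio in every equal stretch of time, so each octave gets the same share of the sweep. To get an actual file, the samples can be scaled to 16-bit integers and written out with Python's standard `wave` module.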

Then I played sweep.au with Audacity and recorded the result, then did the same with a slightly different microphone placement for stereo separation. To create the impulse responses out of this, I first saved the recorded signals to recorded-left.wav and recorded-right.wav, and then reversed the sweep and saved it as revplayed.wav. Then I deconvolved each channel:

shell$ convolute recorded-left.wav revplayed.wav impulse-uncut-left.wav 0.00003
Reading 2426880 samples from recorded-left.wav...
Reading 1984500 samples from revplayed.wav...
fftlen is 4194304
doing 2 steps of size 2209794
maximum amplitude: 0.847453

Similarly for the right channel. I then opened up Audacity and cut impulse-uncut-left.wav and impulse-uncut-right.wav down to just the impulse, starting 5 milliseconds into the file and fading away after a few seconds. These were my final files. I saved them, and now I can apply them to any file I want to with convolute input.wav impulse-left.wav 0.01.
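The cutting step was done by hand in Audacity, but it could be sketched in code roughly like this. It assumes one particular reading of the step: keep a short lead-in (5 milliseconds) before the biggest spike, so the distortion spikes that come earlier are dropped, then keep a few seconds and fade out linearly. The function name and the `keep_seconds` and `fade_seconds` defaults are made up for illustration.

```python
def trim_impulse(samples, rate, pre_seconds=0.005, keep_seconds=3.0, fade_seconds=0.5):
    """Cut a deconvolved recording down to the main impulse and its tail."""
    # The main impulse is assumed to be the loudest sample in the file.
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    start = max(0, peak - int(round(pre_seconds * rate)))
    end = min(len(samples), start + int(round(keep_seconds * rate)))
    cut = list(samples[start:end])

    # Linear fade-out over the last fade_seconds, reaching zero on the last sample.
    fade_len = min(len(cut), int(round(fade_seconds * rate)))
    for i in range(fade_len):
        idx = len(cut) - fade_len + i
        cut[idx] *= (fade_len - 1 - i) / max(1, fade_len - 1)
    return cut

# Hypothetical toy data at 1000 samples per second: a distortion spike at
# index 20, the main impulse at index 50 and a bit of tail at index 60.
toy = [0.0] * 100
toy[20], toy[50], toy[60] = 0.3, 1.0, 0.5
cut = trim_impulse(toy, 1000, keep_seconds=0.03, fade_seconds=0.01)
```

Everything before the lead-in, including the distortion spikes, is simply thrown away.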

I then applied my system equalization response (see below) to the final impulse files to get rid of the microphone and speaker responses, leaving me with only the room impulse response.

Microphone and Speaker Response Removal

The impulse response you get by doing this is not just the response of the room, but the response of the particular setup you had at the time you recorded it. Most notably this includes your speakers, microphone, their relative placement, and any physical coupling between them. If you record it with crappy speakers and a crappy microphone, your impulse will have the equalization curves of both of them combined.

You can combat this by recording your particular pair of speaker(s) and microphone(s) in an anechoic chamber or an approximation (for example, a wide open outdoor area) and inverting the response in the frequency domain around the impulse. I wrote mkunfilter.pl specifically for this purpose. You can then convolve your normal room impulse responses with the resulting file to get rid of most of the speaker and microphone equalization, leaving you with, nominally, just the room's response.
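As a rough idea of what "inverting the response in the frequency domain" can look like, here is a sketch in plain Python. It is not what mkunfilter.pl actually does (its internals are not described here); the two-tap `system_ir`, the `eps` regularisation term and all function names are assumptions for illustration, and the slow textbook DFT stands in for a real FFT.

```python
import cmath

def dft(x, inverse=False):
    """Textbook O(n^2) discrete Fourier transform (use an FFT for real audio)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def inverse_filter(system_ir, length, eps=1e-9):
    """Regularised frequency-domain inverse of system_ir, as `length` real taps."""
    padded = list(system_ir) + [0.0] * (length - len(system_ir))
    spectrum = dft(padded)
    # conj(S) / (|S|^2 + eps) is 1/S wherever S is not tiny, and stays
    # bounded where the speaker/microphone pair picked up almost nothing.
    inverted = [s.conjugate() / (abs(s) ** 2 + eps) for s in spectrum]
    return [v.real for v in dft(inverted, inverse=True)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length lists."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

# Hypothetical speaker + microphone response: direct sound plus one smear tap.
system_ir = [1.0, 0.5]
unfilter = inverse_filter(system_ir, 64)
padded = system_ir + [0.0] * (64 - len(system_ir))
flat = circular_convolve(padded, unfilter)  # close to a single unit impulse
```

Convolving the system response with its own inverse gives back (almost) a clean impulse, which is the effect you want the unfilter to have on the equipment's share of a room recording. With real measurements `eps` would need to be much larger, or the inverse will boost noise in bands the equipment barely reproduces.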