Audio Analysis, Tools of the Trade

I’ve been reading a little bit about audio analysis and doing a lot of searching for audio libraries. There are two kinds of audio analysis I want to do.

1. Compute the loudness of an audio file. That is, add up the PCM output and divide by the length in seconds of the audio file. This gives the average loudness of the file. It’s an easy measure to compute and might be useful for categorizing music (I haven’t yet tried sorting my music by this “loudness” measure).

2. Do a spectral analysis of the audio file. That is, compute the discrete Fourier transform of the file (using a fast Fourier transform algorithm) and go from there. As the Wikipedia article shows, the discrete Fourier transform requires significant analysis to get good spectral components for an audio file (see the Spectral Analysis section). For audio analysis there is also the discrete-time Fourier transform, a close relative that produces a continuous spectrum from a sampled signal; the discrete Fourier transform of a finite signal can be viewed as evenly spaced samples of it.
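
To make the loudness measure in item 1 concrete, here is a minimal Python sketch. It is not the code mentioned in this post. It assumes that "adding up the PCM output" means summing the absolute values of the samples (signed samples would largely cancel out), and the helper names `average_loudness` and `read_mono_16bit_wav` are made up for illustration. The reader only handles 16-bit mono WAV files.

```python
import array
import sys
import wave


def average_loudness(samples, sample_rate):
    """Sum of absolute sample values divided by the duration in seconds.

    Assumption: absolute values are summed, since raw signed PCM samples
    swing around zero and would mostly cancel.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if len(samples) == 0:
        return 0.0
    duration_seconds = len(samples) / sample_rate
    return sum(abs(s) for s in samples) / duration_seconds


def read_mono_16bit_wav(path):
    """Read a 16-bit mono WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        samples = array.array("h")  # signed 16-bit integers
        samples.frombytes(wav.readframes(wav.getnframes()))
        sample_rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, sample_rate
```

Dividing by seconds rather than by the number of samples means the result scales with the sample rate, so this measure is probably only comparable between files recorded at the same rate.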

As of right now I’ve written the code to do 1, but I don’t have anything for 2. Before I get too excited about doing the spectral analysis of audio files I need to read Example Applications DFT and brush up on my Fourier analysis.

For implementing the spectral analysis I’m going to use the FFTW library developed at MIT.
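
As a rough illustration of what the spectral analysis computes, here is the textbook definition of the discrete Fourier transform in pure Python. This is not FFTW and does not use its API. It is a naive O(N^2) sum, only practical for short signals, and the function names are made up for this sketch. A fast Fourier transform library computes the same values far more quickly.

```python
import cmath
import math


def dft(samples):
    """Naive DFT: X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(samples)
    return [
        sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n, x in enumerate(samples)
        )
        for k in range(n_samples)
    ]


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for bins 0 through N/2."""
    n_samples = len(samples)
    if n_samples == 0:
        return []
    spectrum = dft(samples)
    return [
        (k * sample_rate / n_samples, abs(spectrum[k]))
        for k in range(n_samples // 2 + 1)
    ]


# Usage: one second of a 2 Hz sine sampled at 16 Hz.
sample_rate = 16
tone = [math.sin(2 * math.pi * 2 * n / sample_rate) for n in range(sample_rate)]
peak_hz, peak_magnitude = max(
    magnitude_spectrum(tone, sample_rate), key=lambda pair: pair[1]
)
```

Only bins up to N/2 are reported because, for real-valued input, the upper half of the spectrum mirrors the lower half.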

I’ve listed below the packages and libraries for audio analysis and Fourier analysis that I came across in my search:
SciPy (the main Python scientific platform; its Fourier analysis uses the FFTW library), CLAM (large audio analysis framework, lots of dependencies), Baudline Spectrum Analyzer, Matlab (no explanation needed), Octave (Matlab clone), and FFTW.

I’m going to use the FFTW library for my program because the other libraries, such as SciPy, require tens of dependencies, and I want my program to rely on only a few. If writing C code gets too onerous I’ll switch to SciPy.


Musical Distance

Like most of you, I play my music through the computer. One of the popular settings for music players is the “shuffle” or “random” mode. That is, the next song is randomly (or semi-randomly) selected from the playlist. This was a great innovation for listening to overlooked songs in your music collection.

The only problem is that “shuffle” mode mixes clashing genres; that is, songs from very different genres are played next to each other. This is avoidable by filtering your playlist to include only a specific genre (say rock) and then using “shuffle” mode to play only rock songs. But for me that’s only a partial solution, because when the music player lists only rock songs it includes songs by Rammstein (heavy metal) and songs by Jonathan Coulton (folk rock). In particular, I consider Rammstein “hard” music and Jonathan Coulton “soft” music.

I think Rammstein and Jonathan Coulton are musically far apart. Not as far apart as Classical Music and Rap, but not as close as Classical Music and Jazz. Music genres capture the idea of musical closeness to some degree, but I want a quantitative measurement of how close two songs are musically, something more precise than saying that a song is “soft” or “loud/hard”.

The idea of musical distance is that if two songs are musically close together then their musical distance is a small number. Conversely, if two songs are musically far apart then their musical distance is large. That is, I want a metric (in the mathematical sense) for songs.
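
One simple way such a metric could look, purely as a sketch: describe each song by a short list of numbers and take the Euclidean distance between the lists. Which features to use is exactly the open question here, so the feature vectors below (a loudness value and a "brightness" value, each scaled to 0..1) are hypothetical placeholders with made-up numbers, and `musical_distance` is a name invented for this example.

```python
import math


def musical_distance(features_a, features_b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(features_a, features_b))
    )


# Hypothetical (loudness, brightness) features; the values are made up.
hard_song = (0.9, 0.8)
soft_song = (0.3, 0.4)
mid_song = (0.6, 0.5)

hard_to_soft = musical_distance(hard_song, soft_song)
hard_to_mid = musical_distance(hard_song, mid_song)
```

Euclidean distance satisfies the metric axioms on the feature vectors (zero distance to itself, symmetry, triangle inequality). On songs themselves it is slightly weaker, since two different songs that happen to share the same features would end up at distance zero.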

The only trouble is I know nothing about audio analysis. I know what a spectrum is but that’s about it. Julius O. Smith III at Stanford has been kind enough to post online his textbooks on music/audio processing and analysis: Mathematics of the Discrete Fourier Transform with Audio Applications, Introduction to Digital Filters with Audio Applications, and Physical Audio Signal Processing for Virtual Musical Instruments and Audio Effects. When I have a couple of weeks to spare (hah) I’ll try giving the books a read to see what I can come up with.