Fourier transform of a sound file

Learning Scientific Programming with Python (2nd edition)

Question P6.7.3

The scipy library provides a routine for reading in .wav files as NumPy arrays:

In [x]: from scipy.io import wavfile
In [x]: sample_rate, wav = wavfile.read(<filename>)

For a stereo file, the array wav has shape (n,2) where n is the number of samples.
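The shape convention can be checked with a small round trip. This is a minimal sketch, assuming SciPy is installed; the file name demo.wav, the 8000 Hz sample rate and the two test tones are made up for illustration and have nothing to do with the file used in this problem.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

sample_rate = 8000                        # assumed rate, for illustration only
t = np.arange(sample_rate) / sample_rate  # one second of sample times
left = np.sin(2 * np.pi * 440 * t)        # 440 Hz tone in the left channel
right = np.sin(2 * np.pi * 220 * t)       # 220 Hz tone in the right channel

# Stack the two channels as columns and scale to 16-bit integers.
stereo = (0.5 * 32767 * np.column_stack([left, right])).astype(np.int16)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.wav")  # hypothetical file name
    wavfile.write(path, sample_rate, stereo)
    rate, wav = wavfile.read(path)

print(rate, wav.shape, wav.dtype)
```

Each column of the returned array is one channel, so wav[:, 0] selects the first channel as a one-dimensional signal.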

Use the routines of np.fft to identify the notes present in the sound file chord.wav. Which major chord do they comprise?

The frequencies of musical notes on an equal-tempered scale for which $\mathrm{A_4}=440\;\mathrm{Hz}$ are provided as a dictionary in the file notes.py.
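For orientation, the sketch below shows how data of this kind could be generated. It is a hypothetical reconstruction, not the contents of notes.py: the note names (sharps only), the layout notes[octave][name] and the form of major_chords are assumptions inferred from how the solution code uses them. On an equal-tempered scale each semitone multiplies the frequency by $2^{1/12}$.

```python
# Hypothetical reconstruction of the kind of data notes.py provides;
# the names and layout used in the real file may differ.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_FREQ = 440.0
A4_INDEX = 12 * 4 + NOTE_NAMES.index("A")  # semitones from C0 up to A4

# notes[octave][name] -> frequency in Hz; one semitone is a factor of 2**(1/12).
notes = {
    octave: {
        name: A4_FREQ * 2 ** ((12 * octave + i - A4_INDEX) / 12)
        for i, name in enumerate(NOTE_NAMES)
    }
    for octave in range(9)
}

# A major triad is the root plus the notes 4 and 7 semitones above it.
# Only the two non-root note names are stored, wrapped within one octave.
major_chords = {
    root: [NOTE_NAMES[(i + 4) % 12], NOTE_NAMES[(i + 7) % 12]]
    for i, root in enumerate(NOTE_NAMES)
}

print(round(notes[4]["C"], 2), major_chords["D"])
```

With this layout, notes[4]["A"] is 440 Hz by construction and every other frequency follows from its distance in semitones from that note.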

Solution P6.7.3

Many thanks to Evgeny Podnos from the University of Texas at Austin for his contribution to this solution.

The code below takes the Fourier transform of one channel of the provided .wav file and analyses its frequency components to identify the notes present.

from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from notes import notes, major_chords

# Add the root note to the intervals of each major chord and turn into a set
# because this dictionary is more useful to us this way.
for root in major_chords:
    major_chords[root] = set(major_chords[root] + [root])

# Define a cutoff, as a fraction of the largest peak in the amplitude
# spectrum, above which a frequency component is considered a note.
relative_cutoff = 0.1

sample_rate, wav = wavfile.read("chord.wav")
nsamples = wav.shape[0]
nseconds = nsamples / sample_rate

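# Sample times (not needed for the FFT itself); built from integers so
# that t always has exactly nsamples elements.
t = np.arange(nsamples) / sample_rate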

# The sound file is in stereo so wav has shape (nsamples, 2).
# Pick one of the channels for the FFT:
spec = np.fft.fft(wav[:, 0])
freq = np.fft.fftfreq(nsamples, 1 / sample_rate)

# The input signal is real so the FFT is Hermitian; look at the positive frequencies only.
spec = 2 / nsamples * np.abs(spec[: nsamples // 2])
freq = freq[: nsamples // 2]

f_min, f_max = 20, 8000
idx = (freq > f_min) & (freq < f_max)
spec = spec[idx]
freq = freq[idx]

threshold = relative_cutoff * np.max(spec)
plt.plot(freq, spec, label="spectrum")
plt.plot([f_min, f_max], [threshold] * 2, "r", label="threshold")
plt.xlabel("Frequency / Hz")
plt.legend()
plt.show()


def find_notes(freq, spec):
    """Return a dictionary of octave: [notes list] for notes found in spec."""
    # Look at the note's frequency +/- df (in Hz). The window must be wider
    # than the FFT bin spacing, sample_rate / nsamples, or it may contain
    # no points of the spectrum at all.
    df = 2
    found_notes = defaultdict(list)
    for octave in range(9):
        for note, note_freq in notes[octave].items():
            # Get the part of the spectrum near the note frequency.
            idx = (freq > note_freq - df) & (freq < note_freq + df)
            spec_at_note = spec[idx]
            # Does the amplitude spectrum exceed the threshold near this note's frequency?
            if np.any(spec_at_note > threshold):
                found_notes[octave].append(note)
    return found_notes


found_notes = find_notes(freq, spec)


def identify_chords(found_notes):
    """Print every major chord whose notes were all found in a single octave."""
    for octave, octave_notes in found_notes.items():
        for root, chord_notes in major_chords.items():
            if set(octave_notes) >= chord_notes:
                # The "m" suffix labels a major chord here: only major chords are tested.
                print(root + "m " + str(octave))


identify_chords(found_notes)

The output identifies the chord as D major, in octave 4:

Dm 4

A plot of the frequency spectrum is also produced, as shown below.

The frequency spectrum of a chord played on a piano.
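The same approach can be tried on a signal whose content is known in advance. The following is a minimal sketch, not part of the original solution: it synthesises three pure sine waves at the equal-tempered frequencies of D4, F#4 and A4 (the sample rate, duration and search half-width df are assumed values) and applies the same thresholding idea. It uses np.fft.rfft and np.fft.rfftfreq, which return only the non-negative frequencies of a real signal.

```python
import numpy as np

# Assumed values, for illustration only (not taken from the sound file).
sample_rate = 44100  # samples per second
duration = 2.0       # seconds
# Equal-tempered frequencies (Hz) of D4, F#4 and A4 for A4 = 440 Hz.
chord = {"D": 440 * 2 ** (-7 / 12), "F#": 440 * 2 ** (-3 / 12), "A": 440.0}

nsamples = int(sample_rate * duration)
t = np.arange(nsamples) / sample_rate
signal = sum(np.sin(2 * np.pi * f * t) for f in chord.values())

# Amplitude spectrum at the non-negative frequencies only.
spec = 2 / nsamples * np.abs(np.fft.rfft(signal))
freq = np.fft.rfftfreq(nsamples, 1 / sample_rate)

threshold = 0.1 * spec.max()
df = 2  # search half-width in Hz; wider than the bin spacing of 1 / duration


def has_peak(f):
    """Return True if the spectrum exceeds the threshold within df of f."""
    window = (freq > f - df) & (freq < f + df)
    return bool(np.any(spec[window] > threshold))


found = [name for name, f in chord.items() if has_peak(f)]
print(found)
```

A real piano recording also contains overtones of each note, so its spectrum has more peaks than this idealised signal; the relative cutoff is what keeps the weaker ones from being counted as notes.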