Modern computer vision programs can track fingers easily in the 2D plane, but struggle to track depth. This is a problem because the depth of a finger drastically changes which chord is actually being played.
When a computer vision model tracks a finger as overlapping a string, there are three possibilities: the finger is pressing the string down at that fret, the finger is resting on the string and muting it, or the finger is hovering above the string, leaving it ringing open.
Since computer vision algorithms currently struggle to distinguish between these three possibilities, we present an approach where the audio of the chord is analyzed to determine the actual chord being played.
This notebook takes two inputs. The first is the hypothetical output of a computer vision algorithm, in the form [n, n, n, n, n, n], where each n can be X to represent a muted string, 0 to represent an open string, or any integer 0 < n < 20 to represent a fretted note.
A list of candidate chords is then created from this input: every fretted value (0 < n < 20) is also tried as 0 (open) and X (muted), and all resulting combinations are enumerated.
The second input, in the same [n, n, n, n, n, n] form, represents the "actual" chord being played. Audio of this chord is synthesized using the Karplus-Strong algorithm, then matched against the synthesized audio of each chord in the candidate list to identify the most likely candidate.
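A minimal Karplus-Strong chord synthesizer looks roughly like this. It is a sketch under stated assumptions: SR, the open-string frequencies, the 0.996 decay factor, and the simple normalization are choices made here for illustration, and the notebook's synth_chord_array may differ.

```python
import numpy as np

SR = 44100  # assumed sample rate

# Standard-tuning open-string frequencies, low E to high e (Hz)
OPEN_HZ = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]

def karplus_strong(freq, dur=2.2, sr=SR):
    """Minimal Karplus-Strong pluck: a noise burst fed through a delay
    line with a two-sample averaging lowpass in the feedback loop."""
    n_samples = int(dur * sr)
    delay = int(sr / freq)                 # delay length sets the pitch
    buf = np.random.uniform(-1, 1, delay)  # initial noise excitation
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = buf[i % delay]
        buf[i % delay] = 0.996 * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
    return out

def synth_chord_array(frets, dur=2.2, sr=SR):
    """Sum one Karplus-Strong pluck per sounding string; 'X' strings are silent."""
    mix = np.zeros(int(dur * sr))
    for open_hz, fret in zip(OPEN_HZ, frets):
        if fret == 'X':
            continue
        mix += karplus_strong(open_hz * 2 ** (fret / 12), dur, sr)
    return mix / max(1, sum(f != 'X' for f in frets))  # crude normalization
```

Each fret number is converted to a frequency by shifting the open-string pitch up n semitones (a factor of 2^(n/12)), so the same routine covers open and fretted strings.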
To identify which candidate matches the played sound, we perform a Fast Fourier Transform (FFT) on each synthesized audio clip. The FFT converts each time-domain waveform into the frequency domain, revealing the harmonic spectrum of the chord. Each chord has a unique combination of dominant frequencies corresponding to its notes.
By comparing the FFT spectra of the actual chord and each candidate chord, the algorithm identifies which candidate’s frequency profile most closely matches the real sound.
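One simple way to score that comparison, shown here as a hedged sketch rather than the notebook's exact metric, is cosine similarity between magnitude spectra: the candidate whose spectrum points in the same direction as the real sound's spectrum wins. The synth argument stands in for whatever chord synthesizer is in use.

```python
import numpy as np

def spectral_similarity(a, b):
    """Cosine similarity between the magnitude spectra of two equal-length clips."""
    A = np.abs(np.fft.rfft(a))
    B = np.abs(np.fft.rfft(b))
    return float(np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B) + 1e-12))

def best_candidate(test_audio, candidates, synth, **kw):
    """Synthesize each candidate fingering and return the closest spectral match."""
    scores = [spectral_similarity(test_audio, synth(c, **kw)) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

Magnitude spectra are used (rather than the complex FFT) so that small timing offsets between the real and synthesized clips do not penalize a correct candidate.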
All inputs are defined in Cell 19.
Modify each n in test_audio = synth_chord_array([n, n, n, n, n, n], dur=2.2, sr=SR) to set the test chord, i.e. the chord that is actually being heard.
Modify each n in candidates = enumerate_possible_chords([n, n, n, n, n, n]) to set the input chord, i.e. what the CV algorithm detects.
The default setup is an E minor chord being played, but a finger is also detected overlapping the first fret of the G string. This is a good real-world example of the applicability of this approach because:
See cell 2 for an explanation of the default setup.