Using Formants to Synthesize Vowel Sounds

What Are Formants?

Formants are the characteristic amplitude peaks in the spectrum of a resonant sound source. They result from the excitation of fixed resonant chambers and are the most significant contributors to the timbre of tonal instruments. In speech, they are present below 5000 Hz and are usually "inharmonic," meaning their frequencies are not integer multiples of the fundamental frequency.

For Example...

Guitars with even slight differences in body dimensions sound different because their characteristic resonances (formants) sit at different fixed frequencies. If you viewed the same performance on two different guitars through separate spectrum analyzers, you would notice some resonant peaks in different positions. Formants are also why the timbre of woodwind and brass instruments changes slightly with the player's valve position: each valve setting modifies the dimensions of the resonant chamber and so produces a different set of formants. In the same way, reshaping our vocal tract (with the tongue, jaw, and lips) produces different vowel sounds. Without the ability to change the dimensions of their resonant vocal and nasal cavities, humans could make only one resonant sound, a single vowel.

To illustrate, below are graphical analyses from the MIT Press of the location of amplitude peaks over time for some English words. The red areas indicate the areas with the most energy.

Vowel sounds depend on resonance far more than consonants do. As an analogy, compare vowel sounds to a flute and consonants to drums. Each vowel sound is characterized by a different set of formants, and that set can be synthetically imposed on a complex sound using several resonant filters. This graph from the "Subtractive Synthesis Concepts" chapter in Ed Doering's Musical Signal Processing with LabVIEW lays out the approximate formant frequencies for vowels.

Exercise:

Open a synthesizer that is capable of producing a sawtooth wave.
Send the output into an EQ that allows you to simultaneously use up to 3 resonant peak filters.
Turn the resonance or "Q" to the highest value possible.
Boost frequencies F1, F2, and F3 for some vowel sounds in the chart above.
Adjust the amplitudes of each peak and listen for where the vowel sound jumps out.
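
The steps above can also be tried offline in code. What follows is only a minimal sketch, not the EQ used for the audio examples: it assumes a 44.1 kHz sample rate, a naive (non-band-limited) sawtooth at 100 Hz, and peaking filters built from the widely used Audio EQ Cookbook biquad formulas. The function names, the Q of 10, and the 18 dB boost are all illustrative choices. The formant values are the "oo" frequencies quoted elsewhere in this article.

```python
import math

SAMPLE_RATE = 44100  # Hz; an assumed, common audio rate


def sawtooth(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Naive sawtooth in [-1, 1). Aliasing is ignored to keep the sketch short."""
    count = int(seconds * sample_rate)
    return [2.0 * ((i * freq_hz / sample_rate) % 1.0) - 1.0 for i in range(count)]


def peaking_coeffs(center_hz, q, gain_db, sample_rate=SAMPLE_RATE):
    """Normalized biquad coefficients for a peaking EQ (Audio EQ Cookbook form)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = ((1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0)
    return b, a


def biquad(samples, b, a):
    """Run one second-order filter section over the samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def impose_formants(samples, formants_hz, q=10.0, gain_db=18.0):
    """Boost each formant with its own resonant peak, one filter after another."""
    for center_hz in formants_hz:
        b, a = peaking_coeffs(center_hz, q, gain_db)
        samples = biquad(samples, b, a)
    return samples


def tone_level(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of a single frequency component (a one-bin DFT)."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return math.hypot(re, im) / len(samples)


OO_FORMANTS_HZ = (300.0, 870.0, 2410.0)  # F1, F2, F3 for "oo" as given in this article

dry = sawtooth(100.0, 0.5)                  # unfiltered source
wet = impose_formants(dry, OO_FORMANTS_HZ)  # same source with the three peaks enabled

boost_db = 20.0 * math.log10(tone_level(wet, 300.0) / tone_level(dry, 300.0))
print(f"Boost measured at F1 (300 Hz): {boost_db:.1f} dB")
```

The measured boost at F1 should land near the requested gain. Lowering gain_db or q per filter mirrors the last step of the exercise, where each peak's amplitude is adjusted by ear. To hear the result, the wet samples would need to be scaled to avoid clipping and written to an audio file.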

Examples:

To conclude, here are two examples of this technique. In each, the unfiltered sawtooth plays first for a moment, and the EQ is enabled in the next bar.

Example "ah" as in "hot"

Example "oo" as in "boot"

To make vowel sounds using multiple resonant filters, the sound source must contain frequency content in the range of the formants you choose to impose. For example, to hear F1, F2, and F3 for the vowel "oo," there must be energy near 300 Hz, 870 Hz, and 2410 Hz. A sawtooth wave is a safe source because it contains every harmonic of its fundamental.
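
To see why the choice of source matters, the short sketch below compares an ideal sawtooth with an ideal square wave, which contains only odd harmonics. It assumes a 100 Hz fundamental (an arbitrary choice) and uses the standard Fourier-series amplitudes for both waveforms at unit peak level. The function names are illustrative.

```python
import math


def nearest_harmonic(fundamental_hz, target_hz):
    """Number of the harmonic that lies closest to the target frequency."""
    return max(1, round(target_hz / fundamental_hz))


def sawtooth_amplitude(n):
    """Amplitude of harmonic n in an ideal sawtooth: every harmonic is present."""
    return 2.0 / (math.pi * n)


def square_amplitude(n):
    """Amplitude of harmonic n in an ideal square wave: even harmonics are absent."""
    return 4.0 / (math.pi * n) if n % 2 == 1 else 0.0


FUNDAMENTAL_HZ = 100.0                   # assumed pitch of the source
OO_FORMANTS_HZ = (300.0, 870.0, 2410.0)  # "oo" formants quoted in the text

for formant_hz in OO_FORMANTS_HZ:
    n = nearest_harmonic(FUNDAMENTAL_HZ, formant_hz)
    print(
        f"{formant_hz:6.0f} Hz: nearest harmonic {n:2d} at {n * FUNDAMENTAL_HZ:6.0f} Hz, "
        f"sawtooth {sawtooth_amplitude(n):.4f}, square {square_amplitude(n):.4f}"
    )
```

At this pitch the harmonic nearest the third formant is even-numbered, so a square wave gives the filter nothing to boost there, while the sawtooth still supplies energy.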

References:

Doering. Formant Vowel Synthesis. November 2007.

Huckvale. UCL Psychology and Language Sciences. Resources in speech, hearing, and phonetics. July 2015.

Joe Wolfe. Music Acoustics. University of New South Wales.

Nave. Vocal tract resonance.

Schnupp. King. Nelken. Auditory Neuroscience. Formants and Harmonics in Spoken Vowels. MIT Press.
