Acoustic phonetics is the study of the physical properties of speech. It aims to analyse the sound waves of speech in terms of their frequency, amplitude and duration.
One way we can analyse the acoustic properties of speech sounds is by looking at a waveform. A waveform plots changes in air pressure over time: air particles are alternately compressed and rarefied, creating sound waves that spread outwards. A tuning fork being struck is a good example of these pressure fluctuations, showing how the air particles oscillate (move back and forth rhythmically) when we perceive sound.
Some acoustic analysis…
Frequency vs Amplitude
The frequency (perceived as pitch) and amplitude (perceived as ‘loudness’, and related to intensity) of a sound can be analysed on a waveform. For a periodic waveform, that is, one with a repeating pattern, frequency is calculated by counting the number of complete cycles per second. The higher the number of cycles per second, the higher the frequency and the perceived pitch. Frequency is usually expressed in Hertz (Hz).
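The cycle-counting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rate of 8000 samples per second, the starting phase, and the function names `sine_wave`, `count_cycles` and `estimate_frequency` are all invented for the example and do not come from the source text.

```python
import math

def sine_wave(freq_hz, duration_s, sample_rate=8000, phase_rad=0.3):
    """Return samples of a sine wave.

    The sample rate and the arbitrary starting phase are assumptions made
    for this sketch; the phase simply keeps zero crossings between samples.
    """
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + phase_rad)
            for i in range(n)]

def count_cycles(samples):
    """Count upward zero crossings; each one marks one complete cycle."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)

def estimate_frequency(samples, sample_rate=8000):
    """Frequency in Hz = number of cycles divided by duration in seconds."""
    duration_s = len(samples) / sample_rate
    return count_cycles(samples) / duration_s

tone = sine_wave(200, 0.1)          # a 200 Hz tone lasting 0.1 s
print(count_cycles(tone))           # cycles counted in the 0.1 s window
print(estimate_frequency(tone))     # estimate in Hz, close to 200 for this tone
```

Counting zero crossings only works for clean, simple waveforms like this one; it is meant to mirror the manual counting described above, not to stand in for a robust pitch tracker.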
Analysing the amplitude of a waveform tells us how intense or ‘loud’ a sound is, that is, how far the air particles are displaced from their resting position. It is conventionally expressed in decibels (dB).
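As a rough illustration of how a pressure amplitude is converted to decibels, the sketch below uses the standard sound pressure level formula, 20 × log10(p / p_ref), with the usual reference pressure of 20 micropascals for sound in air. The formula and reference value are not given in the source text, and the function name and example pressures are assumptions made for the example.

```python
import math

# Usual reference pressure for dB SPL in air: 20 micropascals.
# This value is an assumption added for the sketch, not taken from the text.
REFERENCE_PRESSURE_PA = 20e-6

def sound_pressure_level_db(pressure_pa):
    """Convert an RMS sound pressure in pascals to decibels (dB SPL)."""
    return 20 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)

quieter = sound_pressure_level_db(0.002)  # illustrative pressure value
louder = sound_pressure_level_db(0.02)    # ten times the pressure
print(round(quieter, 1), round(louder, 1))
```

Because the scale is logarithmic, a tenfold increase in pressure adds 20 dB rather than multiplying the decibel value by ten.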
The x-axis on a waveform corresponds to the time frame in which the sound was produced (usually in seconds or milliseconds), and the y-axis represents the amplitude.
Sine Waves vs Complex Waves
Sine waves are waveforms that have very simple, regular repeating patterns. For a voiced speech sound, the number of ‘cycles’ in the waveform (the number of complete repetitions in the periodic waveform) reflects the number of times the vocal folds have opened within the time frame displayed. The number of cycles per second is known as the fundamental frequency (f0), which is measured in Hertz (Hz). A frequency of 200 Hz means that there are 200 complete cycles per second within the waveform, so the vocal folds have opened 200 times in one second. In reality, most speech sound waves have a rather complex pattern, and are known as complex waves. These are made up of two or more simple sine waves. The fundamental frequency of a complex waveform can also be calculated by counting the number of cycles per second.
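To make the idea of a complex wave concrete, the following sketch adds three sine waves together and checks that the result still repeats once per cycle of the lowest component. The component frequencies and amplitudes, the sample rate and the function name `complex_wave` are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def complex_wave(components, duration_s, sample_rate=SAMPLE_RATE):
    """Sum sine waves given as (frequency_hz, amplitude) pairs."""
    n = round(duration_s * sample_rate)
    return [sum(amp * math.sin(2 * math.pi * freq * i / sample_rate)
                for freq, amp in components)
            for i in range(n)]

# Hypothetical components: a 200 Hz sine wave plus two weaker, higher ones.
components = [(200, 1.0), (400, 0.5), (600, 0.25)]
wave = complex_wave(components, 0.05)

# One cycle of a 200 Hz wave lasts 1/200 s, i.e. 40 samples at 8000 Hz.
period_samples = SAMPLE_RATE // 200

# The complex wave should look the same one period later at every point.
repeats = all(abs(wave[i] - wave[i + period_samples]) < 1e-9
              for i in range(len(wave) - period_samples))
print(repeats)
```

The shape of each cycle is more complicated than a single sine wave, but the pattern still repeats 200 times per second, which is why the fundamental frequency of this wave is 200 Hz.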
Periodic vs Aperiodic sound waves
Sine waves and the complex waveforms described above are periodic, meaning their cycles are regular and repetitive. The speech sounds that appear as periodic sound waves are voiced sounds, such as vowels or nasals. Since such sounds have regularly repeating waveforms, they can also be decomposed through ‘Fourier analysis’, which breaks a complex wave down into its component sine waves. The resulting graph is called a spectrum, which does not show time. Instead, the x-axis shows frequency, and the y-axis shows the sound pressure level.
The fundamental frequency on this type of graph can be worked out by selecting the lowest frequency component of the complex wave. This is usually the first complete peak on the spectrum. Above this fundamental frequency peak, harmonics occur at evenly spaced integer multiples of it. Harmonics are the component frequencies of the complex wave; those that fall close to the ‘natural resonances’ of the vocal tract are amplified. On the spectrum, each harmonic corresponds to a peak.
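The sketch below shows one way a spectrum could be computed for a small synthetic complex wave, using a plain discrete Fourier transform written out by hand. The test signal, the sample rate, the threshold and the function names `dft_magnitudes` and `spectral_peaks` are assumptions made for the example; real analysis software uses faster and more refined methods.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def dft_magnitudes(samples):
    """Magnitude of each frequency bin of a direct (slow) Fourier transform."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))) / n
            for k in range(n // 2)]

def spectral_peaks(samples, sample_rate=SAMPLE_RATE, threshold=0.05):
    """Frequencies (Hz) whose magnitude exceeds an arbitrary threshold."""
    bin_hz = sample_rate / len(samples)  # frequency spacing between bins
    return [k * bin_hz
            for k, mag in enumerate(dft_magnitudes(samples))
            if mag > threshold]

# Hypothetical complex wave: 200 Hz fundamental plus two weaker harmonics.
components = [(200, 1.0), (400, 0.5), (600, 0.25)]
wave = [sum(amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for freq, amp in components)
        for i in range(400)]

peaks = spectral_peaks(wave)
f0 = peaks[0]                                # lowest peak = fundamental
multiples = [peak / f0 for peak in peaks]    # harmonics as multiples of f0
print(peaks, f0, multiples)
```

The lowest peak gives the fundamental frequency, and the remaining peaks sit at whole-number multiples of it, which is the evenly spaced pattern described above.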
On the other hand, speech sounds can also be aperiodic. This means that they do not have a regular repeating pattern. Instead, they have a random pattern, so a fundamental frequency cannot be calculated. Aperiodic speech sounds are typically voiceless, such as a voiceless fricative.
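One possible way to see the difference between periodic and aperiodic signals numerically is to check how well a signal matches a delayed copy of itself (autocorrelation). This technique is not mentioned in the source text; it is swapped in here purely as an illustration, and the function name `periodicity`, the lag range and the synthetic signals are all assumptions.

```python
import math
import random

def periodicity(samples, min_lag=20, max_lag=200):
    """Highest normalised autocorrelation found across the given lags.

    Values near 1 suggest a repeating pattern; values near 0 suggest none.
    The lag range is an arbitrary choice for this sketch.
    """
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        original = samples[:-lag]
        delayed = samples[lag:]
        num = sum(x * y for x, y in zip(original, delayed))
        den = math.sqrt(sum(x * x for x in original) *
                        sum(y * y for y in delayed))
        if den > 0:
            best = max(best, num / den)
    return best

random.seed(0)  # fixed seed so the noise is reproducible

# Periodic signal, loosely standing in for a voiced sound.
voiced_like = [math.sin(2 * math.pi * 200 * i / 8000) +
               0.5 * math.sin(2 * math.pi * 400 * i / 8000)
               for i in range(800)]

# Random noise, loosely standing in for a voiceless fricative.
noise_like = [random.uniform(-1, 1) for _ in range(800)]

print(periodicity(voiced_like) > periodicity(noise_like))
```

The periodic signal lines up almost perfectly with itself one cycle later, whereas the noise never does, which is why no fundamental frequency can be read off it.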
Another way to analyse a sound acoustically is by looking at a spectrogram. A spectrogram provides much richer information than a waveform. As on a waveform, time is displayed on the x-axis, but the y-axis shows the frequency of the sound. Amplitude is represented by the darkness of the acoustic energy: the louder (more intense) the sound, the darker it appears on the spectrogram. Spectrograms allow us to see the high frequency energy that comes with aperiodic sounds.
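A spectrogram can be thought of as a series of spectra computed over short, overlapping stretches of the signal. The sketch below builds such a grid for a synthetic signal that switches from a low tone to a higher one. The frame length, hop size, sample rate, test signal and function names are assumptions for illustration, and no window function is applied, which real tools normally do.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME = 160         # samples per analysis frame (20 ms, assumed)
HOP = 80            # samples between frame starts (assumed)

def frame_magnitudes(frame):
    """Spectrum (magnitude per frequency bin) of one short frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) / n
            for k in range(n // 2)]

def spectrogram(samples, frame=FRAME, hop=HOP):
    """Rows are time frames, columns are frequency bins, values are magnitudes."""
    return [frame_magnitudes(samples[start:start + frame])
            for start in range(0, len(samples) - frame + 1, hop)]

def tone(freq_hz, n_samples):
    """A plain sine wave used only to build the test signal."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

# Hypothetical signal: 0.1 s at 200 Hz followed by 0.1 s at 1000 Hz.
signal = tone(200, 800) + tone(1000, 800)
grid = spectrogram(signal)

bin_hz = SAMPLE_RATE / FRAME  # frequency spacing between columns
strongest = [row.index(max(row)) * bin_hz for row in grid]
print(strongest[0], strongest[-1])
```

In a plotted spectrogram, each value in `grid` would be drawn as a shade of grey, with larger magnitudes appearing darker; here the strongest frequency simply moves from the low tone to the high tone as time passes.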
Transients vs Continuous sounds
Transients are a form of aperiodic sound. A transient is produced when pressure builds up behind a closure and is then suddenly released in a burst, which shows up as a spike on a waveform. On a spectrogram, the closure shows up as a blank space, followed by a dark vertical band of acoustic energy representing the release. A typical transient sound would be a plosive, such as [p] or [b] in the English sound system.
Continuous sounds are another form of aperiodic sound. Unlike a voiced vowel sound, a continuous aperiodic sound shows up as an irregular, random pattern on the waveform. On the spectrogram, it is represented by high frequency acoustic energy which is dark, and therefore intense, with high amplitude. This is typical of many voiceless fricatives, such as [f] and [s] in the English sound system.
Voicing on a spectrogram
As established earlier, voiced sounds are periodic signals. A fundamental frequency can be calculated because the vocal folds open at regular intervals as they vibrate. On a waveform, this shows up as a periodic sound wave. On a spectrogram, there are two specific visual elements to look out for, which indicate a voiced sound. The first is the vertical striations (they look like vertical wavy lines on the spectrogram), each of which corresponds to one opening of the vocal folds and the resulting flow of air through them. The other visual clue is the dark horizontal bands which are typical of vowels, approximants and nasals. These are called formants, which are the natural resonances of the vocal tract, that is, the frequency regions in which harmonics are amplified. The size and shape of the vocal tract can be modified to make these formants vary. This can be done by changing the tongue position, lip position, etc.
Adapted from: Ogden, R. (2009) An Introduction to English Phonetics, Oxford: Oxford University Press, pp. 30-35.