What’s the International Phonetic Alphabet and What’s so Great About it?

Posted by Tyler Lau on 10/12/15 10:00 AM

IPA will help you not only with English but also with learning any language!

Nope, it’s not the beer, though that’s pretty great too! The IPA stands for the International Phonetic Alphabet and is a standardized way to write down the sounds of any language. Sounds impossible, doesn’t it? But we’ll see how this system captures nuances of sounds in the world’s languages and why this is a great tool not only for understanding English, but also for learning any language.

You might have seen transcriptions before that almost look like English, but not quite, like the following for the word ‘casing’:


Many dictionaries use this kind of transcription, which is built on terms from elementary school grammar, like “long a” (the ā sound there), but this system is not quite IPA. In fact, it’s better to shed your intuitions about what a “long a” or a “short o” is, because they won’t help you when you’re dealing with other languages!

So let’s get into the actual IPA! You can check out the full chart here.


First let’s talk about consonants. Luckily, as a speaker and writer of English, you’ll be coming into the International Phonetic Alphabet knowing a lot of the symbols already! The following chart shows most of the consonants that are common in the world’s languages.

So what do these rows and columns actually mean?

Let’s start with the columns, which represent the place of articulation, or where in your mouth the sound is actually made. What do these terms actually mean?! We don’t need to get into the nitty-gritty of each one, but remember this: as you go from left to right, you’re moving from the front of your mouth to the back.

A good source to follow along with while reading is this website. If you click on a sound there, it will play a recording of it, so click each sound as I describe it. Let’s start at the front of the mouth and work our way back!

Places of Articulation

First, check out the diagram below to see all the places of articulation we’ll be talking about, and try to feel them in your mouth as you’re pronouncing the words below!

Bilabials are consonants made with the lips. Say pat and you’ll notice that your lips come together for the [p] sound. Now say mat and you’ll see that your lips are together for the [m] sound too.

Labiodentals are consonants made with your upper teeth against your lower lip. Say fat and vat and you’ll feel your teeth on your lip for the [f] and [v] sounds.

Dentals are consonants made with your teeth. Say that and notice that your tongue is between your teeth for the sound written as [ð].

Now say tattoo and feel where your tongue touches the roof of your mouth. That hard ridge just behind your upper teeth is called the alveolar ridge, so sounds made there are called alveolars. Say lateral and you’ll feel that [l] is made there too.

Say shape and you’ll notice that for the sh sound (transcribed as [ʃ]), your tongue comes up a little further back than the alveolar ridge; this is a postalveolar sound.

Next are retroflex consonants, which English doesn’t use. They involve curling the tip of your tongue back and touching the roof of your mouth behind the alveolar ridge. If you try saying the t’s in tattoo with your tongue curled back into that position, you may notice that it sounds like an Indian English accent. The reason is that many Indian languages have both dental and retroflex consonants (so they have both [t] and [ʈ]), and speakers of these languages perceive the alveolar consonants of, for example, American and British English as retroflex.

The only palatal consonant in English is [j], which represents the y in yes. If you pronounce yes, you’ll feel the middle of your tongue moving up toward the roof of your mouth without quite touching it. Now say canyon, but instead of pronouncing the ny part as two sounds, press that same part of your tongue against the roof of your mouth, which is called the palate (or hard palate). The result is the palatal nasal [ɲ], the ñ of Spanish cañón.

Moving further back, say cat. The first consonant is velar and is transcribed as [k]. If you feel where the back of your tongue hits, that part is called the velum (or soft palate).

Uvular and pharyngeal sounds do not exist in English; they are produced at the very back of your mouth. The dangly thing at the back of your throat is called the uvula, and if you speak French, you’ll know that the French r in rouge is made there: many speakers pronounce it as a fricative [ʁ], and in its trilled form [ʀ] the uvula vibrates against the back of your tongue. The pharynx is the part of your throat just above your vocal cords, and pharyngeal sounds are common in Semitic languages, such as the first consonant of the Arabic letter name ayn (transcribed [ʕajn]).

Finally, we reach the glottis, which is where the vocal cords lie. Say hat and you’ll notice that for the [h] sound, air simply passes through the open vocal cords. Another important sound to remember is the glottal stop, which is produced by briefly closing the glottis and then releasing it, as in the word uh-oh, transcribed as [ʔəʔoʊ].

Okay, now let’s move on to the rows of the chart. These are known as manners of articulation, and refer to how you are producing the sounds, rather than where. As you move from the top of the chart to the bottom, your mouth opens wider.

Rows on the Chart, or Manners of Articulation

Stops (or Plosives)

First are the plosives, also known as stops. These sounds are made by forming a full closure at some point in your mouth and then releasing it. Say bat, dam, and gas, and you’ll notice that for the [b], [d], and [g], there is a full closure at your lips, alveolar ridge, and velum, respectively.

At this point, let me make an aside to talk about two extra features that can apply on top of most of the manners of articulation:

1. Voicelessness

Put your fingers on your throat where your vocal cords are and feel how it vibrates when you pronounce [b], [d], and [g] in the above words (bat, dam, and gas). These are known as voiced because the vocal cords vibrate. Now say pack, tap, and cast and you’ll notice your vocal cords do not vibrate when you pronounce the [p], [t], and [k] (the vocal cords start vibrating when you get to the vowel). These sounds are voiceless.

2. Aspiration

Another feature that is relevant particularly to stops is called aspiration. Say pulpit with your hand in front of your mouth and you’ll feel a puff of air at the first p, which is aspirated. Now say it again and pay attention to the second p, which is unaspirated and does not cause a puff of air. I’ve been transcribing all these p’s as [p] for simplicity, but to be more exact, we use a superscript h to represent aspiration, so pulpit is transcribed as [pʰʊlpɪt]. If we pronounced the first p in pulpit as an unaspirated [p], it would sound very funny, and the same goes for pronouncing the second p as an aspirated [pʰ]. However, in Spanish, all p’s are unaspirated, so Spanish speakers would pronounce pulpit as [pʊlpɪt], while Georgian speakers pronounce p’s as aspirated, so they would say [pʰʊlpʰɪt]. Differences like these are part of what makes these accents sound distinctive to us when Spanish or Georgian speakers speak English.
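
To make the aspiration pattern concrete, here is a minimal Python sketch. It assumes a deliberately simplified version of the English rule that aspirates a voiceless stop only at the very start of a word; the real rule also aspirates [p], [t], and [k] at the start of other stressed syllables (as in appear), which this toy function ignores. The name aspirate_initial is hypothetical.

```python
# Toy rule (a simplification, not a full description of English):
# aspirate a voiceless stop only when it begins the word.
ASPIRATION = "\u02b0"  # ʰ, the IPA superscript h
VOICELESS_STOPS = {"p", "t", "k"}


def aspirate_initial(transcription: str) -> str:
    """Add ʰ after a word-initial [p], [t], or [k]; leave everything else alone."""
    if transcription and transcription[0] in VOICELESS_STOPS:
        return transcription[0] + ASPIRATION + transcription[1:]
    return transcription


if __name__ == "__main__":
    print(aspirate_initial("pʊlpɪt"))  # pʰʊlpɪt: only the first p is aspirated
    print(aspirate_initial("spɪn"))    # spɪn: the p in spin is not word-initial
```

A Spanish-style pronunciation would simply skip this step and leave every [p] unaspirated.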

If you look at the International Phonetic Alphabet chart of consonants, you’ll see that many of the cells have more than one consonant in them. The left one is voiceless while the right one is voiced. Depending on the language, there may even be a three-way distinction that also involves aspiration, giving three counterparts:

p   pʰ   b

All three of these sounds are bilabial stops, so they would occupy the same cell on the chart. It’s good to remember these sounds as a group when you’re learning a language, since the language may use them to distinguish words. In Thai, for example, [pa] is “aunt”, [pʰa] is “cloth”, and [ba] is “crazy”, so these distinctions matter! Now back to the main manners:
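
One way to see why these three stops belong together is to write each one as a small bundle of features and compare them. The Python sketch below is a minimal illustration assuming just two yes/no features, voicing and aspiration, since place and manner are the same for all three; the names BILABIAL_STOPS and contrast are hypothetical.

```python
# The three bilabial stops share place (bilabial) and manner (stop),
# so only voicing and aspiration are needed to tell them apart.
BILABIAL_STOPS = {
    "p": {"voiced": False, "aspirated": False},
    "pʰ": {"voiced": False, "aspirated": True},
    "b": {"voiced": True, "aspirated": False},
}

# The Thai words from the example above (tones omitted, as in the text).
THAI_WORDS = {"pa": "aunt", "pʰa": "cloth", "ba": "crazy"}


def contrast(a: str, b: str) -> list:
    """Return the names of the features that differ between two stops."""
    fa, fb = BILABIAL_STOPS[a], BILABIAL_STOPS[b]
    return sorted(name for name in fa if fa[name] != fb[name])


if __name__ == "__main__":
    for word, meaning in THAI_WORDS.items():
        print(f"[{word}] {meaning}")
    print(contrast("p", "pʰ"))  # ['aspirated']
    print(contrast("p", "b"))   # ['voiced']
    print(contrast("pʰ", "b"))  # ['aspirated', 'voiced']
```

Each Thai word differs from the others only in its first consonant, and each pair of those consonants differs in just one or two features.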

Nasals

Next are the nasals, which are made with a full closure in your mouth while air is released through your nose. Say “mmm” with your finger held horizontally under your nostrils and you’ll feel some air coming out. The [n] in new and the [ŋ] in sing are also nasals. Note that [ŋ] is a single sound in the IPA even though English writes it with two letters (if you try pronouncing IPA [n] and [g] one after the other, it will sound quite different, something like ‘sin-g’). Also note that English doesn’t allow [ŋ] at the beginning of a word, but many languages, like Cantonese, do: tooth in Cantonese, for example, is [ŋa]. See, we’re transcribing other languages already!

Trills

Moving down, we see the trills. English doesn’t have trills, but many languages, like Spanish, have what is colloquially called a “rolled r”. This is an alveolar trill, because the tip of the tongue vibrates rapidly back and forth against the alveolar ridge. If you can roll your r’s, try saying [ra] (or better, if you speak Spanish, say ropa, which means ‘clothing’) and you’ll feel the rapid flapping. As I mentioned in the uvular section, the trilled French [ʀ] involves the uvula flapping rapidly back and forth in the same way.

Taps (or Flaps)

Next up are taps, which are like trills but involve a single quick contact. American English (and several other varieties) has taps in the middle of words. Try saying latter and you’ll notice that the tt part isn’t actually a [t] sound, nor is it a [d] sound (the difference between that part and the d in dam might sound subtle to you, but it would be noticeable to speakers of other languages). Rather, this is the alveolar tap [ɾ], made by tapping your tongue quickly against your alveolar ridge.

Fricatives

Now the fricatives. Fricatives involve creating a narrow passage in your mouth, so they are slightly more open than the sounds above. Try holding [s] for as long as possible (as if you’re hissing like a snake) and you’ll see that it lasts as long as your breath does. Now try holding [b] and you’ll notice that you can only keep it going for about a second. Because [b] is a stop, the closure traps the air in your mouth, so you can only hold it until the pressure behind the closure builds up and the airflow stops; but since [s] is a fricative, the narrow passage lets air flow out continuously until you run out of breath. Fricatives are also a good way to practice the difference between voiceless and voiced sounds. Hold [s] again, switch to [z] in the middle, and then switch back to [s], and you’ll really feel your vocal cords start and stop vibrating. Fricatives can also be made at different places in the mouth, so you can try holding [f] and switching to [v] and back, for example, where you’ll notice the air flowing between your teeth and lip.

Approximants

We’ll skip the lateral fricatives because they are relatively rare in the world’s languages, and talk instead about approximants. Approximants involve making the opening in your mouth a little wider than for a fricative (if you made it any wider, you would be making a vowel). Say red and you’ll notice that the r (transcribed as [ɹ]) is made at around the same position as the [ʃ] in shed, but with a bigger opening. The other English approximants are [w] and [j]. [w] is made by rounding and narrowing your lips; this is more subtle, but if you feel carefully, you’ll notice the back of your tongue also moves up toward your velum, so this is called a labiovelar approximant (also known as a labiovelar glide). [j] involves moving your tongue up toward your palate, so it is called a palatal approximant (also known as a palatal glide).

Lateral Approximants

Finally, the term lateral refers to the fact that these consonants involve air flowing out along the sides of your tongue. Hold an [l] and put your hand in front of your mouth, and you’ll feel some air coming out even though the tip of your tongue is blocking the middle.
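
The chart’s two dimensions map naturally onto a small lookup table. The Python sketch below records voicing, place (column), and manner (row) for the consonants mentioned so far, then uses the table to describe a symbol, find the symbols sharing a cell, and sort symbols from the front of the mouth to the back. It is a minimal illustration, not a complete inventory: it covers only sounds from this post, places [ɹ] in the alveolar column as the official chart does, leaves out [w] because its labiovelar place is not a single column, and uses hypothetical names (CONSONANTS, describe, same_cell, front_to_back).

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    voiced: bool
    place: str   # column of the IPA consonant chart
    manner: str  # row of the IPA consonant chart


# Columns in chart order, from the front of the mouth to the back.
PLACES = [
    "bilabial", "labiodental", "dental", "alveolar", "postalveolar",
    "retroflex", "palatal", "velar", "uvular", "pharyngeal", "glottal",
]

# Only the consonants discussed in this post; not a full inventory.
CONSONANTS = {
    "p": Consonant(False, "bilabial", "stop"),
    "b": Consonant(True, "bilabial", "stop"),
    "m": Consonant(True, "bilabial", "nasal"),
    "f": Consonant(False, "labiodental", "fricative"),
    "v": Consonant(True, "labiodental", "fricative"),
    "ð": Consonant(True, "dental", "fricative"),
    "t": Consonant(False, "alveolar", "stop"),
    "d": Consonant(True, "alveolar", "stop"),
    "n": Consonant(True, "alveolar", "nasal"),
    "r": Consonant(True, "alveolar", "trill"),
    "ɾ": Consonant(True, "alveolar", "tap"),
    "s": Consonant(False, "alveolar", "fricative"),
    "z": Consonant(True, "alveolar", "fricative"),
    "ɹ": Consonant(True, "alveolar", "approximant"),
    "l": Consonant(True, "alveolar", "lateral approximant"),
    "ʃ": Consonant(False, "postalveolar", "fricative"),
    "ʈ": Consonant(False, "retroflex", "stop"),
    "ɲ": Consonant(True, "palatal", "nasal"),
    "j": Consonant(True, "palatal", "approximant"),
    "k": Consonant(False, "velar", "stop"),
    "g": Consonant(True, "velar", "stop"),
    "ŋ": Consonant(True, "velar", "nasal"),
    "ʀ": Consonant(True, "uvular", "trill"),
    "ʕ": Consonant(True, "pharyngeal", "fricative"),
    "h": Consonant(False, "glottal", "fricative"),
    "ʔ": Consonant(False, "glottal", "stop"),
}


def describe(symbol: str) -> str:
    """Give the conventional 'voicing place manner' label, e.g. 'voiced dental fricative'."""
    c = CONSONANTS[symbol]
    voicing = "voiced" if c.voiced else "voiceless"
    return f"{voicing} {c.place} {c.manner}"


def same_cell(symbol: str) -> list:
    """Return every symbol in the table that shares this symbol's chart cell."""
    c = CONSONANTS[symbol]
    return [s for s, d in CONSONANTS.items() if (d.place, d.manner) == (c.place, c.manner)]


def front_to_back(symbols) -> list:
    """Sort symbols by column, i.e. from the lips toward the glottis."""
    return sorted(symbols, key=lambda s: PLACES.index(CONSONANTS[s].place))


if __name__ == "__main__":
    print(describe("ð"))                   # voiced dental fricative
    print(same_cell("p"))                  # ['p', 'b']
    print(front_to_back(["k", "p", "s"]))  # ['p', 's', 'k']
```

The label that describe builds (voicing, then place, then manner) is the usual way phoneticians name a consonant, so it doubles as a quick self-check while you practice.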

Great, we’ve gotten through the consonant categories that exist in English! Mouth diagrams are another useful tool for seeing where a certain sound is made. Some of these can be found at this website: http://idrani.perastar.com/ISMS_phonics.htm . The left chart below shows a [t] sound, made by putting the tip of the tongue at the alveolar ridge, while the right chart shows a [k] sound, made by bringing the back of the tongue up to the velum. These charts are especially useful when you’re learning an unfamiliar sound in a new language (for example, you can find many tongue diagrams for the alveolo-palatal fricatives of Mandarin).

If you look at the full IPA chart, you’ll see that there are many more symbols, including ones for sounds made with other airstream mechanisms that English doesn’t use (such as [ǃ] for the alveolar clicks of Sub-Saharan African languages, [ɓ] for the implosives of Southeast Asia, or [pʼ] for the ejectives of languages of the Caucasus mountains), which you can look into if you choose to learn these languages!


Okay, let’s talk about the other big class of sounds, vowels, which are made with the mouth relatively open and unobstructed. The IPA vowel chart is as follows:

Characterizing a Vowel

How do we make sense of the shape of the vowel chart? Try saying eat (transcribed as [it]) and notice that for the [i] vowel your tongue is high and toward the front of your mouth. Now try saying oops (transcribed as [ups]) and notice that the [u] vowel is also high, but toward the back. Now say follow (transcribed as [faloʊ] or [fɑloʊ] depending on your dialect). Either way, this vowel is low in your mouth. Finally, say cut [kʌt] and court [kɔɹt]. Notice that your lips don’t round for the first vowel, but they do for the second one.

So we can describe vowels in three ways:

  1. Height: Vowels can range from high (or close) to low (or open), with mid vowels in between, which can be further subdivided into high-mid (or close-mid) and low-mid (or open-mid) vowels. The height roughly correlates with how high in your mouth it feels.
  2. Frontness (or Backness): Vowels can range from front to central to back. The backness roughly correlates with how front or back in your mouth it feels.
  3. Roundness: Vowels can be unrounded (the left member of each pair on the IPA chart) or rounded (the right member of each pair). Roundness roughly correlates with whether your lips are rounded or not.
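
These three dimensions can be recorded the same way as consonant features. Here is a minimal Python sketch, assuming the standard IPA chart positions for the vowels that appear in this post (a real chart has many more); it also looks up a vowel’s rounded or unrounded partner in the same slot. The names VOWELS, describe_vowel, and counterpart are hypothetical.

```python
from typing import NamedTuple, Optional


class Vowel(NamedTuple):
    height: str    # high, high-mid, mid, low-mid, low
    backness: str  # front, central, back
    rounded: bool


# Standard IPA chart positions for the vowels mentioned in this post.
VOWELS = {
    "i": Vowel("high", "front", False),
    "e": Vowel("high-mid", "front", False),
    "a": Vowel("low", "front", False),
    "ə": Vowel("mid", "central", False),
    "u": Vowel("high", "back", True),
    "o": Vowel("high-mid", "back", True),
    "ʌ": Vowel("low-mid", "back", False),
    "ɔ": Vowel("low-mid", "back", True),
    "ɑ": Vowel("low", "back", False),
}


def describe_vowel(symbol: str) -> str:
    """Label a vowel by height, backness, and rounding, e.g. 'high back rounded'."""
    v = VOWELS[symbol]
    rounding = "rounded" if v.rounded else "unrounded"
    return f"{v.height} {v.backness} {rounding}"


def counterpart(symbol: str) -> Optional[str]:
    """Find the vowel in the same slot with the opposite rounding, if the table has one."""
    v = VOWELS[symbol]
    for other, w in VOWELS.items():
        if (w.height, w.backness) == (v.height, v.backness) and w.rounded != v.rounded:
            return other
    return None


if __name__ == "__main__":
    print(describe_vowel("u"))  # high back rounded
    print(counterpart("ʌ"))     # ɔ, the rounded partner in the low-mid back slot
    print(counterpart("i"))     # None: its rounded partner [y] is not in this table
```

The cut/court pair above is exactly the ʌ/ɔ case: same height and backness, different rounding.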

The Vowel Space

Because I’m describing the positions of vowels as “feelings” of where they are in your mouth, this might sound unscientific or vague, but vowels have acoustic properties called formants that can be measured quantitatively (in hertz). The first formant (F1) corresponds to height (low F1 = high vowel, high F1 = low vowel) and the second (F2) to backness (low F2 = back, high F2 = front), while the second and third (F3) formants are both affected by rounding. If we chart the F1 and F2 values of different vowels on a 2D plane whose origin (0, 0) is at the upper right corner, with F2 increasing as you move left and F1 increasing as you move down (as in the following diagram), you will get a vowel space that looks like the IPA vowel chart above.
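
On the IPA vowel chart, front vowels (high F2) sit on the left and low vowels (high F1) sit at the bottom. As a minimal Python sketch of that orientation, negating both formants turns an (F1, F2) measurement into ordinary plot coordinates (x grows to the right, y grows upward) with that layout. The formant values below are rough, illustrative figures assumed for an adult male speaker, not measurements from this post, and chart_position is a hypothetical name.

```python
def chart_position(f1_hz: float, f2_hz: float) -> tuple:
    """Map (F1, F2) in Hz to (x, y) plot coordinates oriented like the IPA vowel chart."""
    # Higher F2 means a more front vowel, which belongs further LEFT: negate F2 for x.
    # Higher F1 means a lower vowel, which belongs further DOWN: negate F1 for y.
    return (-f2_hz, -f1_hz)


# Illustrative (assumed) formant values in Hz, stored as (F1, F2).
FORMANTS = {
    "i": (270, 2290),
    "ɑ": (730, 1090),
    "u": (300, 870),
}

if __name__ == "__main__":
    left_to_right = sorted(FORMANTS, key=lambda v: chart_position(*FORMANTS[v])[0])
    top_to_bottom = sorted(FORMANTS, key=lambda v: chart_position(*FORMANTS[v])[1], reverse=True)
    print(left_to_right)  # ['i', 'ɑ', 'u']: front to back
    print(top_to_bottom)  # ['i', 'u', 'ɑ']: high to low
```

With a plotting library, you would get the same picture by plotting F2 against F1 and reversing both axes; negating the values is just the dependency-free version of that.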

Mapping the vowel space this way is helpful, because just as languages like to keep their consonant systems as symmetric as possible, they also like to keep their vowels symmetric and as dispersed as possible, so that they are easily perceived as different from each other. For this reason, languages often have many vowels at the edges of the space and very few in the center. Five-vowel systems with [i], [e], [a], [o], and [u] (such as those of Spanish, Japanese, and Hawaiian) are the most common (notice how these form an almost regular pentagon in the vowel space), and three-vowel systems with [i], [a], and [u] are also common (such as those of Arabic and Inuktitut; notice how these form an almost equilateral triangle).
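
The idea of dispersion can be made a bit more concrete by measuring how close the two nearest vowels in a system are. The Python sketch below uses schematic chart coordinates that are assumptions for illustration (0 to 1 for backness and for height), not acoustic measurements, and min_spacing is a hypothetical helper. By this measure, [i a u] keeps its vowels twice as far apart as a hypothetical three-vowel system [i e u] with no low vowel.

```python
import math
from itertools import combinations

# Schematic positions (assumed, not measured): x = backness from 0 (front) to 1 (back),
# y = height from 0 (high) to 1 (low), with [a] placed at the bottom center.
SCHEMATIC = {
    "i": (0.0, 0.0),
    "e": (0.0, 0.5),
    "a": (0.5, 1.0),
    "o": (1.0, 0.5),
    "u": (1.0, 0.0),
}


def min_spacing(vowels) -> float:
    """Distance between the two closest vowels in the system (larger = more dispersed)."""
    return min(math.dist(SCHEMATIC[v1], SCHEMATIC[v2]) for v1, v2 in combinations(vowels, 2))


if __name__ == "__main__":
    print(min_spacing(["i", "a", "u"]))            # 1.0
    print(min_spacing(["i", "e", "u"]))            # 0.5
    print(min_spacing(["i", "e", "a", "o", "u"]))  # 0.5
```

Adding vowels necessarily shrinks the smallest gap, so a five-vowel system cannot be as widely spaced as [i a u]; spreading its vowels around the edges of the space helps keep them distinct.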

English Vowels and Stress

English happens to have a relatively rich vowel inventory, with many dialects having somewhere around 15 to 20 vowels once diphthongs are counted (not just the five vowel letters we learn in grammar school, which are a relic of the Latin alphabet, designed for a language with five vowel qualities)! We can see just a few examples of the richness of the English vowel system by inserting different vowels between [b] and [d] and moving from the upper left of the vowel chart down along the edge until we get to the upper right:

Another important sound that shows up a lot in English is called the schwa, written [ə].

The schwa is the mid central vowel (look at the chart above again) and appears in many unstressed syllables. When I say “stressed”, I mean that the syllable is generally louder, higher in pitch, and/or longer. Say “peruse”, for example, and you’ll notice the stress is on the second syllable, but now say “perish” and the stress is on the first syllable. Notice that the first vowel sounds different too. In IPA, a vertical stroke ˈ, which looks like a straight apostrophe, is placed before the stressed syllable, so these are [pəˈɹuz] and [ˈpɛɹɪʃ].

Both words begin with the letters “per”, but in peruse the vowel becomes a schwa because it is unstressed. If you transcribe words in English, you’ll notice the schwa is everywhere!
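
Placing the stress mark is mechanical once a word has been split into syllables. Here is a minimal Python sketch; the syllable splits [pə.ɹuz] and [pɛ.ɹɪʃ] are my own assumption for illustration, and mark_stress is a hypothetical helper.

```python
STRESS_MARK = "\u02c8"  # ˈ, the IPA primary-stress mark (not an apostrophe)


def mark_stress(syllables: list, stressed: int) -> str:
    """Join syllables into one transcription, putting ˈ before the stressed one."""
    return "".join(
        STRESS_MARK + syllable if i == stressed else syllable
        for i, syllable in enumerate(syllables)
    )


if __name__ == "__main__":
    print(mark_stress(["pə", "ɹuz"], 1))  # pəˈɹuz (peruse)
    print(mark_stress(["pɛ", "ɹɪʃ"], 0))  # ˈpɛɹɪʃ (perish)
```

Using the dedicated character rather than a keyboard apostrophe keeps transcriptions consistent and searchable.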

To hear even more vowels of English, check this out!

The best way to learn transcription is to listen to the sound files for the IPA chart on the website mentioned earlier in this post and to check them against words in languages you know. With enough practice you’ll be transcribing in no time and be able to practice on foreign languages that you’re learning! Knowing the place and manner of articulation of each consonant, and the position of each vowel on the vowel chart, will go a long way in helping you master the phonetics of a foreign language and fool a native speaker!

For more blog posts on language learning from our language tutors in Boston and NYC, check out these posts: How to Learn a Language From Your Living Room, Why Should I Study Latin, and Tips and Tricks for the French SAT Subject Test. Looking to work with Tyler Lau? Feel free to get in touch! Cambridge Coaching offers private in-person tutoring in New York City and Boston, and online tutoring around the world.

