To the casual listener, vocoder and Auto-Tune effects sound similar, and the terms are often interchanged when describing the iconic “robotic voice” sounds that have shaped pop music over the decades. But while both effects modulate your voice, the technologies behind them are completely different.

We sat down with Antares DSP engineer Andrew Kimpel to learn about the history of these two iconic effects, how they work, and what they sound like. Andrew also shares some details about his work developing authentic-sounding emulations of classic analog vocoders that we’ll see in upcoming Antares products.

In basic terms, what is the difference between a vocoder and Auto-Tune?

Auto-Tune and vocoders are completely different animals, although both can be used creatively to impart an artificial, synthetic timbre to a singer’s voice. Auto-Tune was originally designed to correct pitch, letting singers sound more “in tune” throughout their performance, in a manner that is transparent and natural. But it was discovered that with certain extreme settings (notably, Retune Speed set to zero), Auto-Tune could be used to instantaneously correct pitch from note to note, in a way that sounds very unnatural.
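
As a rough illustration of the Retune Speed behavior described above, the Python sketch below takes an already-detected pitch contour, works out how far each frame is from the nearest equal-tempered semitone, and applies that correction through a one-pole smoother whose time constant stands in for Retune Speed. This is not Antares’ algorithm: pitch detection is assumed to have happened already, and the A4 = 440 Hz reference, the 10 ms frame size, and the names `correct_pitch`, `hz_to_midi`, `midi_to_hz`, and `retune_ms` are all hypothetical choices made for the example.

```python
import math

A4_HZ = 440.0  # assumed concert-pitch reference


def hz_to_midi(f_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (69 = A4)."""
    return 69.0 + 12.0 * math.log2(f_hz / A4_HZ)


def midi_to_hz(note: float) -> float:
    """Convert a (possibly fractional) MIDI note number back to Hz."""
    return A4_HZ * 2.0 ** ((note - 69.0) / 12.0)


def correct_pitch(contour_hz, retune_ms, frame_ms=10.0):
    """Pull each frame of a detected pitch contour toward the nearest semitone.

    retune_ms == 0 applies the full correction on every frame (the "robotic" setting);
    larger values smooth the correction with a one-pole filter, so it lags behind
    the singer and fast movements such as vibrato largely pass through.
    """
    # Fraction of the remaining correction applied per frame (hypothetical model).
    alpha = 1.0 if retune_ms <= 0 else 1.0 - math.exp(-frame_ms / retune_ms)
    offset = 0.0  # smoothed correction, in semitones
    corrected = []
    for f in contour_hz:
        sung = hz_to_midi(f)
        wanted = round(sung) - sung  # correction that lands exactly on the nearest note
        offset += alpha * (wanted - offset)
        corrected.append(midi_to_hz(sung + offset))
    return corrected


if __name__ == "__main__":
    sung = [219.0, 221.5, 218.2, 246.0, 248.9]  # wavering around A3, then B3
    print([round(f, 2) for f in correct_pitch(sung, retune_ms=0)])
    print([round(f, 2) for f in correct_pitch(sung, retune_ms=100)])
```

With `retune_ms=0`, every frame lands exactly on its target, so a contour that wavers around A3 becomes a flat 220 Hz line and then jumps straight to B3 with no glide. With a larger value, the correction trails the singer, which is closer to the transparent use the tool was designed for.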

A vocoder, by comparison, is something else altogether. The vocoder was originally developed at Bell Labs and was used during World War II to encode and encrypt speech. But by the 1960s, musicians had found ways to use it to create the familiar robot-voice effect.

A vocoder requires two inputs: your voice (the “modulator”) and a “carrier,” typically a synthesizer waveform. The vocoder extracts the “shape” of your voice (the formants) from the microphone input by measuring the level in each band of a bank of bandpass filters, then imposes those levels on the matching bands of the carrier (“cross-synthesis”).
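
The analysis/synthesis structure just described can be sketched in a few dozen lines of Python. This is a minimal channel vocoder under stated assumptions, not a model of any particular hardware unit: the bandpass filters are RBJ “Audio EQ Cookbook” biquads, the bands are log-spaced between `f_lo` and `f_hi`, each envelope follower is a full-wave rectifier feeding a one-pole low-pass at an assumed 50 Hz, and every function and parameter name is hypothetical.

```python
import math


def bandpass_coeffs(fc, q, fs):
    """RBJ cookbook band-pass biquad (0 dB peak gain), normalized so a0 = 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # (a1, a2)
    return b, a


def biquad(x, b, a):
    """Filter a list of samples with a Direct Form I biquad."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y


def envelope(x, fs, cutoff_hz=50.0):
    """Envelope follower: full-wave rectify, then smooth with a one-pole low-pass."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, level = [], 0.0
    for s in x:
        level += k * (abs(s) - level)
        env.append(level)
    return env


def channel_vocoder(modulator, carrier, fs, n_bands=16, f_lo=100.0, f_hi=6000.0, q=4.0):
    """Impose the band-by-band level of `modulator` (voice) onto `carrier` (synth)."""
    if n_bands < 2 or f_hi >= fs / 2:
        raise ValueError("need at least 2 bands and f_hi below the Nyquist frequency")
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for i in range(n_bands):
        # Log-spaced band centers (an assumed layout, not any specific unit's).
        fc = f_lo * (f_hi / f_lo) ** (i / (n_bands - 1))
        b, a = bandpass_coeffs(fc, q, fs)
        env = envelope(biquad(modulator[:n], b, a), fs)  # analysis: voice level in this band
        band = biquad(carrier[:n], b, a)                 # synthesis: same band of the carrier
        for t in range(n):
            out[t] += band[t] * env[t]
    return out
```

Feeding a recorded voice as `modulator` and a harmonically rich waveform such as a sawtooth as `carrier` gives the familiar effect. The pitch of the result comes entirely from the carrier; the voice only controls how loud each band is, which is why a silent modulator produces silence.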

What about the Talk Box?

Popularized by Peter Frampton, the original Heil Talk Box contained a speaker (a small horn compression driver) coupled to a long tube that was placed into the singer’s mouth. A guitar plugged into the Talk Box would have its signal amplified and output via the compression driver, sending sound waves down the tube. The singer spoke or sang into the tube, using his mouth to modify the sound of the guitar (captured via his vocal mic), creating a “talking guitar” effect. So in essence, the Talk Box is a type of vocoder that requires two inputs: guitar and voice. It has its own unique sound, compared to a vocoder, since it uses an electro-acoustic system to create its signature effect. The vocoder achieves its effect through purely electronic means.

People often describe the sound of the Auto-Tune effect as a kind of “robotic” voice, but the same term can also describe the sound of vocoders. Can you elaborate on the sonic differences between Auto-Tune and vocoders, and share your favorite songs that exemplify each?

If you know what to listen for, it is relatively easy to tell the difference. When you listen to a vocal processed by Auto-Tune with Retune Speed set to zero, the robotic quality you are hearing is the instantaneous change in pitch between notes. A secondary effect is that all of the natural variation in sustained sung notes is removed, which also sounds very unnatural (too perfect to be human).

The vocoder, by contrast, essentially replaces the sound-generating part of your voice (the vocal cords) with an electronic oscillator (synthesizer), and your throat and mouth with a bank of bandpass filters that apply the vocal formants. The result is a completely synthetic, robotic sound with all traces of the original vocal timbre removed.

My favorite vocoder track is probably “Let’s Groove” by Earth, Wind & Fire. The vocoded intro sets the tone for the entire song. A close second would be “E=mc2” by J Dilla and Common. My favorite Auto-Tune track is “WoW” by Post Malone.

What are some of your favorite vocoders?

My two favorite vintage hardware vocoders are the Sennheiser VSM-201 and the EMS 5000. Both units were behemoths, with very sophisticated controls, a large number of vocoder bands, very musical filter banks, and unique features that weren’t replicated in later, lower-cost units. The Sennheiser VSM-201 was used by Kraftwerk and Herbie Hancock; the EMS 5000 was used by Stevie Wonder.

Why did you decide to develop software emulations of vintage vocoders for upcoming Antares products, and what was your general process?

To understand the underlying theory behind vocoders, I started by studying the schematics and manuals for several hardware vocoders that were popular in the late 1970s and early 1980s. It made sense to develop a vocoder architecture that would model the signal flow of these vintage units and incorporate many of their key design concepts. The most important area I looked at was the filter banks, which are critical to how a hardware vocoder imparts its signature “sound” onto your voice. The basic premise was that if I could reverse-engineer the filter bank designs from the schematics and model those filters digitally, the software vocoder would have a good shot at approximating (emulating) the sound of the original vintage hardware.
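
One common way to turn an analog filter from a schematic into a digital one is the bilinear transform. The Python sketch below applies it to a single second-order analog bandpass section, prewarped so the digital filter matches the analog prototype exactly at its center frequency. It illustrates the general technique only and is not a description of Andrew’s actual models: the center frequency and Q stand in for values that would be derived from the schematic’s component values, and all function names are hypothetical.

```python
import cmath
import math


def analog_bandpass(f0, q):
    """Analog 2nd-order band-pass prototype H(s) = (w0/Q) s / (s^2 + (w0/Q) s + w0^2).

    Returns (numerator, denominator) as (s^2, s^1, s^0) coefficient tuples.
    """
    w0 = 2.0 * math.pi * f0
    return (0.0, w0 / q, 0.0), (1.0, w0 / q, w0 * w0)


def bilinear_biquad(num_s, den_s, fs, f_warp):
    """Bilinear-transform an analog 2nd-order section into digital biquad coefficients.

    The substitution s = k (1 - z^-1) / (1 + z^-1) uses a prewarped k so that the
    analog and digital responses agree exactly at f_warp (Hz).
    """
    if not 0.0 < f_warp < fs / 2:
        raise ValueError("f_warp must lie between 0 and the Nyquist frequency")
    w = 2.0 * math.pi * f_warp
    k = w / math.tan(w / (2.0 * fs))  # equals 2*fs when prewarping is negligible

    def map_poly(c2, c1, c0):
        # Coefficients of z^0, z^-1, z^-2 after clearing (1 + z^-1)^2.
        return (c2 * k * k + c1 * k + c0,
                2.0 * (c0 - c2 * k * k),
                c2 * k * k - c1 * k + c0)

    b = map_poly(*num_s)
    a = map_poly(*den_s)
    return [x / a[0] for x in b], [x / a[0] for x in a]


def analog_response(num, den, f):
    """Evaluate the analog H(s) at s = j*2*pi*f."""
    s = 1j * 2.0 * math.pi * f
    return (num[0] * s * s + num[1] * s + num[2]) / (den[0] * s * s + den[1] * s + den[2])


def digital_response(b, a, f, fs):
    """Evaluate the digital H(z) on the unit circle at frequency f."""
    zi = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1
    return (b[0] + b[1] * zi + b[2] * zi * zi) / (a[0] + a[1] * zi + a[2] * zi * zi)


if __name__ == "__main__":
    fs = 48000.0
    num, den = analog_bandpass(1000.0, 5.0)  # hypothetical band: 1 kHz center, Q = 5
    b, a = bilinear_biquad(num, den, fs, f_warp=1000.0)
    for f in (250.0, 1000.0, 4000.0):
        print(f, abs(analog_response(num, den, f)), abs(digital_response(b, a, f, fs)))
```

A full filter bank is then a list of such sections, one per band; a real unit may cascade several sections per band for steeper skirts, which this sketch leaves out. Prewarping at each band’s center keeps the most important point of every band accurate, while frequency warping grows toward the Nyquist frequency.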

Can you share any details about the features we’ll see in Antares’ future vocoder products?

Only that the architecture and feature set will pay homage to the classic hardware vocoders of the 1970s and 1980s, reproducing not only their characteristic sounds but also many of the unique routing capabilities of those signature instruments. One design goal is to make these vintage sounds easy to recreate, while adding new capabilities, such as fully customizable filter banks and formant effects, that weren’t available back then. Ease of use is also a big design consideration. Lastly, it seems logical that we would want to incorporate Auto-Tune pitch tracking and pitch correction into the design, making it a fully Auto-Tuning vocoder!

Sarah Jones

Music and Technology Writer

Sarah Jones is a writer, musician, and content producer who chronicles the creative and technical forces that drive the music industry. She’s served as editor-in-chief of Mix, EQ, and Electronic Musician magazines and is currently the live sound editor of Live Design magazine. She’s a longtime board member in the San Francisco chapter of The Recording Academy, where she develops event programming that cultivates the careers of Bay Area music makers.