On Musical Selection in Research: 1. Baking a Cake

A five-part deep dive into music, the brain, and neurological studies of their interaction.

Did you hear about the concert pianist who ate her own sheet music? She said it was a piece of cake.

TL;DR – By breaking down the main elements of human-organized sound, we gain a better understanding of the neurological processes involved in each, which will better inform and predict the selection of musical styles and recordings in research environments. We’ll examine the difficulty of establishing the control condition in auditory experiments, the myth of “relaxing” music, how audition relates to attention and distraction, and top-down vs. bottom-up processing of the musical elements. We’ll also touch briefly on the relation between music and language processing. Finally, we will give several examples of a brief paragraph that more accurately describes music selection using the following seven criteria: rhythm, melody, harmony, timbre, form, loudness, and context.

The Recipe

I was at a dance club in Santa Monica with my friend Lauren, who is not a trained musician. She had just sat in on a music production session, which had left a sour taste in her mouth.

“Why didn’t you like it?” I asked, probably between sips of Guinness because this was ten years ago.

“They were making this track that sounded great,” she answered, “but then they got stuck because they couldn’t ‘find the hook’.”

“And that bothered you?”

“Yeah,” she said, frowning. “I don’t like thinking about music like that. It’s so formulaic, you know?”

A part of me will always agree with this sentiment, but the producers she was with weren’t in the wrong, either. Making music is an amazing feeling when the inspiration juice is flowing, but you also need to know the formula when the creative intuition wears out. Lucky for Lauren and me, our mutual friend DJ Maggie had just started as the opening act, and she had selected an ambient set to get the venue warmed up.

“I do know what you mean,” I said to Lauren. “But, well, take this set Maggie’s playing. It’s atmospheric and relaxing, right? Don’t you think it makes for a great opening set?”

Lauren warmly agreed.

“This music doesn’t have a hook, either,” I said, exactly as clearly and succinctly as I’ve recalled it now. “No one thinks music always needs a hook. But a hook – something repetitive and catchy – serves a purpose for a song with a certain goal. Without a hook, it’s a different kind of song.”

Lauren frowned again. “But there’s nothing wrong with that.”

“No, it’s not wrong or right, it’s just… It’s like baking a cake,” I said, coming up with a metaphor I’ve used many times since. “The standard cake recipe is butter, sugar, eggs, and flour, right? But you can bake a cake out of all sorts of things that aren’t those four ingredients. It’s just that the more you subtract and substitute from those four things, the less the final product looks and tastes like a cake.”

I call it, “A Waltz in Pound Cake.”

I love this metaphor because, just like cake, there are four standard ingredients of music: rhythm, pitch, harmony, and timbre. There are a few more subjectively experienced ones as well, such as loudness, context, and structure. Young composers and songwriters often struggle the most with structure, which is probably why form tends to be so rigid in a given era. The contemporary era has seen an interesting twist on this tradition, because pop and folk music are the lucrative styles, which has caused resistance to rigid structures in many trained chamber composers. Even for untrained musicians like Lauren, stripping music down to its elements feels wrong, because it takes away the magic. They believe (or worry) that it will subtract from the experience of music.

Here’s a Richard Feynman quote about that.

“I have a friend who’s an artist and has sometimes taken a view which I don’t agree with very well. You hold up a flower and say, “Look how beautiful it is!” And I’ll agree. And he says, “You see, I, as an artist, can see how beautiful this is, but you, as a scientist, take this all apart and it becomes this dull thing.” And I think he’s kind of nutty… Science knowledge only adds to the excitement, the mystery, and the awe of a flower. It only adds! I don’t understand how it subtracts.”

Feynman, 1981 BBC interview, The Pleasure of Finding Things Out.

Do bakers enjoy cake less the more they learn? Based on the greatest show of all time, The Great British Bake-Off, I’m going to say, absolutely. Do comedians still laugh at jokes? Yes, they just laugh at fewer of them. Learning the formula changes one’s tastes by refining critique and making surprise more difficult to achieve, but in the end it only enriches enjoyment. Case in point, Lauren gleefully exclaimed that she got it, that the cake explanation of musical formula helped a lot, and then we danced our butts off because that’s what friends do over Guinness at dance clubs in Santa Monica.


  1. Baking A Cake – In this introductory post, I’ll go over definitions of terms for each element of music.
  2. The Control – In part two, I’ll discuss issues with researching music and science, with a focus on how hard it is to set up reliable control conditions.
  3. Bottoms Up – Part three will focus on the “bottom-up” perception of rhythm and low frequencies in the brain.
  4. With the Top Down – Part four will discuss the “top-down” process of experiencing pitch, melodic, and harmonic audition.
  5. Conclusions drawn from this journey into music and the brain.

Defining the building blocks of human-organized sound has been done many times before, but I hope to distinguish myself by taking a neurological, evidence-based approach to the subject. With modern advances in EEG, PET, and fMRI scanning, scientists are making great strides in mapping what regions of the brain are activated when perceiving and performing musical tasks. We’ll talk about music in both chamber and folk/pop traditions. Some styles discussed will include minimalist, ambient, noise, drone, algorithmic, electronic beat, and postminimalist music. Maybe also a bit of the ol’ Baroque, Classic, Romantic epochs too, though probably not as often since that’s been worked to death already.

An inexhaustive list of words often used to define the elements of music:

  1. Pulse, feel, meter, rhythm
  2. Frequency, tone, pitch, melody
  3. Chord, scale, key, harmony
  4. Timbre, texture, color
  5. Miscellany – Context, Loudness, Structure

Because you have to know the rules before you break them! Actually, you can break all sorts of rules without knowing them. It’s the easiest way to break a rule, to be honest.

The purpose of this is mostly to piss off some really close grad school friends who loathe music analysis, for my own sake as I consider the possibility of doctoral studies, and also to determine possible paths toward more reliable and reproducible results in neurological music therapy applications.

As a final note, I’ll be using Eurotraditional to refer to the standard classical and chamber music repertoire from Europe, as opposed to Western, with an emphatic shoutout to the brilliant Natalie Wynn for her insightful, entertaining, mildly NSFWish ContraPoints video presentation regarding the term.

1. Pulse, Feel, Meter, Rhythm

“It’s interesting. I’ve known quite a few good athletes that can’t begin to play a beat on the drum set. Most team sport is about the smooth fluidity of hand-eye coordination and physical grace, where drumming is much more about splitting all those things up.”

Neil Peart, author and former drummer of Rush

In musical parlance, beat has many definitions. If you ask, “What’s the beat of this song?” you’re asking about the tempo, a.k.a. Beats Per Minute (BPM). If you say, “I love this beat,” you’re usually referring to a rhythm of around 2 to 4 bars long which generally includes 8-16 eighth, quarter, or (rarely) half notes. If you say, “I love Beat It,” you’re referring to a popular Michael Jackson song. See? It’s complicated.

Tempo and pulse are roughly the same concept. For example, the song “Beat It” has a pretty fast pulse at 138 BPM.

Feel is the basic unit of rhythm, comprised entirely of two simple numbers: 2 and 3, a.k.a. duple and triple meters. “Beat It” is in duple, because the strong-weak cycle lasts two eighth notes instead of three. Making matters worse, and by that I mean way better, you can play both at the same time. Every hip hop song on the planet seems to do this right now, using vocal triplet structures to add rhythmic syncopation over the classic 4/4 meter. OutKast uses triple meters in their beats fairly often, in the decades-long tradition of Southern Swing, which is in 6/8 or 12/8 meter.

And it sounds GREAT.

Meter is the hierarchical grouping of pulses based on feel, such as the most common one in Eurotraditional music, the trusty, endlessly symmetric duple 4/4. That’s four quarter note downbeats to a bar, also called a measure. This definition also covers Indonesian last-beats, but doesn’t quite cover West African bell patterns. Patterns itself best describes that music, which has a pulse but little or no true meter.

A bar or measure can best be defined as full units of beat patterns, for example a couple bars that go something like, kick SNARE kick CLAP kick SNARE hat hat kickCLAP. That was “Beat It” in case that wasn’t clear.

Finally, I will define the umbrella term rhythm with the help of CalArts composer, conductor, and music theory professor Marc Lowenstein:

“Rhythm is best described as something like ‘the relative times between changes in sound.’ It might sound dorky, but the precision is important. It is not pulse dependent and is not hierarchically important. All three of those terms do inform each other, but they are distinct.”

Marc Lowenstein

In other words, one of the ways humans organize sound is via time, and rhythm is the perception of those changes relative to each other. Rhythmicity is one of the most intuitive ways humans distinguish music from not-music. There are no records whatsoever of folk music traditions that lack a pulse arranged by regular strong and weak beats. Even modern attempts at pulseless music often have an implied one somewhere, though whether it’s perceived as one depends on many factors.

So it comes as no surprise that playing, listening to, and especially dancing to a drum beat activates more areas of the brain than any of the other three major music ingredients. In a rock recording studio, the drums get more attention and microphones than any other instrument, and roughly half of the available decibels in a standard mix go toward the drums, while the rest of the band gets shoehorned into the remaining half.

Our brains don’t just like rhythm. They are rhythmic. Brainwaves are real, they can be trained, and are constantly manipulated by all sorts of practices from the new age to the mundane. Most of our motor functions happen in time to rhythmic brain pulses, an incredible phenomenon called motor resonance. An adult human walks on average at a pace of about 2 hertz, or two steps per second. Count each step as a beat, and 2 Hz works out to a pulse of 120 beats per minute, which is such a common musical tempo that most audio editing software has it set as default (Ableton Live, for example). It’s also the most common march tempo. You’ll find some ratio of 2 hertz in impromptu finger tapping and a long list of routine motor actions (Leman 2016, shout out to JS Matthis 2013). Neurologically, it’s the default tempo because it’s roughly in the middle of the spectrum that humans discern as rhythmic vs. arrhythmic, i.e. too fast or slow to perceive as a beat. Too slow and we start to subdivide it, too fast and it becomes a tone.
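The hertz-to-BPM arithmetic above is easy to check for yourself. Here is a minimal sketch in Python; the function names are my own illustration, not part of any audio library, and it assumes one beat per cycle (one step equals one beat).

```python
def hz_to_bpm(hz: float) -> float:
    """Convert a pulse rate in cycles per second to beats per minute."""
    return hz * 60.0


def bpm_to_hz(bpm: float) -> float:
    """Convert a tempo in beats per minute to cycles per second."""
    return bpm / 60.0


def beat_period_ms(bpm: float) -> float:
    """Time between successive beats, in milliseconds."""
    return 60_000.0 / bpm


# Walking pace of about two steps per second, one step per beat.
walking_bpm = hz_to_bpm(2.0)

# "Beat It" at 138 BPM comes out to roughly 2.3 beats per second.
beat_it_hz = bpm_to_hz(138)
```

So a 120 BPM pulse leaves half a second between beats, and "Beat It" sits only slightly above the walking rate.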

Music without a pulse with a duple or triple feel arranged in a predictable pattern probably didn’t exist until modern times. It’s still pretty hard to pull off for the sole reason that humans aggressively impose patterns on literally everything. It’s kind of our thing. But contemporary composers now attempt works that remove rhythm entirely or make it so unpredictable as to be imperceptible, with varying levels of success.

One reason it’s so hard to do away with rhythm is human locomotion. When the average person’s brain hears a strong backbeat, such as in “Beat It,” the motor control system strongly activates with electrical impulses from the cochlear system. I mean this literally – neurons in the brainstem and motor cortex fire in time to the beat (Thaut, 2006, 2014) when listening to steadily pulsing rhythms. This is part of the bottom-up model of audition, meaning we experience it as sensory instead of as a higher-tier cognitive process. In short, the brain dances to its own innate rhythm, and synchronizing our bodies and external stimuli with it is a very pleasant experience.

The human brain, constantly.

A few examples of music that might ignore or de-emphasize rhythmicity are drone, ambient, noise, and true minimalist music. You’ll also find music that has rhythmic elements but no meter, essentially changing at a pace that prevents the perception of an established pattern or pulse. That will often involve electronics, especially algorithmic music or modular synthesis, though some improvisational styles attempt to achieve this as well. As previously stated, West African drumming often has multilayered patterns, but no true meter – only a feel in duple or triple time. There are endless examples of solo music in which the performer employs wildly variable tempi to manipulate the tension-release cycle. This is often called expressive or subjective timing, examples of which are found in the following performances:

Later, we’ll go more in depth regarding exceptions and substitutions, but first, we’ll need to define the remainder of our terms. In other words, we need to learn how to bake a standard cake before we venture into vegan gluten-free low-carb quantum cake territory.

2. Frequency, Tone, Pitch, Melody

“Music creates order out of chaos: rhythm imposes unanimity upon the divergent, melody imposes continuity upon the disjointed, and harmony imposes compatibility upon the incongruous.”

Yehudi Menuhin, violinist, Symboles Dans la Vie Et Dans L’art, Leit, Whalley, 1987

In the beginning, there was a sine wave, and it looked like this:

*Waves* Hiiii!

All you need to know is this: When the peak in the red line points upward, the particles in a medium are dense, all smushed together. When the bump points down and forms a trough, the particles are sparse and spread apart. Do this enough times, and you’ve built a universe.

The sine wave is simple, beautiful, perfect math. Rotate it and you can draw circles or spirals. It looks cool in 3D, too. Pass it through the electromagnetic field, and you get pretty colors and X-rays and stuff, also known as light. Stack them on top of each other and you get neat shapes like triangles, sawtooths, and squares. Because math!

Maybe the coolest GIF about Fourier Transforms ever.
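For the curious, stacking sine waves into a square looks something like the sketch below. It uses the textbook Fourier series for a square wave (odd harmonics only, each scaled by one over its harmonic number); the function name and parameters are my own, and this is an illustration rather than how any particular synthesizer does it.

```python
import math


def square_wave_approx(t: float, freq: float, n_terms: int) -> float:
    """Approximate a unit square wave at time t by summing odd sine harmonics.

    Uses the series (4 / pi) * sum(sin(2*pi*n*freq*t) / n) over n = 1, 3, 5, ...
    """
    total = 0.0
    for k in range(1, n_terms + 1):
        n = 2 * k - 1  # odd harmonic number: 1, 3, 5, ...
        total += math.sin(2 * math.pi * n * freq * t) / n
    return 4 / math.pi * total


# A quarter of the way through a 1 Hz cycle, a true square wave sits at +1.
one_sine = square_wave_approx(0.25, 1.0, 1)     # a lone sine overshoots
fifty_sines = square_wave_approx(0.25, 1.0, 50)  # fifty harmonics get close to 1
```

One sine alone overshoots the flat top, and each added harmonic pulls the sum closer to the square shape.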

And if you pass some sine waves through air or wood or brass or taut animal skins or whatever, you get sounds.

Just like human eyes can’t see X-rays or radio waves, our ears can’t detect every frequency. Which ones you can hear (if any) is up to genetics, hearing damage, and age, since as you get older you hear fewer high-pitched tones. But, generally, you start out hearing around 20 Hz to 20,000 Hz, which completely coincidentally corresponds to wavelengths of about 17 meters down to 17 millimeters in air. All of which leads us to the question, “If a tree falls in the woods, and it only emits frequencies below 15 Hz, does it make a sound?”

The trees have answered.
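The 17-meters-to-17-millimeters figure falls out of one division: wavelength equals the speed of sound divided by frequency. A quick sketch, assuming sound travels at about 343 meters per second (dry air at roughly 20 °C); the constant and function name are mine.

```python
# Assumption: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def wavelength_m(frequency_hz: float, speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the wavelength in meters of a tone at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed / frequency_hz


lowest_audible = wavelength_m(20.0)       # about 17 meters
highest_audible = wavelength_m(20_000.0)  # about 17 millimeters
```

Change the medium and the numbers change with it, since sound moves much faster through water or wood than through air.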

To put it simply, sine waves are everything. Detecting them is how we perceive the world around us, but also how we respond to said stimuli. This exchange is at the heart of musical interaction. We essentially turned the framework of our existence into a really cute game by generating tones in rapid succession at mathematically related frequencies. Although, usually, we just call it humming a melody.

We don’t know what came first, singing or talking (or maybe whistling?), but we know both went hand-in-hand during hominid evolution. The first instruments were probably drums or xylophones lost to time, but the earliest evidence of artistic expression is flutes made from bones and horns. Flutes, incidentally, make the closest approximation to a clean sine wave out of all symphonic instruments. Our discovery of such instruments predates any known early visual art by thousands of years (Wallin et al 2001). We made simple instruments that produced soundwaves with simple math, and can assume singing was a precursor to this, placing it on roughly the same developmental timeline as language.

Below you’ll find a picture of the Overtone Series (also called the Harmonic Series). You can imagine this picture as a series of combined mathematical concepts, but you can also imagine it as a series of pictures of vibrating guitar strings, violin strings, etc.

The Harmonic Overtone Series

The top one labeled 0-1 is the fundamental tone, i.e. the lowest pitched tone, i.e. a representation of a perfectly vibrating string without any nodes (like when you hold down a string against a guitar fret). The successive ratios refer to how the sound wave is allowed to vibrate, in other words, more math. If you’d like to know a little more about the basics of this, I cannot recommend the following 10-minute video enough.

The harmonic series does a lot more than help us arrive at pitch, melody, and scale, because it’s also a model for how we perceive harmonies and timbres. To oversimplify it a bit, the more complex the ratio of a partial from the series is, the more dissonant we perceive a tone in a scale or chord.
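To make the ratios concrete, here is a small sketch that lists the partials above a fundamental and the ratios between neighbors. The function names are my own, and the interval names in the comments are the standard just-intonation ones, not something measured here.

```python
from fractions import Fraction
from typing import List


def harmonic_frequencies(fundamental_hz: float, count: int) -> List[float]:
    """Frequencies of the first `count` partials: whole-number multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def adjacent_ratios(count: int) -> List[Fraction]:
    """Frequency ratios between neighboring partials: 2/1, 3/2, 4/3, ..."""
    return [Fraction(n + 1, n) for n in range(1, count)]


# Partials of a low A at 110 Hz: 110, 220, 330, 440 Hz.
partials = harmonic_frequencies(110.0, 4)

# 2/1 is an octave, 3/2 a perfect fifth, 4/3 a perfect fourth, 5/4 a major third.
ratios = adjacent_ratios(5)
```

The further up the series you go, the bigger the numbers in each ratio get, which is the "more complex" part of the oversimplification above.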

The simplest partials, i.e. the first few in the series, make up the pentatonic, or five-note, scale, which is practically second nature to human musical perception. If an untrained musician is asked to improvise a melody, it will usually be pentatonic if it’s in tune. Almost all humans internalize pentatonic melodies quite easily, which is probably due to the fact it occurs naturally all around us. We’ll get more into that later.

We’re coded, at least as infants, to prefer higher-register melodies (Trainor 1998). Voices perceived as another gender than the listener garner more attention (Junger 2013). There’s not a lot of existing research about voice and register with regards to age and gender distinctions, unfortunately. But what we have shows that, to be judged a melody, it must generally exist in the human vocal range, making it likely that melodies help humanize a passage of music and thus tying together all the other elements. Melody is sort of the cognitive glue of a traditionally arranged song – the hook, if you will, that makes it memorable and personal.

Singers, especially untrained ones, often inadvertently damage their voices over time (or sometimes on purpose) just to achieve a signature “sound.” Vocal timbre is important to us on many levels of social interaction, and we associate all sorts of interpersonal assumptions with it. Song creators capitalize on melodic attention by writing the catchiest melody they can think of and playing it repeatedly throughout a song. Quite often, getting a woman to sing said melody in a sexually enticing way evokes a more memorable reaction in the listener. These methods of neural manipulation are quite lucrative, which is part of the formula against which my friend Lauren rebelled. Sexualization in popular music is an ongoing saga fraught with both empowerment and exploitation, and many books exist on the subject. Suffice to say, I’ll just link to Women and Popular Music by Sheila Whiteley and leave it to the reader to pursue further if desired.

Ravel’s Boléro notwithstanding.

Likely because of its ubiquity, examples of music styles that attempt to avoid folky melodies are vast – noise music, ambient drones or chords, and minimalism, for example. Serialist composition technique attempts to subvert the composer’s tendency toward melody by using rigid rules that produce difficult-to-sing tone rows, to the great frustration and/or excitement of contemporary voice students everywhere. Although the reality is that most students can sing by their sophomore year what seasoned professionals used to call impossible. Which is a kind of human progress that could be called something like expertise normalization, but apparently it isn’t. If you know the technical term for this, please let me know, because as a concept it’s apparently very hard to Google search and should be a way bigger thing.

Anyway, to sum up, in spite of all our best efforts, humans are annoyingly adept at singing along to pretty much anything.

3. Chord, Scale, Key, Harmony

“How sweet the moonlight sleeps upon this bank!
Here we will sit, and let the sounds of music
Creep in our ears; soft stillness, and the night
Become the touches of sweet harmony.”

The Merchant of Venice

Harmony is, objectively, the hardest element to define out of these four ingredients, because… Well, here’s a pretty good attempt from William Malm’s book Music Cultures of the Pacific, the Near East, and Asia:

“Harmony considers the process by which the composition of individual sounds, or superpositions of sounds, is analyzed by hearing. Usually, this means simultaneously occurring frequencies, pitches (tones, notes), or chords.”

Almost anyone can recall a melody or make one up on the spot. If you ask someone to recreate their favorite multiple-bar drum solo, many will fail, but some, at least, will succeed. However, asking someone to recite their favorite harmonic resolution means they’ll actually need to sing a melody implying a harmony (unless you’re Anna-Maria Hefele, but that’s another story).

Harmony can take on an almost mystical quality. We can help standardize the definition with a distinction: harmony is a perceived simultaneous combination of pitches, while an implied harmony is the type that arises from an unaccompanied melody or similar. Together, these terms summarize our perception or understanding of an arrangement of harmonic consonance and dissonance, i.e. the tension and release cycle of notes and chords.

Harmonics can refer to a variety of phenomena, including suppressing the fundamental note to get a higher-pitched tone from an instrument, or it can more generally refer to multiple waveforms/oscillations/whatever occurring simultaneously in a scientific context. The musical meaning predates the scientific one, so there.

When defined as a basic musical element, harmony isn’t specifically the bass line or top notes, and it’s not the chords played on keys by the left hand or strummed on an acoustic guitar. That’s the accompaniment implying a harmonic progression. If I sing an unaccompanied melody, no one’s playing any chords, but the tune can still be analyzed harmonically. An organ fugue rarely contains much block chord voicing, but it will still move through harmonic regions that arise from accidentals, interlocking melodies, and pedal tones. So that’s why chords are not the same thing as harmony.

A key is a mode or scale with one pitch made hierarchically the most-stable by an asymmetrical, contextual process. In Eurotraditional music, the key is a shorthand indicating what notes, accidentals, and chords to play. The “most-stable” pitch often refers to the tonic, such as the note “C” in a C major scale. Some alternate forms of organizing tonal content are ajnas, maqams, and dastgah from the Middle East, thaat in Indian raga, pathet from Indonesian gamelan, and so on. Scale, key, and mode are often used interchangeably in English, but this can get murky when you consider international interpretations of musical pitch organization or performance environments. Here are iterative definitions for each:

  • Mode – a limited collection of discrete pitches.
  • Scale – an ordered mode.
  • Key – a mode or scale with a hierarchically most-stable pitch.
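Since each definition builds on the one before it, the three nest neatly as data types. Below is a rough sketch, with the big assumption that pitches are the twelve equal-tempered pitch classes numbered 0 to 11 (C = 0), which fits Eurotraditional keys but not maqams or pathet. All names here are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Mode: a limited, unordered collection of discrete pitches.
# Assumption: pitch classes 0-11 in twelve-tone equal temperament, with C = 0.
WHITE_KEYS: FrozenSet[int] = frozenset({0, 2, 4, 5, 7, 9, 11})


def to_scale(mode: FrozenSet[int]) -> Tuple[int, ...]:
    """Scale: the same pitches as the mode, put in ascending order."""
    return tuple(sorted(mode))


@dataclass(frozen=True)
class Key:
    """Key: a scale plus one pitch singled out as the most stable (the tonic)."""
    scale: Tuple[int, ...]
    tonic: int

    def __post_init__(self) -> None:
        if self.tonic not in self.scale:
            raise ValueError("tonic must belong to the scale")


# Same seven pitches, two different keys, depending on which pitch is "home."
c_major = Key(scale=to_scale(WHITE_KEYS), tonic=0)
a_minor = Key(scale=to_scale(WHITE_KEYS), tonic=9)
```

The point of the sketch is that C major and A natural minor share a mode and a scale in this sense and differ only in the tonic.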

Sew, a needle pulling thread. Contemporary compositions often lack a traditional harmonic structure, but the cycle of tension and release is so inherent to humans that it’s quite difficult to avoid entirely. For example, many works don’t use easily analyzed chords, but tone clusters – like those created when mashing your hands on a piano – still contain harmonic information, even if it’s just that one hand’s cluster contains higher-pitched notes than the other.

Note that the high/low description of pitch is learned, not inherent. Examples of pitch characterizations from other cultures are thin/thick and light/heavy. Though we can all intuit the other spectrums, musicians who grew up with different characterizations have no predisposed association in the practical sense. I will be using the high/low spectrum of pitch in this series, because why not.

Composers have spent a lot of time in the last century creating complexly organized frameworks that defy classic tonal analysis. It’s almost more common in academic music to reject key signatures, a practice referred to with the umbrella term atonality. But even those pieces often contain moments upon which the listener might impose a harmonically interpreted tension-release cycle. Harmony – implied or not – just happens, whether you like it or not.

4. Timbre, Texture, Color

“I sometimes wish taste wasn’t ever an issue, and the sounds of instruments or synths could be judged solely on their colour and timbre. Judged by what it did to your ears, rather than what its historical use reminds you of.”

Jonny Greenwood, composer, guitarist for Radiohead

Firstly, it rhymes with amber. Secondly, the story of music performance is timbre. Thirdly, the story of recorded music is still timbre, just more of it.

Here were the available musical timbres for a few million years:

  • Blowing through a wooden tube
  • Blowing through a wooden tube over a wooden reed
  • Blowing through a metal tube
  • Sticking giant metal tubes in a wall and blowing through those
  • Blowing through a meat tube over a meat reed, also called singing
  • Plucking taut strings over some gourd-shaped cavity
  • Sawing taut strings over more taut strings over a gourd-shaped cavity
  • Making taut animal skins resonate over some gourd-shaped cavity
  • Hitting animal skins with sticks
  • Hitting sticks with other, possibly different sticks
  • And, of course, this thing:
I mean, there are always exceptions.

“Instruments possess different timbres” is a fancy way of saying a trumpet doesn’t sound like a kettle drum. Different resonator shape and material means different partials in the overtone series get emphasized. That’s largely where the signature sound of a class of musical instruments comes from.

Timbre isn’t the difference between high and low notes. That’s pitch, i.e. tonality. Timbre is the quality of the sound. If I sing “ah” with my mouth wide open and a lot of breath support, that’s a very round, smooth timbre. If I sing through my nose and do my best Munchkins of Oz impression, that’s a sharp, nasal timbre. The “ah” color is going to look more like a sine wave, while the sharper timbre is literally a sharper shape, as shown in the comparison of three instruments’ soundwaves linked above. Er, and also linked just now. You can see more about vowel timbres, their descriptions, and what their waveforms look like here.

This is a good place to mention that, in the neural realm of sound, the human vocal timbre is quite special. We have specific regions and processes dedicated to identifying the source of a speaker or singer, including their gender, age, emotional state, status, personality, arousal, and attractiveness [1][2][3][4][5][6][7]. This builds directly upon the discussion about melody and pitch perception. In other words, the timbre of the singer’s voice can greatly influence how the pitch content is received, providing context via color and personality. Not only that, vocal qualities inform a lot about how we experience and describe sound, for example, “ah” sounds “round,” which is also the shape of the mouth making that vowel.

When it comes to musical timbre, everything changed when the Fire Nation attacked, and by “the Fire Nation” I mean electricity. Electric amplification and recording allowed humans to arrange sound in ways never before possible. Can you guess what a standard electric current looks like? It starts in “S” and ends in “ine wave.”

Electricity travels via waveform, just like music. Marrying electricity and sound was so obvious, it’s a little weird it took us a few eons to figure it out. But it allowed us to invent multitudes of sounds that rarely or never occur in nature, not to mention effects processing: amplification, distortion, compression, equalization, chorus, phasers, flangers, delay, and, eventually, synthesizers. This last example allows us to shape the electricity first and then just see what sounds come out. This is the opposite of the other, super boring method of blowing through some hollow tree branch and hoping it sounds nice, amirite?

Digital audio synthesis (as opposed to analog synthesis) came not long after the advent of computers. It’s a pure version of sound modeling where things like voltage and the size of the vacuum tube don’t matter. You just punch in numbers and they come out in a very, very rapid stream, mimicking the shape that we already had in much higher fidelity in vinyl grooves.

All this computing power, and we still can’t synthesize anywhere near the beauty of discarded old salmon traps in Alaska.
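If "punching in numbers" sounds abstract, here is roughly what that stream looks like for a plain sine tone. This is a bare-bones sketch assuming CD-style settings (44,100 samples per second, 16-bit signed integers); the function name and defaults are mine, not any synthesizer’s actual API.

```python
import math
from typing import List


def sine_samples(
    freq_hz: float,
    duration_s: float,
    sample_rate: int = 44_100,  # assumption: CD-style sample rate
    bit_depth: int = 16,        # assumption: 16-bit signed samples
) -> List[int]:
    """Return a sine tone as a list of integer samples."""
    peak = 2 ** (bit_depth - 1) - 1  # largest positive value, 32767 at 16 bits
    n_samples = round(duration_s * sample_rate)
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]


# One second of a 441 Hz tone: 44,100 integers, repeating every 100 samples.
tone = sine_samples(441.0, 1.0)
```

Write those integers to a sound card fast enough and the speaker cone traces out the wave; every other digital timbre is a fancier way of choosing the numbers.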

Timbre possibilities are now practically infinite. While it’s difficult to make music without harmony, it can sort of be done, for example this seminal work entitled ten hours of white static. That was sarcasm. Or was it?

It is semantically impossible to experience music without its timbre because it’s an inherent property of sound. The closest you can come is constructing the most boring timbre possible, like the sine wave, or in a sense using just-intonation techniques to prevent “harmonic beating”, although this simply produces a straight-toned timbre instead. Since timbre is an intrinsic sound property, we won’t have a specific section on it, but I may speak a bit about it in the conclusion.

5. Context, Loudness, Structure

While these elements are considered the secondary musical elements, they’re actually what affects the listener most strongly. The context of music is subjective to the listener, and it’s something contemporary composers often obsess over. It refers to the listener’s subjective experience alongside the auditory event. Context is largely cultural, though some aspects are fairly straightforward, such as whether a particular recording sounds like a live performance or studio-produced. Musical training or lack thereof strongly affects all sorts of things, including emotional experience and attention/distraction. Music is an especially communal experience, making it strongly associated with the average person’s sense of identity. So, music listened to in a crowd setting or with close friends/relatives/loved ones is experienced differently than when alone. We tend to prefer and retain lyrical music better than instrumental music. Introducing language and lyricism to music strongly affects how we perceive the melody and overall piece in general. In other words, context is complicated.

The neurological experience of loudness, also called dynamics, levels, or volume, is surprisingly poorly understood. We have it mapped out pretty well, but hypotheses conflict as to why we like some loud noises, but not others. We don’t know why increasing a song’s volume sounds better until it doesn’t. We don’t know why swelling or subsiding volume affects us so strongly that it can induce chills, an effect that strengthens when coupled with textural changes and surprising harmonies. We don’t know why we can stand louder volumes at a rock concert than at home or in the car, but I like this study’s attempt to explain positive loud experiences, essentially using volume to distract from other stimuli the same way a psychotropic drug might. This theory promotes the idea of sensory spaces, in which one sensory experience can overtake enough of the brain’s attention that it drowns other spaces out. We essentially calibrate loudness to reach a Goldilocks point of optimum distraction and preferable sensory space.

A graph that goes from “Huh?” to “Ow.”

Suffice it to say, we do tend to turn up music we like, even when doing so makes no difference in discerning the song’s detail and has no practical effect besides making more neurons fire. We do this even if it hurts a little (or a lot), even if we know we’re damaging our ears. It’s hard to find good studies on this, since testing loud volumes on your subjects tends to reduce the number of subjects who can hear your subsequent lab tests.

Song structure or form is determined by the combination of meter and harmonic progression, and to a slightly lesser extent the regular return of thematic hooks, or ostinati when being fancy. Classical and chamber music have a multitude of forms, while many contemporary works strive to be formless, which is called “through-composed.” At the same time, a musical work that rejects discernible melodies or rhythms might rely heavily on its structure to convey a sense of musicality. This interplay of de-emphasizing one musical ingredient, then emphasizing another as a substitute, will come up several times during this series.

In Eurotraditional classical music, forms have names like rondo, sonata, concerto, aria, fugue, and so forth. Indian classical ragas have their own forms, just like Middle Eastern maqams, and so on. Plenty of international music styles are largely structureless, like Ghanaian Ewe music or chants from North American indigenous peoples.

Interestingly, pop and folk music worldwide often share a similar rigid form known vernacularly as verse-chorus-verse. Current pop music was pioneered in Europe and the US through blues, then rock, then electronic dance and hip hop. It is profoundly, astoundingly mimicked and appropriated in almost every country on the planet. Despite ever-changing trends, however, its form is still folky, though songwriters have refined its structure to the following elements.

  • Intro
  • Verse
  • Pre-chorus
  • Chorus
  • (Refrain)
  • Bridge
  • (Breakdown)
  • Outro

Mix those around, repeat them however many times you want, skip some, skip most, it barely matters. As long as it has some sort of verse-chorus-verse structure, it’s probably going to get called a “song.” To be quite frank, a lot of orchestral or symphonic music uses these concepts, too, just better hidden, or in the case of arias not at all. Here’s a fun way to visualize various song structures.
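If it helps to see that mix-and-match idea on paper, here is a minimal sketch in Python. It treats a song as nothing more than an ordered list of the section names above, then boils it down to a letter pattern and a count of what repeats. The example song is made up, and assigning letters in order of first appearance is my own convenience here, not a standard analytic notation.

```python
from collections import Counter


def letter_form(sections):
    """Assign one letter per distinct section, in order of first appearance.

    This is a sketch: it assumes fewer than 27 distinct section names.
    """
    letters = {}
    pattern = []
    for name in sections:
        if name not in letters:
            letters[name] = chr(ord("A") + len(letters))
        pattern.append(letters[name])
    return "".join(pattern)


def repeated_sections(sections):
    """Return only the sections that occur more than once, with their counts."""
    counts = Counter(sections)
    return {name: n for name, n in counts.items() if n > 1}


# A hypothetical song, not a transcription of any real recording.
song = ["intro", "verse", "chorus", "verse", "chorus", "bridge", "chorus", "outro"]

print(letter_form(song))        # ABCBCDCE
print(repeated_sections(song))  # {'verse': 2, 'chorus': 3}
```

Skip the bridge or double the chorus and the pattern changes, but something still keeps coming back, which is the point of the form.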

Refrain is something I personally differentiate from chorus, based on how I learned to analyze folk music structure. The chorus can be something like a catchy short verse that repeats several times in the song. It can have a different rhyming structure from the rest of the song to help differentiate it, for example. A refrain, however, is a single repeated line or phrase. For example, the “Beat It” chorus contains: “No one wants to be defeated / Showin’ how funky and strong is your fight / It doesn’t matter who’s wrong or right,” while the repeated refrain is, “Just beat it, beat it” over and over again.

The current pop structure usually puts the breakdown after the end of the bridge. It’s also related to breaks, which are structurally related to a guitar or drum solo between song sections. You can see examples of the term all over the lyric archive site, genius.com. It all just comes from folk traditions, some singer and an instrument trying to switch things up every 30 seconds or so to keep an audience interested enough to throw a coin in a hat. So, just remember that structure is nigh universal, a pop song is a modern folk song, and a folk song’s purpose is to achieve catchy memorability or, in other words, “profit.”

A song’s structure will contribute to determining its length, which can be pretty much whatever a composer wants it to be. A pop or folk song will usually run between 3.5 and 5 minutes, while a computer can perform a single piece for years. This is because humans are constrained by such factors as physical limitations and how much they like to party. It’s also interesting to note that, while songs were growing longer for a while there, as the album format dies and streaming takes over, pop song lengths are now growing shorter due mainly to Spotify’s monetization policy. See? Even trends happen in waves.

Finally, through-composed music is music in which no sections repeat. This essentially doesn’t exist in pop music, though you’ll occasionally find it as a B-side or something. Even in those cases, one could argue that if a piece is through-composed, it doesn’t fit the definition of a pop song, even if it happens to be by a composer generally considered a pop artist.
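Taken literally, "no sections repeat" is an easy thing to check once somebody has labeled the sections. Here is a rough sketch in Python, assuming a piece is written down as a list of section labels. The labels and both example pieces are hypothetical, and deciding where one section ends and the next begins is a human judgment call that this code does nothing to settle.

```python
def is_through_composed(sections):
    """True when every section label in the piece appears exactly once."""
    return len(set(sections)) == len(sections)


# Hypothetical examples, not analyses of real recordings.
pop_song = ["verse", "chorus", "verse", "chorus", "bridge", "chorus"]
art_song = ["section 1", "section 2", "section 3", "section 4"]

print(is_through_composed(pop_song))  # False
print(is_through_composed(art_song))  # True
```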

Form is incredibly important, but exists on a larger scale than the brain generally pays attention to viscerally. Thus, its purpose is to perform an interesting dance with the listener’s attention/distraction and anticipation/reward cycle.

Science and the Musical Brain

Each element of music activates brain regions in different ways, and some are interactive while others target more specifically. You can think of it in stages, starting with the direct sensory input, which begins when vibrating air gets translated to electrical signals via the cochlear system. These signals shoot across the nervous system, next activating the group of processes we call “bottom-up.” This is related to the fact that the brain functions by a constant series of rhythmic neural pulses, which means sending a bunch of auditory-imposed electric pulses directly affects brain function.

Next, the “top-down” processing occurs, which is where the brain rapidly and subconsciously analyzes what it has heard. Finally, this activates neurochemical processes closely tied to anticipation and reward, which are regulated mostly by dopamine, opioids, and norepinephrine. We are learning that such neurochemicals serve different purposes depending on how, where, and in what amounts they are served during cognition (Chanda & Levitin, 2013), but such studies are still in their infancy, with new revelations occurring all the time. Mapping the interaction between the different elements of music, the regions of the brain activated, and the neurochemicals involved will be the major focus of the remainder of this series.

By the way, when I discuss brain activation, this refers mostly to the density of blood flow in a specific brain region, though it can sometimes also refer to brainwave activity or specific neurons firing. The type of scan matters, and different technology has different pros and cons as well as imaging resolution. You can learn more about the specific types here.

There are quite a lot of known issues with modern music science. Music experience is highly subjective, and strongly dependent on the tastes, background, and training of the listener. Among other issues, this makes the question of the control difficult and often suspect even in oft-cited research papers. The Chanda/Levitin publication I cited above has this to say about it.

“The lack of standardized methods for musical stimulus selection is a common drawback in the studies we reviewed and a likely contributor to inconsistencies across studies. Music is multidimensional and researchers have categorized it by its arousal properties (relaxing/calming vs stimulating), emotional quality (happy, sad, peaceful) and structural features (e.g., tempo, tonality, pitch range, timbre, rhythmic structure). The vast majority of studies use music selected by individual experimenters based on subjective criteria, such as ‘relaxing’, ‘stimulating’, or ‘pleasant/unpleasant’, and methodological details are rarely provided regarding how such a determination was made. Furthermore, there is often a failure to establish whether the experimenters’ own subjective judgment of these musical stimuli are in accord with participants’ judgments.”

“The Neurochemistry of Music,” Chanda and Levitin, 2013

As an example, one study might consider “relaxing music” to be New Age ambient music composed on a synthesizer. A different study might feature something atmospheric by Nusrat Fateh Ali Khan. Still another might use Samuel Barber’s Adagio For Strings. But if the listener dislikes New Age or Qawwali music, that will negatively affect results. Also, one “New Age” track or Khan track will sound quite different from the next, adding more uncertainty to results. We don’t even know if there are no vocals, like in that first New Age track, or if there is extensive use of vocalizations, like in the Gabriel/Khan piece. And, finally, listeners who have seen the film Platoon will have a wildly different (probably depressive) reaction to Adagio For Strings than someone who considers it just a pleasant classical work for strings.

Spoiler alert, people die in a war movie.

Unfortunately, more studies than I care to count will simply say, “relaxing music,” giving no indication of what that actually means, or they may describe the music too vaguely. Same goes for “stimulating music,” which could be anything from hair metal to dubstep. This all adds up to mean scanning technology such as PET, EEG, or fMRI will read very different brain activations from one study to the next, despite the studies claiming similar situations. I will discuss the myth of relaxing music in the next post in this series.

Take another example, the ever-popular drum circle. It’s used in many forms of music therapy with proven positive results [1][2][3][4]. However, no current study exists (that I could find) that looks at the differences between music therapy featuring group drumming and, e.g., group theater therapy. Both have similar positive effects on stress levels, negative thought loops, and addictive or destructive behavior. But a classic drum therapy session has a leader, same as theater therapy, who is guiding everyone in their actions, cracking jokes, generally easing everyone’s mood and getting people working together. It’s a positive collaborative social engagement. So, what’s the difference between drumming and theater? Is there any such difference? Or does it simply need to be any affirming group activity?

Or, let’s take a new form of therapy called Guided Imagery and Music (GIM), which reliably reduces stress-related neurochemicals. The main study regarding this shows that the treatment is less effective when the images are shown in silence. However, the study fails to test with only music and no imagery. What if their results were simply the difference between boredom and non-boredom? If the subjects had been in silence, but asked to walk in a slow circle, would that have had similar results? Boredom, too, we will discuss in later posts.

The difficult question often boils down to this: What can music do for the brain that other forms of stimuli cannot?

One thing we can at least address is a more complete standard for describing the auditory stimuli in research environments when specifying the exact recording or including an MP3 (or similar) is, for whatever reason, not an option. Even when this is possible, the paper should include data about the audio stimuli, in text form, that is as specific as possible. While this series will be quite dense in its subject matter, it should all relate, more or less, to better understanding and description of music and its key ingredients.
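To make that a little more concrete, here is one way such a description could be captured as a structured record. This is a sketch in Python rather than any established reporting standard. The seven fields are the criteria this series uses (rhythm, melody, harmony, timbre, form, loudness, context), while the class name, method names, and the example stimulus are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class StimulusDescription:
    """One free-text entry per criterion; an empty string means 'not reported'."""
    rhythm: str
    melody: str
    harmony: str
    timbre: str
    form: str
    loudness: str
    context: str

    def missing(self):
        """List the criteria that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_paragraph(self):
        """Render the filled-in criteria as a short paragraph for a methods section."""
        parts = [
            f"{f.name.capitalize()}: {getattr(self, f.name)}."
            for f in fields(self)
            if getattr(self, f.name).strip()
        ]
        return " ".join(parts)


# A made-up stimulus, not a description of any real recording.
stimulus = StimulusDescription(
    rhythm="steady pulse at roughly 60 BPM, no percussion",
    melody="slow stepwise synthesizer line, no vocals",
    harmony="static major-key drone",
    timbre="soft synthesizer pads",
    form="through-composed, about six minutes",
    loudness="",  # deliberately left blank to show the check below
    context="studio recording, unfamiliar to participants",
)

print(stimulus.missing())  # ['loudness']
print(stimulus.to_paragraph())
```

The blank-field check is the useful part: a paper that says only “relaxing music” would come back with all seven criteria missing.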

The Shoulders of Giant Steps

While the papers cited in this and subsequent parts come from a variety of sources, special mention must go to Michael H. Thaut and Daniel Levitin, two major proponents of rigorous, evidence-based neurological music research. Without their efforts, much of the source material in this humble article series simply would not exist, nor would I have had the examples on which to base the overall tone and attempts at best practices. I also plan to share quite a lot of other people’s music, mostly through YouTube or Spotify. If you hear something you like, please consider purchasing their work or subscribing to the relevant streaming channels.


How humans organize sound will inform our method of classification for use in a lab. We’ll see how historical music can often be categorized by relevant brain region or activity, and how de-emphasizing one aspect of music means other aspects move in to become the primary focus of attention (so to speak). Next, we will discuss in more depth the issue of establishing a scientific control in music research.

Thanks for reading! More cake to come. And the cake is not a lie.
