On Musical Selection In Research: 5. Conclusions

The introduction on timbre really set the tone.

Music research in a scientific context is still in its infancy. Auditory perception beyond mechanical aspects is difficult due to the human element, the metaphorical fixins of our musical cake. While the sensory, bottom-up signal flow of initial oscillation, through the ear, cochlear system, and brainstem is well-mapped, what happens beyond is murky and subjective, relying strongly on the past experiences, preferences, and musical training of the listener (to name a few).

This issue is further confounded by the inconsistent methodology when using musical selections in a research context. Many experiments do not establish a proper control group before extrapolating results, making findings questionable at best. There are at least four levels on which audition occurs, each presenting its own hurdles before achieving rigor. To address all four levels of music experience, research methods should accurately describe all audio samples used not only by providing recordings, but by considering and describing all their aspects as often laid out in an academic musicology setting.

  1. Attention/distraction/meditation – Is there any differentiation between music with little or no discernible pulse or any extreme features (such as loudness), compared to non-auditory stimulus that occupies similar attention in a neural space?
  2. Bottom-up/sensory – The level at which we see the most direct neurological effects, such as the regulation of heart rate, breathing, and motor resonance frequencies via music with a strongly discernible pulse. Deals with rhythmicity and low-pitched frequencies, and possibly harmony as experienced pre-analysis in the brain.
  3. Top-down/cognitive – Short-term memory and responses after the auditory system has provided information to analytical regions of the brain, such as language centers. Related to pitched material especially in vocal ranges, language/lyrical perception, and general musicological analysis.
  4. Long-term/subjective – Learned responses through nostalgia, culture, skill/training, and other past experiences that can emphasize or diminish some or all of these experiential levels.

Thus, the full description of an audio sample will include details of all seven aspects discussed in the previous sections. Tradition, genre, instrumentation, and recording context must all be considered where applicable. For each aspect, here is a non-exhaustive list of example data:

  1. Rhythm: Tempo/tempi, strength and discernibility, tradition/genre
  2. Pitch: Melodic, vocal, range, rate of change, language, tradition/genre
  3. Harmonic: Simple vs. complex (e.g. trance vs. jazz), rate of change, key/mode
  4. Timbre: Distorted, harsh, soft, mellow, brassy, sparse, reedy, deep, full, etc forever.
  5. Structure: Short/clip, lengthy/multi-movement, folk, concerto, pop, through-composed, etc.
  6. Context: Cultural, recording (e.g. live/studio), familiar/unfamiliar, vamping vs. none, general personal taste, etc.
  7. Loudness: Depends on experimental factors, but can be objectively differentiated such as in metal vs. smooth jazz.

An example description for the audio sample “Beat It” by Michael Jackson might read:

Dance rock studio recording with a strong, steady drumkit beat at 138 BPM. Popular music structure using repetitive melodic phrasing, high male American English vocals, and simple rock harmonies on electric guitar and bass. Extensive vamping, virtuosic instrumental bridge solo.

That is a thorough description of a five-minute sample song. Here’s another, of Samuel Barber’s “Adagio For Strings” for orchestra:

Eurotraditional symphonic recording focusing on strings, with a weakly discernible, extremely slow molto adagio pulse. Simple harmonic progression in steady arch form, subtle variations in notated meter, large swells in pitch range and volume. Repetitive, slow-moving melody, no vocals, no percussion, no instrumental solos. Often recognizable from film and television.
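To make descriptions like the two above easier to compare across papers, they could also be stored as structured records. The following is only a minimal sketch: the class name `SampleDescription`, its field names, and the `missing_aspects` helper are hypothetical, invented here for illustration, with one field per aspect from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass
class SampleDescription:
    """Hypothetical record for one audio sample, one field per aspect."""
    title: str
    artist: str
    rhythm: str
    pitch: str
    harmony: str
    timbre: str
    structure: str
    context: str
    loudness: str = "not reported"

    def missing_aspects(self) -> list:
        """Return the names of fields left empty or marked as not reported."""
        return [name for name, value in asdict(self).items()
                if not value.strip() or value == "not reported"]


# Field values are taken from the "Beat It" description above.
beat_it = SampleDescription(
    title="Beat It",
    artist="Michael Jackson",
    rhythm="strong, steady drumkit beat at 138 BPM",
    pitch="repetitive melodic phrasing, high male American English vocals",
    harmony="simple rock harmonies",
    timbre="electric guitar and bass",
    structure="popular music structure, virtuosic instrumental bridge solo",
    context="dance rock studio recording, extensive vamping",
)

print(beat_it.missing_aspects())  # ['loudness']
```

A record like this makes it obvious when an aspect has been skipped. Here loudness is flagged, since it depends on the playback conditions of the experiment rather than on the recording itself.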

By using such rigor when selecting music, we can better standardize music description in research environments. Ideally, a paper will include a link to an online recording of audio materials used, as well as data from individuals tested regarding personal preferences and tastes before and after listening experiments are conducted.

As a final note, I want to thank everyone who assisted me while putting together this series. I learned a lot from these deep dives, and at the very least I hope something here inspires readers to go on their own deeper, divier path into the workings of the human brain. Thank you for reading.

On Musical Selection in Research: 4. With The Top Down

Did you hear the one about how harmony is the key to music? That really struck a chord.

If one had to sum up existence in a single word, oscillation would do nicely. Feynman often used the word jiggling, because it’s funnier.

“The world is a dynamic mess of jiggling things if you look at it right. And if you magnify it, you can hardly see anything anymore, because everything is jiggling and they’re all in patterns and they’re all lots of little balls. It’s lucky that we have such a large scale view of everything, and we can see them as things – without having to worry about all these little atoms all the time.”

Richard Feynman, Fun to Imagine, “Rubber Bands,” 1983

You and me and everything we know are merely collections of oscillating particles and waves in various mediums of varying mystery. Our sensory systems detect these jigglings in myriad ways and sensitivities. Comparing, say, olfaction to proprioception would be rather nonsensical, but I think we can make a good case for comparing hearing to everyone’s favorite perceptual process, vision.

Light information is complex, and quite a lot of the brain dedicates itself to perceiving and decoding this data stream, eventually cobbling together a rough approximation of our visual surroundings. While visual information is remarkably important for typical humans, the complexity of this perceptual process means we’re not quite as time-aware of visual information as we’d like to believe. To illustrate this, we’re going to look at visual recreations (movies) vs. sound recreations to see the difference in resolution that tricks the eye into perceiving movement and what it takes to trick the ear into the same thing.

Frame rate, or frames per second (FPS), is a measure of how many still images must be displayed per second to give the impression that they are a smoothly flowing moving scene. At about 10-12 FPS and slower, humans can register each still as individual images[1]. Higher frame rates start appearing as seamless motion, though at first it appears quite choppy. A standard US film has a frame rate of 24 FPS to avoid the choppy/sped-up effect of older films. The faster a frame rate, the smoother the illusion appears, as depicted here:

A great, simple demonstration of four different frame rates.
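The arithmetic behind frame rates is simple enough to sketch. The function name below is my own, and the 30 and 60 FPS rows are added only for comparison; they are not discussed in this article.

```python
def frame_period_ms(fps: float) -> float:
    """Duration of a single frame in milliseconds."""
    return 1000.0 / fps


for fps in (12, 24, 30, 60):
    print(f"{fps:>2} FPS -> {frame_period_ms(fps):.1f} ms per frame")
# 12 FPS -> 83.3 ms per frame
# 24 FPS -> 41.7 ms per frame
# 30 FPS -> 33.3 ms per frame
# 60 FPS -> 16.7 ms per frame
```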

Perceptual sound information is objectively less complex, and this means we process this data stream more directly through the auditory system. This allows for a more rapid translation of external oscillations into auditory temporal neural signals. People who edit both film and audio actually deal with this in a practical way all the time. Visually, snapping edits to 24 per second is perfectly fine, but that’s only a little over 40 ms, which is enough to sound out of sync to most listeners. To try this yourself, just get a drummer to play a drum machine with a 40 ms latency on the drum sounds. This demonstrates a concept which I have taken to calling:

Cognitive Temporal Resolution

Compare the frame rate video above to the below demonstration of three different audio sampling rates.

Examples of 44.1 kHz, 22.05 kHz, and 11.025 kHz digital audio sampling rates.

Because their oscillations are quicker, and thus closer together, higher frequency instruments like cymbals are a dead giveaway when it comes to sound quality. Sample rate is the number of samples taken per second in a digital audio recording. It looks like this:

Visualization of sampling rates. X-axis is time.

If you’re interested, you can learn more about sampling rate, bitrate, and bit depth here:

You’ll note that we discuss these audio resolutions in the tens of thousands, as opposed to the couple dozen or so required in temporally arranged visual information. Heck, humans can drum roll faster than film frames fly by, and it still sounds like individual drum hits, rather than tricking our brains like a fast visual frame rate does. Or for a bit of fun, take a listen to these accelerating snare hits, which begin at 16th notes on a lazy 60 BPM and steadily accelerate.

808 snare 16th note accelerando from 60 BPM to 999 BPM

First, a rising bass note creeps in, then, toward the end of the clip, the individual snare hits grow indistinguishable. At this point, we perceive them as an audible pitch.
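To put rough numbers on the clip, here is a small sketch that converts tempo into hits per second. It assumes a quarter-note beat with four 16th notes per beat, and it uses roughly 20 Hz (the commonly quoted lower bound of hearing) as the point where a pulse train can start to register as pitch. The real transition is gradual, so treat that threshold as an assumption rather than a hard line.

```python
def hits_per_second(bpm: float, subdivisions_per_beat: int = 4) -> float:
    """Event rate in Hz for evenly spaced subdivisions at a given tempo."""
    return bpm / 60.0 * subdivisions_per_beat


def bpm_for_rate(rate_hz: float, subdivisions_per_beat: int = 4) -> float:
    """Tempo at which the subdivisions reach a given event rate."""
    return rate_hz * 60.0 / subdivisions_per_beat


print(f"{hits_per_second(60):.1f} hits/s at 60 BPM")    # 4.0 hits/s at 60 BPM
print(f"{hits_per_second(999):.1f} hits/s at 999 BPM")  # 66.6 hits/s at 999 BPM
print(f"{bpm_for_rate(20):.0f} BPM reaches 20 hits/s")  # 300 BPM reaches 20 hits/s
```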


Frequency is a physical attribute, a measure of the number of oscillations within a medium over time. Pitch is the human perception and subsequent analysis of frequency. In English and most cultures, we think of this process as the ability to arrange stimuli along a spectrum of high and low tones.

The average range of a young person’s hearing is roughly 20 – 20,000 Hz. As a fun fact, the standard sampling rate is 44,100 Hz because a sampling rate needs to be about double whatever frequency it’s representing to accurately convey one full wave cycle. So, the highest frequency a sampling rate of 44,100 samples per second can convey is 22,050 Hz, which, you’ll notice, is higher than the human range of hearing. You can read more about higher sampling rates and the perception of digital audio with virtual instruments for a bit more nuance on that story.
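The halving rule is easy to check for the three sampling rates in the earlier demonstration. This is a minimal sketch with function names of my own choosing. The `alias_hz` helper assumes an ideal sampler with no anti-aliasing filter, which real converters do include, so it only illustrates where an out-of-range tone would land if nothing removed it first.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2.0


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after ideal, unfiltered sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for rate in (44100, 22050, 11025):
    print(f"{rate} Hz sampling -> up to {nyquist_hz(rate)} Hz")

# A 30,000 Hz tone sampled at 44,100 Hz folds back down to 14,100 Hz.
print(alias_hz(30000, 44100))
```

At 11,025 samples per second nothing above about 5.5 kHz survives, which is consistent with cymbals suffering first.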

As you age, the top frequency of your hearing range lowers. You can test where you’re currently at here if you have even marginally decent speakers. Bass is actually pretty tough to test at home due to practical factors like speaker size and placement, standing waves, room tuning, etc., but testing higher frequencies is straightforward. I’m 36 years old, and my hearing cuts out somewhere between 16,000 and 16,500 Hz.

Below, you’ll find a chart that shows average spectral ranges for standard Eurotraditional musical instruments. Take things like this with a grain of salt, but it’s a nice visualization nonetheless.

Go here for higher resolution, or here for an alternate interactive version

This is a mixing guide, hence the handy breakdown of ranges along the bottom. Notice also that after around 4 or 5 kHz, humans lose the ability to extract pitch information from sound [1]. So, everything we’ll be talking about for the remainder of this section will deal mostly with sounds that occur below that shelf.

Pitch Perception

I’m going to draw a lot of the following info from the book Pitch: Neural Coding and Perception, which includes not only a wealth of useful information, but also the following list of unanswered questions:

  1. How is phase-locked neural activity transformed into a rate-place representation of pitch?
  2. Where does this transformation take place, and what types of neurons perform the analysis?
  3. Are there separate pitch mechanisms for resolved and unresolved harmonics?
  4. How do the pitch mechanisms interact with the grouping mechanisms so that the output of one influences the processing of the other and vice versa?
  5. How and where is the information about pitch used in object and pattern identification?

A basic flowchart of auditory signal flow.

Pitch: Neural Coding and Perception, Plack, Oxenham, Fay, 2005

Notice all the question marks? This is just another example of how little we truly know about how pitch gets perceived.

That said, here are some things we do know. Frequency is related to loudness, which has been mapped using the equal-loudness contour chart. Low frequencies are processed differently than high frequencies, although the exact mechanism for how this works is still mysterious. This loudness contour is very much in line with our inherent auditory preference for vocals. Pitch perception involves top-down processing [1]. It’s probably both top-down and bottom-up to some extent. Top-down essentially means “cognitive,” while bottom-up means “perceptual,” although I’m sure plenty of scientists would argue with that semantic simplification. You might also consider them to mean “analytic” and “triggered,” respectively.


You’ll recall from part one how amorphous and subjective music harmony is as a theoretical concept. This may be why so much research regarding it is rather amorphous and unspecific, and often at odds with each other. See examples of this here, here, and here.

The concept of harmonic structure (especially in Eurotraditional theory) is based on the idea of an interactive tension-resolution cycle between consonance and dissonance. I’m going to go with a pretty book on this phenomenon, Neurobiological Foundations for the Theory of Harmony in Western Tonal Music. Much of the below info is paraphrased or quoted from that text.

Consonance generally means a bunch of tone harmonics line up. This means neurons that are used to firing/vibrating/resonating in sync, do so. The brain initially likes this, because it matches what it often hears in nature.

Dissonance (or roughness) means a bunch of neurons either can’t figure out if they should resonate together or are, somehow, cognitively bumping into each other, so to speak. The brain initially gets annoyed by this.
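As a toy illustration of harmonics "lining up," the sketch below lists which of the first ten harmonics of two tones coincide exactly. The frequencies are arbitrary integers I picked so that the intervals come out as a pure 3:2 fifth and a 16:15 semitone. Real instruments and real ears are far messier, so this is a simplification and not a model of roughness.

```python
def shared_harmonics(f1_hz: int, f2_hz: int, count: int = 10) -> list:
    """Frequencies present in the first `count` harmonics of both tones."""
    harmonics_1 = {f1_hz * k for k in range(1, count + 1)}
    harmonics_2 = {f2_hz * k for k in range(1, count + 1)}
    return sorted(harmonics_1 & harmonics_2)


print(shared_harmonics(240, 360))  # perfect fifth (3:2): [720, 1440, 2160]
print(shared_harmonics(240, 256))  # semitone (16:15): []
```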

However, if you play too many sequential harmonies considered consonant by the listener, it often starts to sound boring. As we’ve already covered, brain hate boring. To avoid this static state, we arrange harmonies of varying dissonance/roughness/tension for a while before resolving that tension with a consonant harmony. The most relieving voicing of that chord will usually have the key’s fundamental pitch (the tonic note) in the bass part. This is (probably) because we are cognitively expecting the fundamental due to learned real-world neural clusters. Thus, hearing this resolution engages the neurochemical expectation/reward system. Chord progression (along with volume changes and the entrance of a human voice) has been strongly tied to the chills response or frisson often studied in musical neuroscience, because it’s easy for test subjects to check a yes/no box about whether they have experienced it.

A demonstration of hearing a sounded vs. missing fundamental.
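For a tone made of exact integer harmonics, the fundamental the ear tends to fill in can be estimated as the greatest common divisor of the partials. This is a deliberately crude sketch with a function name of my own. It assumes integer frequencies in Hz and ignores everything that makes actual pitch perception complicated.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(partials_hz: list) -> int:
    """Greatest common divisor of a set of integer partial frequencies."""
    return reduce(gcd, partials_hz)


# Partials at 400, 600, and 800 Hz contain no energy at 200 Hz,
# yet they are the 2nd, 3rd, and 4th harmonics of a 200 Hz tone.
print(implied_fundamental_hz([400, 600, 800]))  # 200
```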

Harmony can be thought of as having both a vertical and horizontal dimension. This is directly analogous to music arranged along a staff. If two or more pitches are played at a time, this is the vertical dimension. Doing this in succession over time is the horizontal dimension. The horizontal dimension involves psychological priming, meaning perception of one stimulus affects how those following it are perceived, and strongly favors top-down processing. The time window over which sound information is integrated in the vertical dimension spans about a tenth of a second to a few seconds, i.e. from sixteenth notes to tied whole notes at 120 beats per minute. Thus, many minimalist pieces fail to register as true harmonic progressions due to their chord changes falling outside of this perceptual window.
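The note values mentioned above can be converted to seconds with one line of arithmetic. The sketch assumes the quarter note gets the beat, which the text does not state, and the function name is my own.

```python
def note_duration_s(bpm: float, beats: float) -> float:
    """Length in seconds of a note lasting `beats` quarter-note beats."""
    return 60.0 / bpm * beats


sixteenth = note_duration_s(120, 0.25)
whole = note_duration_s(120, 4)

print(sixteenth)  # 0.125
print(whole)      # 2.0
print(2 * whole)  # 4.0, i.e. two whole notes tied together
```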

If you’re interested, I really like the City University of Hong Kong’s online auditory neuroscience resource, which has an excellent list of articles regarding harmony and pitch as well as resources for other musical elements. This discussion handily leads us into one of the most interesting phenomena that arise from music perception.

The horizontal dimension deals with both successive chords/tone clusters (harmonic progressions) and individual tone lines (melodies) almost always in the topmost voicing, which leads us nicely into the next pitched musical element.


Melody is the combination of rhythm and pitch, a linear succession of tones perceived as a single entity. A melody as a single unit emerges following top-down processing. You can have a melodic bass line, but due to the aforementioned difference in processing low and high frequencies, a lot of this information is processed rhythmically instead of melodically. Extracting tonal info from purely sub-bass sounds is difficult for the average listener, as opposed to bass tones with lots of higher-frequency information such as would be achieved with bass effects pedals, for example.

The function of melody is closely tied to memory and expectation. You can imagine melodic expectation as a sort of cognitive temporal probability cloud constantly running during active listening. Melodic expectation is probably learned rather than innate and depends on cultural background and musical training. The process for expectancy of harmonic and melodic info is additive, i.e., we compute the rhythmic pitch line and harmonic info together to analyze what psychologists call – and I’m not making this up – a music scene.

Unfortunately, psychologists never specify which music scene.

The multitude of melodic perception/expectation models mostly describe different aspects of the same system. One, called melodic segmentation, is used in computational analyses and uses formulae to automatically separate melodies into small segments, just like the short repeating units in postminimalist music. The well-known researcher Jamshed Bharucha helped pioneer the fascinating concept of melodic anchoring, which describes tone stability and instability based on where the pitch falls in a harmonic context. Repetition helps us commit complete pleasing melodic lines to memory, which is the basis for the concept of the hook used in folk and pop music. This also likely cements how we expect later melodies to progress, creating a sort of taste-feedback-loop.

Rhythmic pulses and harmony form the basis of music like Reich’s Electric Counterpoint mentioned in part three. Later in the same piece, melodic segmentation provides discrete repeating melodic note groups, approaching but not quite arriving at a traditional melody. Now, with the addition of true melodies, we are able to build the majority of music. All that’s left is determining the context and tradition of what we want our music to sound like. This involves choosing the remaining elements of the musical cake: timbre (instrumentation, production, loudness), overall structure/form, and in many cases lyrical content.


The word genre is loaded and the source of much contention [1][2][3][4]. It is a purely social and semantic method of organizing sound. Like many categorization methods, it is subjective and rife with overlap, gray areas, and fusion practices which then give rise to new definitions that further muddy the waters. But it is directly related to the context or tradition of how music is received and so shall be addressed.

Genre terminology is best used as shorthand for music discussion. “Alternative rock with funk elements” invokes alternative, rock, and funk to quickly communicate what to expect when listening to the Red Hot Chili Peppers. One could then sonically associate similar bands like Primus, Living Colour, Lenny Kravitz, etc., highlighting the communicative advantages of the system. Genre preference is often a form of sociocultural identity, making it an important aspect of human existence.

Genre preference involves an individual’s relationship to the familiar and the unfamiliar. Musical training/expertise strongly affects a listener’s reaction to unfamiliar music. The more expertise an individual has, the more likely they are to enjoy unfamiliar music, probably due to better-rehearsed top-down processing. Music novices are better able to perceive elements of familiar music, which adds to music enjoyment.

Many publications, critics, theorists, and analysts now decry the obsession with genre in popular music award shows. The Music Genome Project is an initiative by the founders of Pandora Radio that automates playlist generation by way of seed association. This means the listener chooses a song, album, artist, or genre which is then examined for attribute keywords that the project considers genre-definers. This is achieved with the help of a team of analysts assigning attributes to individual songs. While the exact list is a trade secret, you can view an approximation of the 450 [1] attributes listed here alphabetically, and here by type. For the purposes of this article, one thing stands out:

The vast majority of attributes in the Music Genome Project describe melodic elements.

Regardless of how much we talk about the importance of form and rhythm in music, what really defines how a person receives and associates with music is its pitched content. While rhythmic cadences are vital in genre definition, attributes relating to vocals, lyrics, pace/complexity of harmonic progression, instrumentation, timbre of instrumentation (including things like guitar distortion levels), and so on dominate the list.

The direct mechanical effects of rhythmic auditory stimuli (such as on the motor cortex or cardiovascular system) do not seem to exist with regards to most pitched and harmonic information, with the exception of low frequencies that register in part as perceptually rhythmic. To sum up, melodic/harmonic pitch perception:

  1. Takes longer to develop in humans
  2. Relies more on learned environmental patterns
  3. Displays greater diversity across cultures
  4. Promotes greater subjectivity in the listener

How and why these aspects all tie in together is grounds for exciting research. It also suggests an explanation for why the pentatonic scale in particular is so universal. A fundamental with harmonic overtones matching this scale will be found in naturally occurring tones more often than other spectral architectures, which means that these expectant neuron clusters will form in humans regardless of background. Whether there is a deeper or more innate framework for this other than world experience growing up is not yet known.


Musical structure is memory. A list of musical movements/genres is also largely a list of popular structures throughout history. Song structure lends itself quite easily to analysis, which is why I’m not going to rehash it too closely here. The most important thing to know is that composers use regular returns to musical events to ground listeners in familiar territory before deviating again into new content.

Most of the following is paraphrased or directly quoted from Bob Snyder’s excellent book, Music & Memory: An Introduction.

Memory is the ability of neurons to alter the strength and number of their connections to each other in ways that extend over time. Memory consists of three processes: echoic memory and early processing; short-term memory; and long-term memory. This modern hierarchical concept of memory aligns with Stockhausen’s three-level model of timefields. Each of these three memory processes functions on a different time scale, which Snyder then relates to listening as “three levels of musical experience.”

  1. Event fusion level (echoic memory/early processing)
  2. Melodic and rhythmic level (short-term memory)
  3. Formal structure level (long-term memory)

The initial echoic memory sensations decay in less than a second. This experience is not analyzed at this level, but rather exists as a raw, continuous stream of sensory data. Our friend Dr. Bharucha helped define the specialized groups of neurons that extract some acoustic data from this continuous stream, a process called feature extraction. Such features include pitch, overtone structure, and presence of frequency slides. These features are then bound together as coherent auditory events. This information is not the continuous barrage like in echoic memory, meaning the amount of data is greatly reduced. Together, feature extraction and perceptual binding constitute perceptual categorization.

Snyder’s memory model of auditory perception.

After extracted features are bound into events, the information is organized into groupings based on feature similarity and temporal proximity. These can activate long-term memories called conceptual categories. Such memories consist of content not usually in conscious awareness, which must be retrieved from the unconscious. This can take place either in a spontaneous way (recognizing and reminding) or as the result of a conscious effort (active recollecting). However, even when recalled, these memories often remain unconscious and instead form a context for a listener’s current awareness state. This is called semiactivated, meaning they’re neurologically active and can affect consciousness (emotional state, expectation, decision-making, etc.) but are not actually the present focus of cognitive awareness.

If information from a long-term memory becomes fully activated, it becomes the focus of conscious awareness, allowing it to persist as a current short-term memory. If not displaced by new information, these can be held for an average of 3-5 seconds in typical humans. After this time window, it must be repeated or rehearsed internally, i.e. consciously kept/brought back into focus, or it will decay back to long-term memory. The more striking the information in question, the more likely it is to more permanently affect this system by creating new long-term memory information.

There is a constant functional interchange between long- and short-term memory. This is the basis of formal structure in music.

Pitch information is extracted in auditory events that take place in less than 50 milliseconds. This races by as part of the data stream which is not processed consciously.

Events farther apart than 63 milliseconds (16 events per second) constitute the aforementioned melodic and rhythmic level of musical experience. Since these occur within the 3-5 second window of short-term memory, we consider separate events on this timescale as a grouped unit that occur in the present. This time window is essentially a snapshot of consciously perceived time. We parse this musical perception level in two dimensions: melodic grouping according to range similarity, rising/falling motion, and reversals of that motion; and rhythmic grouping according to timing and intensity. Perception information events received within this window are considered by the brain to be available all at once.

Event intervals lasting longer than 5 seconds (roughly, depending on the individual and expertise/training) fall into the category of formal structure. Here, our expectations are manipulated to allow auditory events to fall into unconscious long-term memory. This manipulation activates our limbic reward system, and that feeling is stronger in music we find familiar.
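The three time scales above can be summarized as a rough lookup. This is a sketch only: the function and constant names are mine, the cutoffs are the approximate figures quoted in this section (50 ms, 63 ms, and the upper end of the 3-5 second window), and in practice they shift with the individual and their training. Intervals between 50 and 63 ms fall between the two quoted thresholds, so the sketch labels them as borderline.

```python
EVENT_FUSION_MAX_S = 0.050      # below this, events fuse (pitch, timbre)
MELODIC_RHYTHMIC_MIN_S = 0.063  # 16 events per second
SHORT_TERM_WINDOW_S = 5.0       # upper end of the 3-5 second window


def experience_level(interval_s: float) -> str:
    """Approximate level of musical experience for an inter-event interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if interval_s < EVENT_FUSION_MAX_S:
        return "event fusion"
    if interval_s < MELODIC_RHYTHMIC_MIN_S:
        return "borderline"
    if interval_s <= SHORT_TERM_WINDOW_S:
        return "melodic and rhythmic"
    return "formal structure"


print(experience_level(0.020))  # event fusion
print(experience_level(0.125))  # melodic and rhythmic
print(experience_level(30.0))   # formal structure
```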

This is how musical structures rely upon the three levels of memory and traditional genre expectations to manipulate the dopaminergic cycle of expectation and reward. Genres/styles/traditions achieve this goal by techniques such as symmetric bar structures, removal and return of repetitive catchy (often sung) melodies/hooks/themes/ostinati, and drastic changes in the volume and presence of vocals and instruments.


Obviously, pitched information is vast and malleable and cannot be truly summarized in a series like this. Many, many books have been written on it, though in terms of cognitive structures much of it remains a mystery. I hope this has at least piqued your interest to some extent. Next will come the final section of this series.

Thanks for reading!

On Musical Selection in Research: 3. Bottoms Up!

Did you hear that existentialist percussion piece? It was highly cymballic.

We have previously discussed arrhythmic music in reference to ambient and minimalist compositions. Other types of music without a pulse include noise and drone music. While these vary drastically in terms of context and performance, and especially volume, they tend to be outlier music and thus rarely fall into standard analysis. However, with the background and neuroscientific analytical vocabulary we’re building, such tracks should lend themselves well to description on a case-by-case basis.

So, we come to a fruitful realm of study, namely how the brain responds to rhythm. Life itself happens rhythmically, and some sort of rhythmic action or resonance can be found within the basic functions of all life on earth. The study of advanced beat perception, prediction, and locomotive entrainment is a lively area of research. In other words, we’re studying dance.

In scientific parlance, synchronizing motor processes with auditory cues is called entrainment. When you tap your hand along with a metronome, you have entrained to a repetitive auditory stimulus. The word comes from literally stepping onto a train from a stationary platform. To help remember its meaning, I imagined the train cars as a beat clacking by, then stepping onto the train as a way to anticipate, then synchronize with that steady movement.

When musical entrainment goes too far.

Most animals are terrible dancers. Just terrible. But a short list can do it, and in this presentation by Drs. Aniruddh Patel and Ola Ozernov-Palchik, they postulate that vocal mimicry is the key to whether an organism is capable of the kind of temporal perception/anticipation necessary for accurate rhythmic entrainment. Their research shows a strong connection between aptitude at reading, speaking, and rhythmic discrimination in human children. After all, language itself is temporal, and processing the time-information of speech is inherent to understanding what has been said.

Researchers in the last few years were a little surprised to discover that non-human primates are pretty bad at rhythmic entrainment to music. You’d think it’s related to intelligence, but even primates capable of simple drumming take a long time to learn the skill, and tend to always be a little late. Rather than anything to do with intelligence, it’s due to a lack of neural motor region coupling to auditory regions, meaning these animals can only judge a beat by determining the interval between pulses, instead of predicting when the next beat is going to land like we do.
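One simple way to put a number on "a little late" is the mean signed gap between each tap and the beat it was aiming for. The sketch below uses made-up tap times purely for illustration; none of it is data from the studies mentioned here, and the function names are my own. A positive result means tapping after the beat (reacting), while a value near zero or below means the tapper is predicting it.

```python
def beat_times_s(bpm: float, count: int) -> list:
    """Onset times in seconds of `count` metronome beats."""
    return [i * 60.0 / bpm for i in range(count)]


def mean_asynchrony_ms(taps_s: list, beats_s: list) -> float:
    """Average of (tap - beat) in milliseconds; positive means late."""
    gaps = [tap - beat for tap, beat in zip(taps_s, beats_s)]
    return 1000.0 * sum(gaps) / len(gaps)


beats = beat_times_s(120, 4)              # [0.0, 0.5, 1.0, 1.5]

# Hypothetical tappers, invented for this example.
reactive = [b + 0.150 for b in beats]     # always 150 ms behind
predictive = [b - 0.030 for b in beats]   # slightly ahead of the beat

print(round(mean_asynchrony_ms(reactive, beats)))    # 150
print(round(mean_asynchrony_ms(predictive, beats)))  # -30
```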

Since monkeys are born with all their vocalizations intact, they don’t have the vocal mimicking ability we think is required for rhythmic entrainment. So, can you guess which animals are best at it?

When they said “Everybody,” they really meant it.

The most prominent non-human vocal learners in nature are songbirds, which have quite adorably been proven to be able to spontaneously dance to unfamiliar music at varying tempos. An example is Snowball the Dancing Cockatoo, whose above video deservedly went viral a few years ago. Thanks to this video, Snowball caught the attention of the aforementioned Dr. Patel, thus beginning a lively ongoing discussion as to which animals can and can’t perform this feat, and why or why not.

The list of dancing animals is currently humans, songbirds, elephants, and most recently sea lions. Bonobos are an interesting soft exception, which you can read more about in this article, Beasts That Keep the Beat. There’s also a little more here regarding the relationship between motor anticipation, motor learning, temporal perception, and social engagement, including how these factors relate to beat processing in various species.

While musicality has long been associated with neurologic linguistic processes [1][2][3][4], it is by no means the full story. Music activates very deep brain structures, many of which are not related to language in any way we know. But it turns out auditory mimicry might be an inherent part of rhythmic entrainment and, thus, musical development.

The Auditory Brainstem Response

Hearing a steady rhythm sends matching electric pulses through the brain in an amazing phenomenon called the auditory brainstem response. This describes how auditory stimulus directly corresponds to signals in brainstem activity. The relationship is sensory, i.e. bottom-up processing, and it is also, in my opinion, totally awesome. Look at these paired waveforms below.

Examples of the auditory brainstem response. More examples.

These show audio stimuli vs. the corresponding EEG readings of a human brainstem. For instance, in the pair labeled “Piano Melody,” notice that the electric impulses in the brainstem line up perfectly with the steady pulse of the piano notes. Such synchronized impulses mark the beginning of a vast connection to many centers of the brain, as laid out in this paper by our trusty friend, Dr. Thaut. Our brains function by sending out pulses of electricity via neuronal networks, so this bottom-up effect from rhythmic stimuli directly alters the landscape of said pulses.

After synchronized pulses to auditory stimuli occur in the brainstem, electric signals travel on to regions such as the spinal cord, the subcortex and cortex, strongly interacting with the motor system. The brainstem regulates most of the basic cyclical body functions, such as cardiac and respiratory processes. It’s also pivotal in maintaining consciousness and regulating the sleep cycle [1].

The Chanda/Levitin paper lists several ways we know the tempo of music affects our physiology. As I’ve pointed out is generally the case, they could find no direct neurochemical relationship to “relaxing music,” but they found a strong correlation between pulse tempo and rhythmic bodily processes.

“These effects are largely mediated by tempo: slow music and musical pauses are associated with a decrease in heart rate, respiration, and blood pressure, and faster music with increases in these parameters. This follows given that brainstem neurons tend to fire synchronously with tempo.”

The Neurochemistry of Music, Chanda, Levitin, 2013

Heart rate and breathing are directly related to our emotional state at a given time, not only with regards to rapidity but also regularity. Unfortunately, many texts on this topic lack rigor, likely because their sources sell paid services related to it. However, some highly regarded researchers – including another researcher hero of mine, Bessel van der Kolk – have observed that PTSD is strongly correlated with heart rate and breathing variability, resilience, and coherence.

For more on the symptoms and treatment of PTSD, I strongly recommend this wonderful book, The Body Keeps the Score by Bessel van der Kolk.

The Chanda/Levitin paper demonstrates that rhythmic musical stimulus directly regulates vital body functions as a bottom-up electric brainstem response. In other words, cultures have been utilizing percussive beats and group drumming to regulate physiological, neurological, and neurochemical stress and trauma responses for many, many years. When the body starts to go into panic mode, rhythmic stimuli can help prevent a learned traumatic response by, basically, taking up the neural resources needed to cause panic, and by brute-forcing neuroelectric pulses that keep heart rate and respiration from speeding up or getting out of sync.

Bilateral Stimulation

Drum circles have a long and well-studied history of helping foster positive therapeutic environments, shown to help with at-risk behaviors, depression, anxiety, and addiction to name a few. Group drumming and/or dancing has existed in virtually every culture on Earth since long before the advent of modern psychology, and is often considered explicitly therapeutic or cleansing in nature. While we know communal dancing and hand drumming have therapeutic effects for individuals and communities, we don’t know how or even if that mechanism differs from, say, group discussion, feasting, or other communal events. Rhythmic communal cooperation, specifically, is difficult to differentiate from other forms of group therapy.

So, what’s one way rhythmic stimuli differ from others as a form of therapy? Interestingly, a type of therapy called EMDR uses rhythmically alternating bilateral stimulation via eye movements, knee/thigh tapping, or motorized pulsers to impose a constant mild distraction to each hemisphere of the brain. This is done while the subject goes over troubling memories that usually cause a post-traumatic spiral. There is a growing body of sound evidence showing the efficacy of this method, although how and why it works is still not fully understood [1][2][3]. But as it is a rhythm-based therapy, I thought looking a little closer at the exact methodology might be interesting in our neurological beat processing examination.

EMDR stands for “Eye Movement Desensitization and Reprocessing.” The most common method used for it in 2019 is with small motorized pulsers held in each hand. This has been shown to reduce the amount of stress a person feels while recounting traumatic memories. Tapping refers to repetitively tapping on the body, usually the knees or thighs, to achieve a similar goal. I spent some time watching various videos of EMDR and tapping sessions, such as this EMDR one from Dr. Jamie Marich and this resource tapping one from Dr. Laurel Parnell.

I paid a lot of attention to the tempi used in these demonstrations, clocking the rates as within a narrow range usually hovering around 85-95 BPM. Dr. Marich completes one cycle (back-and-forth) at about 88-90 BPM, while Dr. Parnell is slightly faster, at about 90-94 BPM. The slowest tempo I could find was about 70 BPM, and none went faster than 100 BPM. So the tempo is always slow, which would make perfect sense when trying to keep someone calm. I can only imagine it would be stressful if someone started hammering on your knees at 160 BPM and asked you to recall upsetting memories.
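To get a feel for what those tempi mean in practice, here is a minimal sketch that converts beats per minute into the time between pulses. It assumes each counted beat is one pulse; the function name and the labels are mine, purely for illustration, and the tempi are just the informal figures from my clocking above.

```python
def seconds_per_beat(bpm: float) -> float:
    """Return the time between successive pulses, in seconds, at a given tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60.0 / bpm


# Tempi (in BPM) taken from the informal observations above, plus the
# imagined 160 BPM knee-hammering scenario for contrast.
observed = [
    ("slowest observed", 70),
    ("typical low", 85),
    ("typical high", 95),
    ("fastest observed", 100),
    ("hypothetical frantic", 160),
]

for label, bpm in observed:
    print(f"{label:>20}: {bpm:3d} BPM = {seconds_per_beat(bpm):.3f} s per pulse")
```

The whole observed range works out to well over half a second between pulses, while the imagined 160 BPM case drops below four tenths of a second.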

The crux of this technique and how it functions is called bilateral stimulation. This means continually activating both hemispheres of the brain using a left-right alternating pattern. Eye movements are a very easy way to activate different regions of the brain, which is partly/mostly/largely/probably why they happen as we dream in REM sleep [1][2].

Bilateral tactile sensations have a similar effect to moving the eyes back and forth. If you tap on your left leg with your left hand, and your right leg with your right, alternating at a steady pace, it will also alternate hemisphere activation. There is some evidence that this can effectively reduce the stress response as well. All of which is summarized in this quote from Dr. Robert Stickgold:

“We propose that the repetitive redirecting of attention in EMDR induces a neurobiological state, similar to that of REM sleep, which is optimally configured to support the cortical integration of traumatic memories into general semantic networks.”

EMDR: A putative neurobiological mechanism of action, Stickgold, 2001

Which is super cool. The repetitive distraction induces a dreamlike state, is what he’s saying. Contemporary psychotherapists are developing a verified method of neural engagement that denies the brain the resources it needs to drum up (sorry) fearful or anxious thoughts. To reference the loudness paper again, it takes up the neural space that might otherwise promote negative arousal.

Many studies discuss the ability of repetitive drumming – whether listening, dancing, or playing – to induce a trance-like state [1][2][3] similar to hypnosis, all of which falls under temporary, non-pathological dissociation as a form of self-regulation. Which is a hell of a sentence.

“Music emerges as a particularly versatile facilitator of dissociative experience because of its semantic ambiguity, portability, and the variety of ways in which it may mediate perception, so facilitating an altered relationship to self and environment.”

An empirical study of normative dissociation in musical and non-musical everyday life experiences, Herbert, 2011

Now, imagine hand drumming in a circle of friends, colleagues, similarly trained musicians, communally associated peers, or whatever. Hand drumming, by definition, is a self-controlled rhythmic bilateral stimulation in which the player’s hands tactilely engage in a steady alternating pattern. This regulates the player’s nervous system by entrainment, which is also synchronous with the surrounding players and audience. One can visually confirm this by observing head bobbing, side-to-side swaying, foot-tapping, etc. in surrounding participants, all of which are also bilateral movements. Collaborative human social behavior is deeply rooted in our brain structures [1][2][3], though studies regarding this social collaboration relating to activities like a drum circle are unfortunately lacking. One can at least guess that this process involves temporal anticipation/prediction/discrimination, as well as motor planning and execution, all confirmed visually, aurally, and socially, which cannot be achieved by similar but arrhythmic activities.

Bilateral entrainment, Feynman-style.

In short, psychotherapists are already using hidden versions of rhythms as a way to deal with overwhelming stress and trauma – a contemporary refinement of what we were already doing for millennia with social drumming. I personally wonder if nervous rhythmic actions that produce sound, like idle/anxious knee or desk tapping for example, fit into this picture somewhere.

I would be remiss here if I didn’t mention binaural beats, a fun experiment showing how the brain recreates acoustic beating in the brainstem. Separate sound waves, one in each ear of a pair of headphones, combine as electric brainstem pulses to recreate the type of complex wave one would hear in the real world, like this:

Two interacting waveforms resulting in a beat pattern.

The two waves are mathematically pitched so that combining them creates a beating sensation not heard by either ear individually. Many find these extremely soothing as a form of sound therapy, and you can read more about them here and listen to a neat demonstration here.
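For the curious, the arithmetic behind that beating can be sketched in a few lines. This is only an illustration under my own assumptions: the 200 Hz and 210 Hz tones, the sample rate, and the function names are hypothetical choices, not values from any study mentioned here. The sketch builds one sine tone per ear and shows that the beat rate is simply the difference between the two frequencies.

```python
import math


def binaural_pair(base_hz, beat_hz, seconds=1.0, sample_rate=8000):
    """Return (left, right) sample lists for two slightly detuned sine tones.

    The left ear gets base_hz and the right ear gets base_hz + beat_hz.
    """
    n = int(seconds * sample_rate)
    left = [math.sin(2 * math.pi * base_hz * i / sample_rate) for i in range(n)]
    right = [
        math.sin(2 * math.pi * (base_hz + beat_hz) * i / sample_rate)
        for i in range(n)
    ]
    return left, right


def beat_frequency(f_left, f_right):
    """The perceived beat rate is the absolute difference of the two tones."""
    return abs(f_left - f_right)


# Illustrative (assumed) values: 200 Hz in one ear, 210 Hz in the other.
left, right = binaural_pair(base_hz=200.0, beat_hz=10.0, seconds=0.5)

# Adding the two channels gives the beating wave one would hear acoustically:
# a tone whose loudness swells and fades beat_frequency(...) times per second.
combined = [l + r for l, r in zip(left, right)]
print("beat frequency (Hz):", beat_frequency(200.0, 210.0))
```

With headphones, neither ear receives the combined wave; each ear only gets its own steady tone, which is what makes the perceived beating interesting.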

All this suffices to say that rhythmicity in music arises from a deeply innate neural architecture, the so-called glue that binds together the vast majority of human-organized sound. For the purpose of this series, I spend so much time on it to show that there is a true difference between sound, ambience, or noise vs. auditory stimulus with a pulse.

Types of Rhythm

I’ll now switch track to discuss an infinitesimally small pool of examples showing the various ways rhythm arises within music. Rhythm in a sonic work can be explicitly percussive – i.e. in drum ensembles – such as in an Ewe, samba, or taiko ensemble. You’ll also find it in solo pieces like this one for Korean jang-go, as well as this contemporary percussion composition in the raga style, Piru Bole. However, rhythm exists in much more than percussion music. Rather, it’s a necessary element in anything with a meter or pulse, whether or not it’s actually percussive in timbre. This makes the effect of rhythm on the brain extremely diverse. Let’s try and characterize some examples using the cake method.

In the above example, we hear a rhythm played via rock drum kit solo. It possesses no pitched or harmonic content. The tempo is very fast, accelerating to around 150 BPM, and contains rapid subdivision and snare rolls. The timbre is obviously percussive, utilizing the full range of the kit, creating a wide spectral texture from the bass drum to the cymbal crashes.

Something rather surprising is that no existing scientific study I can find relates brain activity to drum solo listening. The reason is probably that a drum kit is actually a collection of instruments played simultaneously, making it too difficult to extrapolate meaningful data. Instead, researchers tend to look at rhythmic sounds one at a time. This presents the problem of deciding which texture to use when studying rhythm. There’s probably a big difference between how the brain receives the same beat pattern depending on whether it’s middle C on a piano, a sine tone, a metronome click, a bass drum, or a crash cymbal. Indeed, simply the volume of a kick drum has an effect on bodily entrainment. This is why Garth’s solo up there, as it currently stands, is completely outside of the realm of existing research.

Rhythmic content is fast and complex, accelerando to ~150 BPM. No pitched, melodic, tonal, or harmonic content. Standard drumkit timbres with full spectral range. Quite loud. Structure is through-composed freestyle solo, short in overall length. Live improvisatory rock context.
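If a research team wanted to record a description like the one above alongside their data, one possible shape for it is a small structured record. This is a hedged sketch of my own, not an established standard: the class name, field names, and helper function are hypothetical, chosen to mirror the musical elements used in this series plus a field for the specific recording.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SelectionDescriptor:
    """Hypothetical record describing one musical selection used as a stimulus."""

    recording: str  # which specific recording was played
    rhythm: str
    pitch: str
    harmony: str
    timbre: str
    loudness: str
    structure: str
    context: str


# The drum solo description above, restated field by field.
drum_solo = SelectionDescriptor(
    recording="(not specified in this sketch)",
    rhythm="fast and complex, accelerando to ~150 BPM",
    pitch="none (no pitched or melodic content)",
    harmony="none (no tonal or harmonic content)",
    timbre="standard drum kit timbres, full spectral range",
    loudness="quite loud",
    structure="through-composed freestyle solo, short overall",
    context="live improvisatory rock",
)


def undescribed_fields(descriptor):
    """List the names of any fields that were left blank."""
    return [name for name, value in asdict(descriptor).items() if not value.strip()]


print(undescribed_fields(drum_solo))
```

Writing "none" explicitly, rather than leaving a field blank, makes it clear that an element was considered and found absent rather than simply forgotten.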

The first ~1:40 of this track is an arpeggiated melodic line generated with a Roland TB-303. Despite having no sounds that are traditionally percussive, the rapid pace, fast-attack wave shapes, and jumps between low and high pitches create a highly rhythmic, dancey feel without the need for synthesized drums. Which, of course, makes the composition all the more effective when the drums do come in, then go out, then come in again several times.

Rhythmic content is rapid 4-to-the-floor dance beat. Melody is repetitive, fast, and arpeggiated, containing large leaps, making it non- or nearly non-singable. Harmonic content is enharmonic slow, soft pads. All timbres are computer generated/synthesized in the style of acid house techno. Dynamics alternate between quieter sections and loud beat sections, which play out over an unusually long runtime of 16 minutes. Club/raver/electronic dance tradition.

The second movement of Ravel’s sole string quartet composition is a really great example of why music selection is so important when conducting research. The beginning section is highly rhythmic and exciting, featuring constant pizzicato techniques and arpeggiation, just like the previous acid techno track. It also counts as a classical composition and employs so-called “relaxing” timbres, namely, strings. When a research paper says it used classical music as a relaxing control, what if this piece snuck its way in? Compare this to Adagio for Strings by Barber. If we don’t know the tempo, timbre, or general feel of each individual piece, both might be characterized as similar in a research environment, although I can guarantee both would activate quite differently in the brain. Indeed, this is a perfect example of why each aspect of a musical selection must be indicated in research, because, generally, this track shares major similarities in some ways with the acid techno track, and other major similarities with Adagio for Strings, but all three pieces would be received quite differently by a listener.

Focusing only on the opening section of the movement: Highly rhythmic, achieved via constant pizzicato eighth, sixteenth, and 32nd notes. Highly melodic, recurring singable theme. Enharmonic key structure with Euroclassical modulations achieved mostly with arpeggiations between players. Timbres are full-spectrum pizzicato and arco strings as in a traditional quartet. Structure, dynamics, and context are variable as is consistent with the Impressionist Eurotraditional artistic epoch.

Percussive melodic instruments such as the marimba demonstrated above, as well as the xylophone, gamelan, or vibraphone (to name only a few) are very pure combinations of rhythm and pitch. They function by producing a tone with a relatively pure timbre (i.e. closer to a sine wave) and seem to be used about as often as metronome clicks when studying beat processing. I can’t find a study that looks at the difference between pitched and unpitched perception along this axis. Which is weird.

Rhythmic content is fast and steady, highly melodic, enharmonic Eurotraditional key structure, wooden malletophonic timbre and spectral range, relatively consistent medium volume, Baroque tradition.

And so, with Steve Reich’s Clapping Music, we finally venture into the realm of postminimalism and phasing music. We’ll get slightly more into what that means in a second. But I want to focus on this particular piece because it’s another really good example of the difficulty of music selection without being very specific. First of all, this is a highly rhythmic piece with no vocals or pitched content. However, it also very clearly has a human element. The brain may react completely differently when it can tell a percussive sound comes from a human rather than a manufactured instrument, in the same way the brain responds completely differently to vocal sounds compared to otherwise similar sound content.

Instead of breaking it down, let’s instead take a look at how different this piece sounds based on the recording, like this fast-paced ensemble version of Clapping Music. It sounds completely different from this version. Or look at Evelyn Glennie performing it on woodblocks, or these super cool jugglers playing it slowly with bouncy balls, or, my favorite, actress Angie Dickinson performing the piece in the 1967 film Point Blank. They’re all the same piece, but some feature extremely different tempi and timbres, likely provoking strong variations in the neurological listening experience. So, even if a researcher actually mentions the name of the composition (which they don’t always do, at all), unless we know which specific recording of that piece is used, we still can’t rely on any data presented.


We began with Cage’s silent music, then added arrhythmic, meditative harmonies (with some naturalistic textures) to build a minimalist composition. Now, by adding a rhythmic element to minimalist practices, we create the genre (or subcategory) known as postminimalism, also sometimes called phasing music.

In most minimalist music, just like the Pisaro piece or any of the ambient works, slow harmonies progress over a long period of time. In postminimalism, the same process occurs but with an added pulse, and usually short snippets of interlacing melodies that phase in and out. The intro section of Steve Reich’s Electric Counterpoint is a really perfect example of how this compositional technique arises directly from a true minimalist approach.

Performed beautifully here by Mats Bergström.

The introductory choppy chords eventually move into melodic snippets that fade (or phase) in and out, giving a sensation of movement while retaining the slow-moving harmonic structure valued by minimalism. However, by avoiding any longform, easily singable melodies (short, quick notes with large leaps) we avoid true ostinati/thematic content like in other genres.

As a note, if you read descriptions in the links, you’ll notice that the term postminimalism is sometimes lumped in with minimalism, though in contemporary music practice this is considered inaccurate, or perhaps just plain old lazy.

I am very personally interested in the difference between rhythm as accorded by something like Electric Counterpoint vs. a West African Ewe ensemble or Garth’s drum solo up there. Both contain strong rhythmicity, but one comes only from pitched/chordal content and the pulse is very clear, i.e. comprised entirely of eighth notes and eighth rests. The others are unpitched and feature far more complexly interlaced syncopation. Would there be some marriage of stimulus in the Reich piece that we wouldn’t see in the unpitched one? Is the strength of response different by the nature of how rhythmicity is achieved in one or the other? Or the spectral range or waveshape of either? Does the harmonic content neurally allow one to be more repetitive than the other before boredom sets in?

Absolutely none of these questions have been addressed in any research I can find. I think it has a lot to do with the fact that the above questions don’t directly relate to rehabilitation of disease/disorder, and are rather purely theoretical. But still, I can’t imagine that having that kind of information wouldn’t at least tangentially inform therapeutic musical neuroscience research. Anyway.

Every time we add a new element to our cake, it begins to look more like a cake. Cakeness is subjective, but few would look at Electric Counterpoint and deny that it is a piece of music, which some actually might about the Pisaro piece, and many still do with 4’33”.

Now that the overall point of this exercise is becoming more clear, and just because I love postminimalism, I’ll share some examples of postminimalist recordings below, all of which have similar properties. If you were to characterize each using the elements of music previously described, how would they differ between each work? What would be similarly or identically described?

The Chairman Dances, originally from John Adams’ opera Nixon in China.
The popular Koyaanisqatsi, a visual tone poem collaboration between composer Philip Glass and filmmaker Godfrey Reggio.
Cruel Sister, by Bang on a Can composer Julia Wolfe.


If lazily described, much of the above music might sound identical to one another – “percussive music,” “rhythmic music,” “fast music,” “electronic music,” or “classical music” as examples. However, I hope it’s growing clear just how variable a listener’s experience of each recording might be. Koyaanisqatsi, for example, is often called minimalist because its tempo is so much slower than the other two examples. But this piece contains more musical elements than the Pisaro. It has a strong rhythmic pulse, a wide-ranging spectral character, and far more timbral information, including the game-changing element of human voices. Even without taking individual bias and tastes into consideration (like if someone happened to know and like the film, for example), the two pieces will objectively have a quite different neurological effect on the listener despite a casual description running the risk of lumping them together as the same.

In the next section, we will discuss pitch, melody, and harmony in greater depth, the latter of which will lead to the discussion of structure in music. As this will lead us squarely into the territory of folk and pop music, instead of a parting image, I leave you with a really chill remix of Reich’s Electric Counterpoint by RJD2 from the album Deadringer. Also feel free to enjoy Jonny Greenwood of Radiohead performing another version of the same piece.

Thanks for reading!

On Musical Selection in Research: 2. The Control

Did you hear that minimalist joke? It goes, “Two Irishmen walk past a bar.”

In October 1965, an art journalist named Barbara Rose published an article entitled “ABC Art” in the influential Art In America magazine. In February 2019, I won the award for opening sentence with most instances of “art” in it. These two facts are probably unrelated. Probably.

Her title referred to a term, predating minimalism, which described emerging post-WWII American tendencies to go “back to basics” in creative movements. How artistic movements emerge is necessarily vague, but we can trace the roots of this particular tendency as a response to abstract expressionism (Pollock, Rothko, de Kooning, Kline), as the post-war appropriation of Japanese zen aesthetic, and as an extension of modernist reductivism. It has the distinction of being the first creative movement fueled mostly by Americans, and it influenced not only painting, but film, architecture, poetry, prose, and, of course, music.

More is less, as the saying almost goes, and artists in the fifties and sixties explored what they could remove from their art form and still call it their art form. This wide-scale decision to increase the art world’s negative space makes it a particularly good place to begin, because they were asking exactly the same question I did in part one: What ingredients can we remove from a cake recipe and still end up baking a cake?

As a disclaimer, chronological borders of musical epochs are fuzzy, and exceptions exist in all things. Examples I give in this series, as links or otherwise, are not meant to be comprehensive. They merely provide a starting reference point for those who might want to delve a little deeper on their own time.

We will not be working through movements in chronological order. Instead, we’ll be working from the foundational elements upward. We’ll focus mostly on the last century or so, the first half of which was kind of a genre Wild West which you can read all about on Wikipedia. We’ll start with a revolutionary response to all that 1900s zaniness, the silent elephant in the room, the (arguably) most well-known early experimental musical composition in history: John Cage’s 4’33”.

Silence and Minimalism

Full disclosure. My MFA is from CalArts, a school closely associated with John Cage. Even fuller disclosure, I focused heavily on his works during my first six months of study there, culminating in a celebration of his centenary where I performed some part of all 90 Song Books in a single performance. To my knowledge, I’m the only person in history to do this, and I mention it to display just how interesting I find his body of work.

The Wikipedia article about 4’33” is great. It mentions that, in 1947 when 4’33” was first conceived, the term minimalist composition didn’t exist. Neither did it in 1952, when the first performance by David Tudor actually happened. The above-mentioned “ABC Art” term didn’t even happen till 1965. So, some might question the legitimacy of referring to this piece as minimalist, and that would be a legitimate concern. However, I’d argue that like the poet Ezra Pound, John Cage happened upon the minimalist approach through his independent study of naturalism and zen Buddhism, establishing himself as a bit of a trendsetter. Still, even in 2019, you’ll come across spirited debates in composers’ forums across the globe arguing whether 4’33” counts as a musical composition at all, as opposed to a form of experimental theater.

Toward examining that exact assertion, let’s defer to the cake model of the 4+3 musical elements.

  • Rhythm
  • Pitch
  • Harmony
  • Timbre
  • Structure
  • Context
  • Loudness

4’33” has none of these, right? Objectively, that’s incorrect. For starters, many don’t realize the score does indeed have a structure, which looks like this:

John Cage’s 4’33” (In Proportional Notation). Full recreation here.

The lines can be interpreted to mean, “do something here,” and represent a proportion of time within the overall length. Such actions might be something like, flourish a hand, turn a page, open or close the lid, stand up, etc. The total length was inspired by the average length of popular radio songs at the time, a genre which Cage admittedly detested. He came up with the length of each section by using chance procedures, probably something like reading yarrow stalks from the I Ching.

Okay, so, it has a structure. What else? The piece obviously has context – a very deliberate one. It was composed with a direct, emphatic championing of Buddhist ideals, which Cage interpreted as giving more credence to meditative contemplation and stillness than did most music at the time. With a smirk, we might suggest that it does have loudness, namely, none. But in fact, Cage didn’t consider the piece to be about silence at all. Its purpose was to show that true silence is impossible, a concept he found comforting.

In 1951, Cage visited the anechoic chamber at Harvard University. An anechoic chamber is a room designed in such a way that the walls, ceiling and floor absorb all sounds made in the room, rather than reflecting them as echoes. Such a chamber is also externally sound-proofed. Cage entered the chamber expecting to hear silence, but he wrote later, “I heard two sounds, one high and one low. When I described them to the engineer in charge, he informed me that the high one was my nervous system in operation, the low one my blood in circulation.” Cage had gone to a place where he expected total silence, and yet heard sound. “Until I die there will be sounds. And they will continue following my death. One need not fear about the future of music.” The realization as he saw it of the impossibility of silence led to the composition of 4′33″.

John Cage on 4’33”

The piece is a call to perceive the world more closely, a sentiment that many artists found lacking in post-war America. 4’33” became known as a strangely hopeful act of rebellion by the mere audacity of calling itself a musical work. So, to deny its validity as a musical composition is denying the importance of its context on both the immediate and sociocultural level.

The following analysis of David Tudor’s 1952 performance summarizes our conclusion:

“Cage’s piece comprised of four minutes and thirty three seconds of silence by the pianist and his instrument requires engagement of the listener’s perceptual capacities in order to recognize the work as a formal musical composition evolving in real time. Experiencing the piece in silence, the listener’s attention is focused upon the perception of all ambient sounds. This process of attending to the external environmental sonic landscape occurring within a specific time and space yields within the listener a heightened consciousness of perception per se, and in turn, a consciousness of the self as perceiver.”

Biofeedback and the arts: listening as experimental practice, Valdes, Thurtle, 2005

With a notated form and length, and perhaps one of the most influential contextual statements in the last century, we must consider the piece a musical composition – perhaps the bare minimum of what might historically be called a musical work. It is not just the simple score that makes the piece. It is the idea, the man who had it, and the time period in which it was conceived and performed that provides staying power. This is essential to note, because 4’33” was by no means the first or only “silent” musical composition. Thus, in this one particular instance, the power of its context overtakes the lack of other musical elements shared by other silent works. Examples of such contextual importance are actually quite common in the art world in cases like Stravinsky’s Rite of Spring, the punk movement, Leonardo’s Mona Lisa, and Duchamp’s Fountain.

Or Duchamp’s Mona Lisa, while we’re at it.

I went into this in such detail because 4’33” is one of the best known examples of removing almost everything about sound organization, while still resulting in a viable musical composition. While it might seem that ambient-sound-focused meditation imposed by a chance-procedures composer is a terrible place to start for neurological music research, it’s actually perfect for a kind of hilarious equivalent. It’s a great conversation about what “context” really means, and it also, quite literally, points out an issue plaguing music research today.

Silence and the Brain

The scientific control is designed to minimize all other variables in an experiment except for the one under scrutiny. For example, the placebo group for a study on new medication is the control group, and an empty platter is the control state for a study that asks the question, “What happens when I combine certain ratios of ingredients in a certain way at a certain heat for a certain amount of time?” If a cake ends up on the platter, we’ve got a great and hopefully delicious result.

Science cake. Hooray!

The control in a sound-attentive experiment presents difficulties for the same phenomenon upon which Cage’s 4’33” shines a light. It is ridiculously difficult to separate sound experience from any experience at all.

“Human audition involves the perception of hundreds of thousands of bits of information received each second. Since one doesn’t have ear-lids, one continues to hear sound even when asleep.”

Ferrington 1994, Schwartz 1973

Cage noted this lack of ear-lids in the anechoic chamber. Perfect silence is impossible for humans with a working auditory system, and because we are always hearing, we run into the issue of how the human auditory system interacts with attention. Take the Chanda/Levitin paper’s refutation of the oft-derided “Mozart effect”:

“We also note that the studies reviewed here nearly always lack a suitable control for the music condition to match levels of arousal, attentional engagement, mood state modification, or emotional qualities. In other words, a parsimonious null hypothesis would be that any observed effects are not unique to music, but would be obtained with any number of stimuli that provide stimulation along these axes. Indeed, this was found to be the case with the so-called Mozart effect, which purported to show that intelligence increases after listening to music. The ‘control’ condition in the original study was for subjects to do absolutely nothing. The Mozart effect disappears, however, when control participants are given something to do, virtually anything at all.”

“The Neurochemistry of Music, Chanda, Levitin, 2013, emphasis mine.

Another way of describing this effect: Bored brain bad, stimulated brain good. Boredom here is defined as a state of relatively low arousal and dissatisfaction, which is attributed to an inadequately stimulating situation [1].

A large number of studies involving music could probably achieve similar results with any number of nonmusical stimuli – reading a magazine, watching a movie scene with only dialogue, talking with friends, or mindful meditation as John Cage demonstrated. The reason for this is that all forms of stimulus/nonstimulus experiences fall within some spectrum of cognitive attention. And that relates to a wide-ranging system in the human brain known as the cycle of anticipation and reward.

The full fancy term for this is the mesolimbic dopamine reward cycle. Dopamine itself isn’t a “pleasure” neurochemical; rather, it regulates the perpetual release and reuptake of natural serotonin and opioids in the body in a process called dopaminergic modulation. Lots of fascinating research exists about this, which I’ll mostly skip due to length constraints. But here’s what it boils down to:

The disgust humans feel in response to boredom is universal, as is our appetite for relief from it.

I don’t use the term “disgust” lightly. Take this quote from Peter Toohey’s book Boredom: A Lively History:

“Robert Plutchik, writing before this study, maintained that an emotion such as boredom emerges as a derivative or adaptation of this primary emotion of [disease-related] disgust. It serves, in his view, the same adaptive function, though in a milder or more inward-turning manner, as disgust. If disgust protects humans from infection, boredom may protect them from ‘infectious’ social situations: those that are confined, predictable, too samey for one’s sanity. If all of this is true, then it might follow that boredom, like disgust, is good for you – I mean good for your health. Both emotions are evolved responses that protect from ‘disease or harm’.”

Peter Toohey, Boredom: A Lively History, 2011

“Appetite” is also used purposefully. It refers to the technical term describing one part of the human reward system. The appetitive state is the anticipation of a learned pleasing stimulus, leading to future reinforcement of the pleasure response and, in turn, increased goal-oriented behavior. This is well studied, particularly the association between anticipation in our most advanced brain structure, the cortex, and our deepest structures, the primal, ancient, goal-oriented regions, as thoughtfully explained by Dr. Robert Zatorre.

Because of this highly integrated neural mechanism, music is excellent at provoking a strong reward response.

Consider that, without proper controls, the concept of stimulus itself is problematic. Imagine a bored test subject who suddenly hears a bit of music. How would the researcher know for certain if a study’s resulting data relates to the type of distraction involved, or simply to the mechanism of distraction itself? And again, if the music is relaxing, would the results relate to the effects of relaxing music… or any type of relaxing activity? The research is often vague on this important distinction.

John Cage’s silent piece calls for the audience to perform an act synonymous with sound-oriented mindful meditation, often used as a method for relaxation. Meditation activates the autonomic nervous system centers for attention and control. It can be used to relieve negative feelings such as stress and anxiety by guided attention, whether that attentiveness is directed at one’s breathing, outside stimuli, or an internally conceptualized abstract concept. In a purely contextual composition, with nothing but a bit of visual theatrics and zero notated pitches, rhythms, or harmonies, Cage’s piece might produce results which are indistinguishable from the therapeutic properties of “relaxing music.”

Your next logical question would be, “How much evidence is there to show that relaxing music, in a controlled environment, has a relaxing effect on the human brain?”

The answer may shock you! Click here to learn more!

Just kidding.

The answer is none. Science has never once produced a reliable study that convincingly proves relaxing music, in and of and by itself, reduces stress.

The Myth of “Relaxing” Music

In all my reading for this series, one musician-turned-researcher named Michael Thaut consistently stands out with well-considered and controlled studies. One of his studies examined the stress levels of students under the following three conditions:

  1. Experimenter-selected music
  2. Subject-selected music
  3. Absence of music

The resulting data showed that test subjects achieved relaxation responses in all three categories, even during the silence control session.

“Results indicated that… significant relaxation responses were achieved by the subjects in all three experimental conditions. Neither the presence/absence of music nor the choice of music appeared to make a difference in the relaxation response. The MAACL revealed that depression scores did not change under any of the three conditions, while all subjects reduced their hostility scores regardless of condition.”

The Influence of Subject-Selected versus Experimenter-Chosen Music on Affect, Anxiety, and Relaxation, Thaut, Davis, 1993

You’ll find a similar result in this 1997 study, and this one from 2000, and so on. This has several implications, including that, if a study uses only silence as its control when discussing music, it might accidentally be studying boredom, attention, and distractibility instead of the effects of music itself.

Which puts silent meditation, and thus Cage’s 4’33”, in a bit of a “more music than music” situation. Soft, quiet, low-key music can serve to fill in that last bit of background neural ambience without actually being distracting itself – sort of similar to leaving the fan on to help with falling asleep. We can refer to Barry Blesser’s loudness article from part one for its concept of neural spaces. We can close our eyes to filter visual cues, but we must create some sort of low-level stimulus to filter out aural cues from our environment. Blesser’s concept of attention works whether in the extreme or not; i.e., we don’t have to be at a rock concert to observe that white noise effectively helps tune out distracting noises.

We are quite good at tuning various neural spaces to our tastes. We can purposefully use low-level, unobtrusive sound to act as a sort of neural ear-lid, since we lack the handy flesh-flaps our eyes got. At this low level, music is simply one type of cognitive distraction, which we can use to expertly tune our attention levels to whatever concentrative act we’re performing. This might include falling asleep, tedious office work, or a yoga session.

While death metal yoga sessions do exist, the concept is considered humorous at best, proving the point by deliberate contrast. It does look pretty fun, though.

I am in no way asserting that quiet, minimal, or ambient music is identical to white noise or “lesser” music. Indeed, it takes great skill to organize sound in such a minimalistic way while still achieving its intention. I am, however, saying that the relaxative properties of such music might be thought of as expertly tuned cognitive low-level distraction. An interesting research question might be to look into how much longer a minimalist or ambient composition might function to distract someone vs. a box fan or nature sounds or something, especially when considered in relation to the tastes and experiences of individual test subjects.

Describing An Audio Recording

Contemporary minimal music is still meditative and often theatrical in presentation, like this lovely piece by Michael Pisaro:

Asleep, Desert, Choir, Agnes, 2016, by Michael Pisaro with Dog Star Orchestra

This particular work is highly textural. It depends largely on presenting interesting timbres in a slow, steady progression. It has neither clear melodies nor any discernible pulse. Its visual presentation lends itself to the experience both in the geographic arrangement of the performers and the interesting amplified items – many of which are naturalistic. The paper score is visual, meaning it has no staff, key, or time signature, and little to no formal musical notation. Rather, the performers use timepieces to cue certain actions, a technique utilized by a vast number of contemporary compositions. This work contains a slow harmonic progression, provided in large part by Pisaro’s electric guitar over the course of 30 minutes.

Taking all this information, we can rearrange it more formally using our previously described musical elements:

Rhythm. Arrhythmic, pulseless, slow.

Melody. Nonmelodic, no individual pitch content.

Harmony. Harmony is present in clean electric guitar sound. Moves very slowly. Other sounds occur without relation to an overall harmonic structure.

Timbre. Richly present. While traditional instruments are used, many employ extended techniques or are not played traditionally. Electronics are used only for amplification. No human voice. No computer-generated sounds.

Dynamic. Extremely quiet.

Structure. Present but vague, slow, indiscernible. The recording is over 30 minutes long.

Context. Live acoustic performance. Naturalistic, contemplative, meditative, esoteric, academic.

Combining the above statements into a simple paragraph and including it in a research paper might be a really, really good way to give the reader a thorough explanation of the stimulus used.
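For anyone who wants to keep these descriptions consistent across several recordings, here is a minimal sketch of the seven elements above as a small Python record that prints itself as one paragraph. The class name `StimulusDescription`, its field names, and the `to_paragraph` method are all my own hypothetical choices for illustration, not an established standard, and the field values are shortened versions of the notes above.

```python
from dataclasses import dataclass, fields


@dataclass
class StimulusDescription:
    """Hypothetical record describing one audio stimulus by musical element."""

    title: str
    rhythm: str
    melody: str
    harmony: str
    timbre: str
    dynamic: str
    structure: str
    context: str

    def to_paragraph(self) -> str:
        """Join every element into a single labeled paragraph."""
        labeled = [
            f"{f.name.capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
            if f.name != "title"
        ]
        return f"{self.title}. " + " ".join(labeled)


# Values condensed from the element-by-element notes on the Pisaro piece.
pisaro = StimulusDescription(
    title="Asleep, Desert, Choir, Agnes (Michael Pisaro with Dog Star Orchestra, 2016)",
    rhythm="Arrhythmic, pulseless, slow.",
    melody="Nonmelodic, no individual pitch content.",
    harmony="Present in clean electric guitar sound; moves very slowly.",
    timbre="Richly present; extended techniques; electronics for amplification only.",
    dynamic="Extremely quiet.",
    structure="Present but vague; over 30 minutes long.",
    context="Live acoustic performance; contemplative, meditative, academic.",
)

print(pisaro.to_paragraph())
```

Using fixed fields rather than free text means a missing element (say, no note on dynamics) fails loudly when the record is created instead of silently dropping out of the methods section.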

The above exercise is in no way meant to summarize the experience of listening to the work, nor should it become some nightmarish form of the Pritchard Scale. I merely suggest a more discrete music classification method in research than is currently standard.

In Summary

This won’t be the last time I mention minimalism in this series. The Cage piece is simply used to present a sort of analytical control group, the absolute default, the literal minimum by which we can define the term “music.” I also needed to establish an example of contemporary minimalism with the Pisaro piece, because the next part of this series will introduce rhythm into our musical cake. And that means, among other things, post-minimalism. Hooray!

Minimalist rock. Get it?
Thanks for reading!

On Musical Selection in Research: 1. Baking a Cake

A five-part deep dive into music, the brain, and neurological studies of their interaction.

Did you hear about the concert pianist who ate her own sheet music? She said it was a piece of cake.

TL;DR – By breaking down the main elements of human-organized sound, we gain a better understanding of the neurological processes involved in each, which will better inform and predict the selection of musical styles and recordings in research environments. We’ll examine the difficulty of establishing the control condition in auditory experiments, the myth of “relaxing” music, how audition relates to attention and distraction, and top-down vs. bottom-up processing of the musical elements. We’ll also touch briefly on the relation between music and language processing. Finally, we will give several examples of a brief paragraph that more accurately describes music selection using the following seven criteria: rhythm, melody, harmony, timbre, form, loudness, and context.

The Recipe

I was at a dance club in Santa Monica with my friend Lauren, who is not a trained musician. She had just sat in on a music production session, which had left a sour taste in her mouth.

“Why didn’t you like it?” I asked, probably between sips of Guinness because this was ten years ago.

“They were making this track that sounded great,” she answered, “but then they got stuck because they couldn’t ‘find the hook’.”

“And that bothered you?”

“Yeah,” she said, frowning. “I don’t like thinking about music like that. It’s so formulaic, you know?”

A part of me will always agree with this sentiment, but the producers she was with weren’t in the wrong, either. Making music is an amazing feeling when the inspiration juice is flowing, but you also need to know the formula when the creative intuition wears out. Lucky for Lauren and me, our mutual friend DJ Maggie had just started as the opening act, and she had selected an ambient set to get the venue warmed up.

“I do know what you mean,” I said to Lauren. “But, well, take this set Maggie’s playing. It’s atmospheric and relaxing, right? Don’t you think it makes for a great opening set?”

Lauren warmly agreed.

“This music doesn’t have a hook, either,” I said, exactly as clearly and succinctly as I’ve recalled it now. “No one thinks music always needs a hook. But a hook – something repetitive and catchy – serves a purpose for a song with a certain goal. Without a hook, it’s a different kind of song.”

Lauren frowned again. “But there’s nothing wrong with that.”

“No, it’s not wrong or right, it’s just… It’s like baking a cake,” I said, coming up with a metaphor I’ve used many times since. “The standard cake recipe is butter, sugar, eggs, and flour, right? But you can bake a cake out of all sorts of things that aren’t those four ingredients. It’s just that the more you subtract and substitute from those four things, the less the final product looks and tastes like a cake.”

I call it, “A Waltz in Pound Cake.”

I love this metaphor because, just like cake, there are four standard ingredients of music: rhythm, pitch, harmony, and timbre. There are a few more subjectively experienced ones as well, such as loudness, context, and structure. Young composers and songwriters often struggle the most with structure, which is probably why form tends to be so rigid in a given era. The contemporary era has seen an interesting twist on this tradition, because pop and folk music are the lucrative styles, which has caused resistance to rigid structures in many trained chamber composers. Even for untrained musicians like Lauren, stripping music down to its elements feels wrong, because it takes away the magic. They believe (or worry) that it will subtract from the experience of music.

Here’s a Richard Feynman quote about that.

“I have a friend who’s an artist and has sometimes taken a view which I don’t agree with very well. You hold up a flower and say, “Look how beautiful it is!” And I’ll agree. And he says, “You see, I, as an artist, can see how beautiful this is, but you, as a scientist, take this all apart and it becomes this dull thing.” And I think he’s kind of nutty… Science knowledge only adds to the excitement, the mystery, and the awe of a flower. It only adds! I don’t understand how it subtracts.”

Feynman, 1981 interview for his book, The Pleasure of Finding Things Out.

Do bakers enjoy cake less the more they learn? Based on the greatest show of all time, The Great British Bake-Off, I’m going to say, absolutely. Do comedians still laugh at jokes? Yes, they just laugh at less of them. Learning the formula changes one’s tastes by refining critique and making surprise more difficult to achieve, but in the end it only enriches enjoyment. Case in point, Lauren gleefully exclaimed that she got it, that the cake explanation of musical formula helped a lot, and then we danced our butts off because that’s what friends do over Guinness at dance clubs in Santa Monica.


  1. Baking A Cake – In this introductory post, I’ll go over definitions of terms for each element of music.
  2. The Control – In part two, I’ll discuss issues with researching music and science, with a focus on how hard it is to set up reliable control conditions.
  3. Bottoms Up – Part three will focus on the “bottom-up” perception of rhythm and low frequencies in the brain.
  4. With the Top Down – Part four will discuss the “top-down” process of experiencing pitch, melodic, and harmonic audition.
  5. Conclusions drawn from this journey into music and the brain.

Defining the building blocks of human-organized sound has been done many times before, but I hope to distinguish myself by taking a neurological, evidence-based approach to the subject. With modern advances in EEG, PET, and fMRI scanning, scientists are making great strides in mapping what regions of the brain are activated when perceiving and performing musical tasks. We’ll talk about music in both chamber and folk/pop traditions. Some styles discussed will include minimalist, ambient, noise, drone, algorithmic, electronic beat, and postminimalist music. Maybe also a bit of the ol’ Baroque, Classic, Romantic epochs too, though probably not as often since that’s been worked to death already.

An inexhaustive list of words often used to define the elements of music:

  1. Pulse, feel, meter, rhythm
  2. Frequency, tone, pitch, melody
  3. Chord, scale, key, harmony
  4. Timbre, texture, color
  5. Miscellany – Context, Loudness, Structure

Because you have to know the rules before you break them! Actually, you can break all sorts of rules without knowing them. It’s the easiest way to break a rule, to be honest.

The purpose of this is mostly to piss off some really close grad school friends who loathe music analysis, for my own sake as I consider the possibility of doctoral studies, and also to determine possible paths toward more reliable and reproducible results in neurological music therapy applications.

As a final note, I’ll be using Eurotraditional to refer to the standard classical and chamber music repertoire from Europe, as opposed to Western, with an emphatic shoutout to the brilliant Natalie Wynn for her insightful, entertaining, mildly NSFWish ContraPoints video presentation regarding the term.

1. Pulse, Feel, Meter, Rhythm

“It’s interesting. I’ve known quite a few good athletes that can’t begin to play a beat on the drum set. Most team sport is about the smooth fluidity of hand-eye coordination and physical grace, where drumming is much more about splitting all those things up.”

Neil Peart, author and former drummer of Rush

In musical parlance, beat has many definitions. If you ask, “What’s the beat of this song?” you’re asking about the tempo, a.k.a. Beats Per Minute (BPM). If you say, “I love this beat,” you’re usually referring to a rhythm of around 2 to 4 bars long which generally includes 8-16 eighth, quarter, or (rarely) half notes. If you say, “I love Beat It,” you’re referring to a popular Michael Jackson song. See? It’s complicated.

Tempo and pulse are roughly the same concept. For example, the song “Beat It” has a pretty fast pulse at 138 BPM.

Feel is the basic unit of rhythm, comprised entirely of two simple numbers: 2 and 3, a.k.a. duple and triple meters. “Beat It” is in duple, because the strong-weak cycle lasts two eighth notes instead of three. Making matters worse, and by that I mean way better, you can play both at the same time. Every hip hop song on the planet seems to do this right now, using vocal triplet structures to add rhythmic syncopation over the classic 4/4 meter. OutKast uses triple meters in their beats fairly often, in the decades-long tradition of Southern Swing, which is in 6/8 or 12/8 meter.

And it sounds GREAT.

Meter is the hierarchical grouping of pulses based on feel, such as the most common one in Eurotraditional music, the trusty, endlessly symmetric duple 4/4. That’s four quarter note downbeats to a bar, also called a measure. This definition also covers Indonesian last-beats, but doesn’t quite cover West African bell patterns. Patterns itself best describes that music, which has a pulse but little or no true meter.

A bar or measure can best be defined as full units of beat patterns, for example a couple bars that go something like, kick SNARE kick CLAP kick SNARE hat hat kickCLAP. That was “Beat It” in case that wasn’t clear.

Finally, I will define the umbrella term rhythm with the help of CalArts composer, conductor, and music theory professor Marc Lowenstein:

“Rhythm is best described as something like ‘the relative times between changes in sound.’ It might sound dorky, but the precision is important. It is not pulse dependent and is not hierarchically important. All three of those terms do inform each other, but they are distinct.”

Marc Lowenstein

In other words, one of the ways humans organize sound is via time, and rhythm is the perception of those changes relative to each other. Rhythmicity is one of the most intuitive ways humans distinguish music from not-music. There are no records whatsoever of folk music traditions that lack a pulse arranged by regular strong and weak beats. Even modern attempts at pulseless music often have an implied one somewhere, though whether it’s perceived as one depends on many factors.

So it comes as no surprise that playing, listening to, and especially dancing to a drum beat activates more areas of the brain than any of the other three major music ingredients. In a rock recording studio, the drums get more attention and microphones than any other instrument, and roughly half of the available decibels in a standard mix go toward the drums, while the rest of the band gets shoehorned into the remaining half.

Our brains don’t just like rhythm. They are rhythmic. Brainwaves are real, they can be trained, and they are constantly manipulated by all sorts of practices from the new age to the mundane. Most of our motor functions happen in time to rhythmic brain pulses, an incredible phenomenon called motor resonance. An adult human walks on average at a pace of about 2 hertz, or two steps per second. Two steps per second is 120 steps per minute, a pulse of 120 beats per minute, which is such a common musical tempo that most audio editing software has it set as default (Ableton Live, for example). It’s also the most common march tempo. You’ll find some ratio of 2 hertz in impromptu finger tapping and a long list of routine motor actions (Leman 2016, shout out to JS Matthis 2013). Neurologically, it’s the default tempo because it’s roughly in the middle of the spectrum that humans discern as rhythmic vs. arrhythmic, i.e. too fast or slow to perceive as a beat. Too slow and we start to subdivide it, too fast and it becomes a tone.
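The arithmetic linking a 2 Hz walking pace to 120 BPM is short enough to sketch in a few lines of Python. The function names here are my own, made up for illustration; the only facts used are that a minute has 60 seconds and the 138 BPM figure for “Beat It” quoted earlier.

```python
def hz_to_bpm(hz: float) -> float:
    """Convert events per second (hertz) to beats per minute."""
    return hz * 60.0


def bpm_to_seconds_per_beat(bpm: float) -> float:
    """Convert a tempo in beats per minute to the duration of one beat."""
    return 60.0 / bpm


walking_bpm = hz_to_bpm(2.0)                      # two steps per second
walking_beat = bpm_to_seconds_per_beat(walking_bpm)
beat_it_beat = bpm_to_seconds_per_beat(138.0)     # tempo quoted for "Beat It"

print(walking_bpm, walking_beat, round(beat_it_beat, 3))
```

So one beat at walking pace lasts half a second, and a beat of “Beat It” is a little shorter than that.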

Music without a pulse with a duple or triple feel arranged in a predictable pattern probably didn’t exist until modern times. It’s still pretty hard to pull off for the sole reason that humans aggressively impose patterns on literally everything. It’s kind of our thing. But contemporary composers now attempt works that remove rhythm entirely or make it so unpredictable as to be imperceptible, with varying levels of success.

One reason it’s so hard to do away with rhythm is human locomotion. When the average person’s brain hears a strong backbeat, such as in “Beat It,” its motor control system strongly activates in response to electrical impulses from the cochlear system. I mean this literally – neurons in the brainstem and motor cortex fire in time to the beat (Thaut, 2006, 2014) when listening to steadily pulsing rhythms. This is part of the bottom-up model of audition, meaning we experience it as sensory instead of as a higher-tier cognitive process. In short, the brain dances to its own innate rhythm, and synchronizing our bodies and external stimuli with it is a very pleasant experience.

The human brain, constantly.

A few examples of music that might ignore or de-emphasize rhythmicity are drone, ambient, noise, and true minimalist music. You’ll also find music that has rhythmic elements but no meter, essentially changing at a pace that prevents the perception of an established pattern or pulse. That will often involve electronics, especially algorithmic music or modular synthesis, though some improvisational styles attempt to achieve this as well. As previously stated, West African drumming often has multilayered patterns, but no true meter – only a feel in duple or triple time. There are endless examples of solo music in which the performer employs wildly variable tempi to manipulate the tension-release cycle. This is often called expressive or subjective timing, examples of which are found in the following performances:

Later, we’ll go more in depth regarding exceptions and substitutions, but first, we’ll need to define the remainder of our terms. In other words, we need to learn how to bake a standard cake before we venture into vegan gluten-free low-carb quantum cake territory.

2. Frequency, Tone, Pitch, Melody

“Music creates order out of chaos: rhythm imposes unanimity upon the divergent, melody imposes continuity upon the disjointed, and harmony imposes compatibility upon the incongruous.”

Yehudi Menuhin, violinist, Symboles Dans la Vie Et Dans L’art, Leit, Whalley, 1987

In the beginning, there was a sine wave, and it looked like this:

*Waves* Hiiii!

All you need to know is this: When the peak in the red line points upward, the particles in a medium are dense, all smushed together. When the bump points down and forms a trough, the particles are sparse and spread apart. Do this enough times, and you’ve built a universe.

The sine wave is simple, beautiful, perfect math. Rotate it and you can draw circles or spirals. It looks cool in 3D, too. Pass it through the electromagnetic field, and you get pretty colors and X-rays and stuff, also known as light. Stack them on top of each other and you get neat shapes like triangles, sawtooths, and squares. Because math!

Maybe the coolest GIF about Fourier Transforms ever.
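If you’d like to try the stacking trick yourself, here is a minimal sketch using only Python’s standard library. It adds up odd-numbered sine harmonics, each scaled by one over its harmonic number, which is the textbook Fourier series for a square wave. The function name and the choice of 200 terms are my own, picked purely for illustration.

```python
import math


def square_wave_partial_sum(t: float, freq: float, n_terms: int) -> float:
    """Approximate a square wave by summing its first n_terms odd sine harmonics."""
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1  # odd harmonics only: 1, 3, 5, ...
        total += math.sin(2 * math.pi * k * freq * t) / k
    return 4 / math.pi * total


# Sample a 1 Hz wave a quarter of the way through its cycle (the middle of the "high" half).
one_sine = square_wave_partial_sum(0.25, 1.0, 1)
many_sines = square_wave_partial_sum(0.25, 1.0, 200)

print(round(one_sine, 3), round(many_sines, 3))
```

A single sine overshoots the flat top of the square, and the more harmonics you stack, the closer the sum settles toward it.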

And if you pass some sine waves through air or wood or brass or taut animal skins or whatever, you get sounds.

Just like human eyes can’t see X-rays or radio waves, our ears can’t detect every frequency. Which ones you can hear (if any) is up to genetics, hearing damage, and age, since as you get older you hear fewer high-pitched tones. But, generally, you start out hearing around 20 Hz to 20,000 Hz, which completely coincidentally corresponds to wavelengths of about 17 meters to 17 millimeters long. All of which leads us to the question, “If a tree falls in the woods, and it only emits frequencies below 15 Hz, does it make a sound?”
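The 17-meters-to-17-millimeters figure falls out of one division: wavelength equals the speed of sound divided by frequency. Here is a quick sketch, assuming a speed of sound of roughly 343 meters per second (dry air at about room temperature); that constant and the function name are my additions, not something stated above.

```python
# Assumed speed of sound in dry air at roughly 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def wavelength_m(frequency_hz: float) -> float:
    """Return the wavelength in meters of a sound wave at the given frequency."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz


lowest = wavelength_m(20.0)               # bottom of the typical hearing range
highest_mm = wavelength_m(20_000.0) * 1000  # top of the range, in millimeters

print(round(lowest, 2), round(highest_mm, 2))
```

Both ends land at about 17 under this assumption, one in meters and one in millimeters, because the two frequencies differ by exactly a factor of 1,000.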

The trees have answered.

To put it simply, sine waves are everything. Detecting them is how we perceive the world around us, but also how we respond to said stimuli. This exchange is at the heart of musical interaction. We essentially turned the framework of our existence into a really cute game by generating tones in rapid succession at mathematically related frequencies. Although, usually, we just call it humming a melody.

We don’t know what came first, singing or talking (or maybe whistling?), but we know both went hand-in-hand during hominid evolution. The first instruments were probably drums or xylophones lost to time, but the earliest evidence of artistic expression is flutes made from bones and horns. Flutes, incidentally, make the closest approximation to a clean sine wave out of all symphonic instruments. Our discovery of such instruments predates any known early visual art by thousands of years (Wallin et al 2001). We made simple instruments that produced soundwaves with simple math, and we can assume singing was a precursor to this, placing it on roughly the same developmental timeline as language.

Below you’ll find a picture of the Overtone Series (also called the Harmonic Series). You can imagine this picture as a series of combined mathematical concepts, but you can also imagine it as a series of pictures of vibrating guitar strings, violin strings, etc.

The Harmonic Overtone Series

The top one labeled 0-1 is the fundamental tone, i.e. the lowest pitched tone, i.e. a representation of a perfectly vibrating string without any nodes (like when you hold down a string against a guitar fret). The successive ratios refer to how the sound wave is allowed to vibrate, in other words, more math. If you’d like to know a little more about the basics of this, I cannot recommend the following 10-minute video enough.

The harmonic series does a lot more than help us arrive at pitch, melody, and scale, because it’s also a model for how we perceive harmonies and timbres. To oversimplify it a bit, the more complex the ratio of a partial from the series is, the more dissonant we perceive a tone in a scale or chord.
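To make the “simple ratio” idea concrete, here is a small sketch that lists the first few partials above a fundamental and the ratio between each neighboring pair. The 110 Hz fundamental is an arbitrary example of mine, and the function names are made up for illustration. The interval names in the comments (octave, perfect fifth, perfect fourth) are the standard ones for those ratios.

```python
from fractions import Fraction


def harmonic_frequencies(fundamental_hz: float, count: int) -> list:
    """Return the first `count` partials: whole-number multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def adjacent_ratios(count: int) -> list:
    """Return the frequency ratio between each partial and the one below it."""
    return [Fraction(n + 1, n) for n in range(1, count)]


partials = harmonic_frequencies(110.0, 4)  # example fundamental, chosen arbitrarily
ratios = adjacent_ratios(4)                # 2/1 octave, 3/2 fifth, 4/3 fourth

print(partials)
print([str(r) for r in ratios])
```

The further up the series you go, the larger the numbers in each ratio get, which is the sense of “more complex” used above.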

The simplest partials, i.e. the first few in the series, make up the pentatonic, or five-note, scale, which is practically second nature to human musical perception. If an untrained musician is asked to improvise a melody, it will usually be pentatonic if it’s in tune. Almost all humans internalize pentatonic melodies quite easily, which is probably due to the fact it occurs naturally all around us. We’ll get more into that later.

We’re coded, at least as infants, to prefer higher-register melodies (Trainor 1998). Voices perceived as another gender than the listener garner more attention (Junger 2013). There’s not a lot of existing research about voice and register with regards to age and gender distinctions, unfortunately. But what we have shows that, to be judged a melody, it must generally exist in the human vocal range, making it likely that melodies help humanize a passage of music and thus tie together all the other elements. Melody is sort of the cognitive glue of a traditionally arranged song – the hook, if you will, that makes it memorable and personal.

Singers, especially untrained ones, often inadvertently damage their voices over time (or sometimes on purpose) just to achieve a signature “sound.” This vocal timbre is important to us on many levels of social interaction, and we associate all sorts of interpersonal assumptions with vocal timbre. Song creators capitalize on melodic attention by writing the catchiest melody they can think of and playing it repeatedly throughout a song. Quite often, getting a woman to sing said melody in a sexually enticing way evokes a more memorable reaction in the listener. These methods of neural manipulation are quite lucrative, which is part of the formula against which my friend Lauren rebelled. Sexualization in popular music is an ongoing saga fraught with both empowerment and exploitation, and many books exist on the subject. Suffice to say, I’ll just link to Women and Popular Music by Sheila Whiteley and leave it to the reader to pursue further if desired.

Ravel’s Boléro notwithstanding.

Likely because of its ubiquity, examples of music styles that attempt to avoid folky melodies are vast – noise music, ambient drones or chords, and minimalism, for example. Serialist composition technique attempts to subvert the composer’s tendency toward melody by using rigid rules that produce difficult-to-sing tone rows, to the great frustration and/or excitement of contemporary voice students everywhere. Although the reality is that most students can sing by their sophomore year what seasoned professionals used to call impossible. Which is a kind of human progress that could be called something like expertise normalization, but apparently it isn’t. If you know the technical term for this, please let me know, because as a concept it’s apparently very hard to Google search and should be a way bigger thing.

Anyway, to sum up, in spite of all our best efforts, humans are annoyingly adept at singing along to pretty much anything.

3. Chord, Scale, Key, Harmony

“How sweet the moonlight sleeps upon this bank!
Here we will sit, and let the sounds of music
Creep in our ears; soft stillness, and the night
Become the touches of sweet harmony.”

The Merchant of Venice

Harmony is, objectively, the hardest element to define out of these four ingredients, because… Well, here’s a pretty good attempt from William Malm’s book Music Cultures of the Pacific, the Near East, and Asia:

“Harmony considers the process by which the composition of individual sounds, or superpositions of sounds, is analyzed by hearing. Usually, this means simultaneously occurring frequencies, pitches (tones, notes), or chords.”

Almost anyone can recall a melody or make one up on the spot. If you ask someone to recreate their favorite multiple-bar drum solo, many will fail, but some, at least, will succeed. However, asking someone to recite their favorite harmonic resolution means they’ll actually need to sing a melody implying a harmony (unless you’re Anna-Maria Hefele, but that’s another story).

Harmony can take on an almost mystical quality. We can help standardize the definition with a distinction: harmony is a perceived simultaneous combination of pitches, while an implied harmony is the type that arises from an unaccompanied melody or similar. Together, these terms summarize our perception or understanding of an arrangement of harmonic consonance and dissonance, i.e. the tension and release cycle of notes and chords.

Harmonics can refer to a variety of phenomena, including suppressing the fundamental note to get a higher-pitched tone from an instrument, or it can more generally refer to multiple waveforms/oscillations/whatever occurring simultaneously in a scientific context. The musical meaning predates the scientific one, so there.

When defined as a basic musical element, harmony isn’t specifically the bass line or top notes, and it’s not the chords played on keys by the left hand or strummed on an acoustic guitar. That’s the accompaniment implying a harmonic progression. If I sing an unaccompanied melody, no one’s playing any chords, but the tune can still be analyzed harmonically. An organ fugue rarely contains much block chord voicing, but it will still move through harmonic regions that arise from accidentals, interlocking melodies, and pedal tones. So that’s why chords are not the same thing as harmony.

A key is a mode or scale with one pitch made hierarchically the most-stable by an asymmetrical, contextual process. In Eurotraditional music, the key is a shorthand indicating what notes, accidentals, and chords to play. The “most-stable” pitch often refers to the tonic, such as the note “C” in a C major scale. Some alternate forms of organizing tonal content are ajnas, maqams, and dastgah from the Middle East, thaat in Indian raga, pathet from Indonesian gamelan, and so on. Scale, key, and mode are often used interchangeably in English, but this can get murky when you consider international interpretations of musical pitch organization or performance environments. Here are iterative definitions for each:

  • Mode – a limited collection of discrete pitches.
  • Scale – an ordered mode.
  • Key – a mode or scale with a hierarchically most-stable pitch.

Sew, a needle pulling thread. Contemporary compositions often lack a traditional harmonic structure, but the cycle of tension and release is so inherent to humans that it’s quite difficult to avoid entirely. For example, many works don’t use easily analyzed chords, but tone clusters – like those created when mashing your hands on a piano – still contain harmonic information, even if it’s just that one hand’s cluster contains higher-pitched notes than the other.
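To make the three iterative definitions above (mode, scale, key) a bit more concrete, here is a minimal Python sketch. It assumes pitches are numbered 0 to 11 within a twelve-tone octave, which is purely a convenience for illustration; the names make_mode, make_scale, and Key are hypothetical and not from any library.

```python
from dataclasses import dataclass

# Pitch classes as integers 0-11 (0 = C, 2 = D, ...). This numbering is an
# assumption for illustration; the definitions themselves do not require
# twelve-tone tuning.

def make_mode(pitches):
    """Mode: a limited collection of discrete pitches (no order implied)."""
    return frozenset(p % 12 for p in pitches)

def make_scale(mode, start):
    """Scale: an ordered mode, listed upward from a starting pitch."""
    return tuple(sorted(mode, key=lambda p: (p - start) % 12))

@dataclass(frozen=True)
class Key:
    """Key: a mode or scale with one hierarchically most-stable pitch."""
    scale: tuple
    tonic: int

# The same collection of pitches yields different keys depending on
# which pitch is treated as most stable.
white_keys = make_mode([0, 2, 4, 5, 7, 9, 11])
c_major = Key(make_scale(white_keys, 0), tonic=0)
a_minor = Key(make_scale(white_keys, 9), tonic=9)
```

In this sketch, c_major and a_minor share one mode but differ as keys, which is roughly the distinction the three definitions are drawing.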

Note that the high/low description of pitch is learned, not inherent. Examples of pitch characterizations from other cultures are thin/thick and light/heavy. Though we can all intuit the other spectrums, musicians who grew up with different characterizations have no predisposed association in the practical sense. I will be using the high/low spectrum of pitch in this series, because why not.

Composers have spent a lot of time in the last century creating complexly organized frameworks that defy classic tonal analysis. It’s almost more common in academic music to reject key signatures, a practice referred to with the umbrella term atonality. But even those pieces often contain moments upon which the listener might impose a harmonically interpreted tension-release cycle. Harmony – implied or not – just happens, whether you like it or not.

4. Timbre, texture, color

“I sometimes wish taste wasn’t ever an issue, and the sounds of instruments or synths could be judged solely on their colour and timbre. Judged by what it did to your ears, rather than what its historical use reminds you of.”

Jonny Greenwood, composer, guitarist for Radiohead

Firstly, it rhymes with amber. Secondly, the story of music performance is timbre. Thirdly, the story of recorded music is still timbre, just more of it.

Here were the available musical timbres for a few million years:

  • Blowing through a wooden tube
  • Blowing through a wooden tube over a wooden reed
  • Blowing through a metal tube
  • Sticking giant metal tubes in a wall and blowing through those
  • Blowing through a meat tube over a meat reed, also called singing
  • Plucking taut strings over some gourd-shaped cavity
  • Sawing taut strings over more taut strings over a gourd-shaped cavity
  • Making taut animal skins resonate over some gourd-shaped cavity
  • Hitting animal skins with sticks
  • Hitting sticks with other, possibly different sticks
  • And, of course, this thing:
I mean, there are always exceptions.

“Instruments possess different timbres” is a fancy way of saying a trumpet doesn’t sound like a kettle drum. Different resonator shape and material means different partials in the overtone series get emphasized. That’s largely where the signature sound of a class of musical instruments comes from.
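As a rough illustration of the partials idea, the sketch below sums sine waves at whole-number multiples of one fundamental. The two sets of partial weights are hypothetical, picked only to contrast a mellow shape with a bright one; they are not measurements of any real instrument, and the function name waveform is made up for this example.

```python
import math

def waveform(partials, f0=441.0, sample_rate=44100, n_samples=200):
    """Sum sine-wave partials at integer multiples of the fundamental f0.

    partials maps harmonic number (1 = fundamental) to relative amplitude.
    With f0 = 441 Hz at 44.1 kHz, one cycle is exactly 100 samples.
    """
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(sum(amp * math.sin(2 * math.pi * k * f0 * t)
                           for k, amp in partials.items()))
    return samples

# Hypothetical partial weights, chosen only to contrast two shapes.
mellow = {1: 1.0, 2: 0.1}                     # close to a pure sine wave
bright = {k: 1.0 / k for k in range(1, 16)}   # sawtooth-like, strong overtones

mellow_wave = waveform(mellow)
bright_wave = waveform(bright)
```

Both waveforms repeat at the same rate, so they share a pitch, but the shape of each cycle differs. That difference in shape is the part we hear as timbre.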

Timbre isn’t the difference between high and low notes. That’s pitch, i.e. tonality. Timbre is the quality of the sound. If I sing “ah” with my mouth wide open and a lot of breath support, that’s a very round, smooth timbre. If I sing through my nose and do my best Munchkins of Oz impression, that’s a sharp, nasal timbre. The “ah” color is going to look more like a sine wave, while the sharper timbre is literally a sharper shape, as shown in the comparison of three instruments’ soundwaves linked above. Er, and also linked just now. You can see more about vowel timbres, their descriptions, and what their waveforms look like here.

This is a good place to mention that, in the neural realm of sound, the human vocal timbre is quite special. We have specific regions and processes dedicated to identifying the source of a speaker or singer, including their gender, age, emotional state, status, personality, arousal, and attractiveness [1][2][3][4][5][6][7]. This builds directly upon the discussion about melody and pitch perception. In other words, the timbre of the singer’s voice can greatly influence how the pitch content is received, providing context via color and personality. Not only that, vocal qualities inform a lot about how we experience and describe sound. For example, “ah” sounds “round,” which is also the shape of the mouth making that vowel.

When it comes to musical timbre, everything changed when the Fire Nation attacked, and by “the Fire Nation” I mean electricity. Electric amplification and recording allowed humans to arrange sound in ways never before possible. Can you guess what a standard electric current looks like? It starts in “S” and ends in “ine wave.”

Electricity travels via waveform, just like music. Marrying electricity and sound was so obvious, it’s a little weird it took us a few eons to figure it out. But it allowed us to invent multitudes of sounds that rarely or never occur in nature, not to mention effects processing: amplification, distortion, compression, equalization, chorus, phasers, flangers, delay, and, eventually, synthesizers. This last example allows us to shape the electricity first and then just see what sounds come out. This is the opposite of the other, super boring method of blowing through some hollow tree branch and hoping it sounds nice, amirite?

Digital audio synthesis (as opposed to analog synthesis) came not long after the advent of computers. It’s a pure version of sound modeling where things like voltage and the size of the vacuum tube don’t matter. You just punch in numbers and they come out in a very, very rapid stream, mimicking the shape that we already had in much higher fidelity in vinyl grooves.
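Here is a small sketch of what “punching in numbers” can look like in practice, using only Python’s standard library. The sample rate, bit depth, and tone are assumptions picked for illustration, and the helper names sine_pcm and to_wav_bytes are made up for this example.

```python
import io
import math
import struct
import wave

def sine_pcm(freq_hz=440.0, seconds=0.5, sample_rate=44100, amplitude=0.5):
    """Return a list of 16-bit integer samples tracing out a sine tone."""
    n = int(seconds * sample_rate)
    peak = int(amplitude * 32767)
    return [int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
            for i in range(n)]

def to_wav_bytes(samples, sample_rate=44100):
    """Pack the samples as mono 16-bit PCM inside a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

tone = sine_pcm()
wav_data = to_wav_bytes(tone)
```

Writing wav_data to a file ending in .wav should give something a media player can open. Swapping the sine formula for any other function of i changes the timbre without touching anything else.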

All this computing power, and we still can’t synthesize anywhere near the beauty of discarded old salmon traps in Alaska.

Timbre possibilities are now practically infinite. While it’s difficult to make music without harmony, it can sort of be done, for example this seminal work entitled ten hours of white static. That was sarcasm. Or was it?

It is semantically impossible to experience music without its timbre because it’s an inherent property of sound. The closest you can come is constructing the most boring timbre possible, like the sine wave, or in a sense using just-intonation techniques to prevent “harmonic beating”, although this simply produces a straight-toned timbre instead. Since timbre is an intrinsic sound property, we won’t have a specific section on it, but I may speak a bit about it in the conclusion.

5. Context, Loudness, Structure

While these elements are considered the secondary musical elements, they’re actually what affects the listener most strongly. The context of music is subjective to the listener, and it’s something contemporary composers often obsess over. It refers to the listener’s subjective experience alongside the auditory event. Context is largely cultural, though some aspects are fairly straightforward, such as whether a particular recording sounds like a live performance or studio-produced. Musical training or lack thereof strongly affects all sorts of things, including emotional experience and attention/distraction. Music is an especially communal experience, making it strongly associated with the average person’s sense of identity. So, music listened to in a crowd setting or with close friends/relatives/loved ones is experienced differently than when alone. We tend to prefer and retain lyrical music better than instrumental music. Introducing language and lyricism to music strongly affects how we perceive the melody and overall piece in general. In other words, context is complicated.

The neurological experience of loudness, also called dynamics, levels, or volume, is surprisingly poorly understood. We have it mapped out pretty well, but hypotheses conflict as to why we like some loud noises, but not others. We don’t know why increasing a song’s volume sounds better until it doesn’t. We don’t know why swelling or subsiding volume affects us so strongly that it can induce chills, an effect that strengthens when coupled with textural changes and surprising harmonies. We don’t know why we can stand louder volumes at a rock concert than at home or in the car, but I like this study’s attempt to explain positive loud experiences, essentially using volume to distract from other stimuli the same way a psychotropic drug might. This theory promotes the idea of sensory spaces, in which one sensory experience can overtake enough of the brain’s attention that it drowns other spaces out. We essentially calibrate loudness to reach a Goldilocks point of optimum distraction and preferable sensory space.

A graph that goes from “Huh?” to “Ow.”

Suffice it to say, we do tend to turn up music we like, even when doing so makes no difference in discerning detail of the song – or has no other practical effect besides making more neurons fire. We do this even if it hurts a little (or a lot), even if we know we’re damaging our ears. It’s hard to find good studies on this, since testing loud volumes on your subjects tends to reduce the number of subjects who can hear your subsequent lab tests.

Song structure or form is determined by the combination of meter and harmonic progression, and to a slightly lesser extent the regular return of thematic hooks, or ostinati when being fancy. Classical and chamber music has a multitude of forms, while many contemporary works strive to be formless, called “through-composed.” At the same time, a musical work that rejects discernible melodies or rhythms might rely heavily on its structure to convey a sense of musicality. This interplay of de-emphasizing one musical ingredient, then emphasizing another as substitution will get brought up several times during this series.

In Eurotraditional classical music, forms have names like rondo, sonata, concerto, aria, fugue, and so forth. Indian classical ragas have their own forms, just like Middle Eastern maqams, and so on. Plenty of international music styles are largely structureless, like Ghanaian Ewe music or chants from North American indigenous peoples.

Interestingly, pop and folk music worldwide often share a similar rigid form known vernacularly as verse-chorus-verse. Current pop music was pioneered in Europe and the US through blues, then rock, then electronic dance and hip hop. It is profoundly, astoundingly mimicked and appropriated in almost every country on the planet. Despite ever-changing trends, however, its form is still folky, though songwriters have refined its structure to the following elements:

  • Intro
  • Verse
  • Pre-chorus
  • Chorus
  • (Refrain)
  • Bridge
  • (Breakdown)
  • Outro

Mix those around, repeat them however many times you want, skip some, skip most, it barely matters. As long as it has some sort of verse-chorus-verse structure, it’s probably going to get called a “song.” To be quite frank, a lot of orchestral or symphonic music uses these concepts, too, just better hidden, or in the case of arias not at all. Here’s a fun way to visualize various song structures.

Refrain is something I personally differentiate from chorus, based on how I learned to analyze folk music structure. The chorus can be something like a catchy short verse that repeats several times in the song. It can have a different rhyming structure than the rest of the song to help differentiate it, for example. A refrain, however, is a single repeated line or phrase. For example, the “Beat It” chorus contains: “No one wants to be defeated / Showin’ how funky and strong is your fight / It doesn’t matter who’s wrong or right,” while the repeated refrain is, “Just beat it, beat it” over and over again.

The current pop structure usually puts the breakdown after the end of the bridge. It’s also related to breaks, which are structurally related to a guitar or drum solo between song sections. You can see examples of the term all over the lyric archive site, genius.com. It all just comes from folk traditions, some singer and an instrument trying to switch things up every 30 seconds or so to keep an audience interested enough to throw a coin in a hat. So, just remember that structure is nigh universal, a pop song is a modern folk song, and a folk song’s purpose is to achieve catchy memorability or, in other words, “profit.”

A song’s structure will contribute to determining its length, which can be pretty much whatever a composer wants it to be. A pop or folk song will usually run between 3.5 and 5 minutes, while a computer can perform a single piece for years. This is because humans are constrained by such factors as physical limitations and how much they like to party. It’s also interesting to note that, while songs were growing longer for a while there, as the album format dies and streaming takes over, pop song lengths are now growing increasingly shorter due mainly to Spotify’s monetization policy. See? Even trends happen in waves.

Finally, through-composed music is music in which no sections repeat. This essentially doesn’t exist in pop music, though you’ll find it occasionally as, like, a B-side or something. Even in those cases, one could argue that if a piece is through-composed, it doesn’t fit the definition of a pop song, even if it happens to be by a composer generally considered a pop artist.

Form is incredibly important, but exists on a larger scale than the brain generally pays attention to viscerally. Thus, its purpose is to perform an interesting dance with the listener’s attention/distraction and anticipation/reward cycle.

Science and the Musical Brain

Each element of music activates brain regions in different ways, and some are interactive while others target more specifically. You can think of it as starting with direct sensory input, which begins as vibrating air that gets translated to electrical signals via the cochlear system. These signals shoot across the nervous system, next activating the group of processes we call “bottom-up.” This is related to the fact that the brain functions by a constant series of rhythmic neural pulses, which means sending a bunch of auditory-imposed electric pulses directly affects brain function.

Next, the “top-down” processing occurs, which is where the brain rapidly and subconsciously analyzes what it has heard. Finally, this activates neurochemical processes closely tied to anticipation and reward, which are regulated mostly by dopamine, opioids, and norepinephrine. We are learning that such neurochemicals serve different purposes depending on how, where, and in what amounts they are served during cognition (Chanda & Levitin 2013), but such studies are still in their infancy, with new revelations occurring all the time. Mapping the interaction between the different elements of music, the regions of the brain activated, and the neurochemicals involved will be the major focus of the remainder of this series.

By the way, when I discuss brain activation, this refers mostly to the density of blood flow in a specific brain region, though it can sometimes also refer to brainwave activity or specific neurons firing. The type of scan matters, and different technology has different pros and cons as well as imaging resolution. You can learn more about the specific types here.

There are quite a lot of known issues with modern music science. Music experience is highly subjective, and strongly dependent on the tastes, background, and training of the listener. Among other issues, this makes the question of the control difficult and often suspect even in oft-cited research papers. The Chanda/Levitin publication I cited above has this to say about it.

“The lack of standardized methods for musical stimulus selection is a common drawback in the studies we reviewed and a likely contributor to inconsistencies across studies. Music is multidimensional and researchers have categorized it by its arousal properties (relaxing/calming vs stimulating), emotional quality (happy, sad, peaceful) and structural features (e.g., tempo, tonality, pitch range, timbre, rhythmic structure). The vast majority of studies use music selected by individual experimenters based on subjective criteria, such as ‘relaxing’, ‘stimulating’, or ‘pleasant/unpleasant’, and methodological details are rarely provided regarding how such a determination was made. Furthermore, there is often a failure to establish whether the experimenters’ own subjective judgment of these musical stimuli are in accord with participants’ judgments.”

“The Neurochemistry of Music,” Chanda, Levitin, 2013

As an example, one study might consider “relaxing music” to be New Age ambient music composed on a synthesizer. A different study might feature something atmospheric by Nusrat Fateh Ali Khan. Still another might use Samuel Barber’s Adagio For Strings. But if the listener dislikes New Age or Qawwali music, that will negatively affect results. Also, one “New Age” track or Khan track will sound quite different from the next, adding more uncertainty to results. We don’t even know if there are no vocals, like in that first New Age track, or if there is extensive use of vocalizations, like in the Gabriel/Khan piece. And, finally, if the listener has seen the film Platoon, they’ll have a wildly different (probably depressive) reaction to Adagio For Strings than someone who considers it just a pleasant classical work for strings.

Spoiler alert, people die in a war movie.

Unfortunately, more studies than I care to count will simply say, “relaxing music,” giving no indication of what that actually means, or they may describe the music too vaguely. Same goes for “stimulating music,” which could be anything from hair metal to dubstep. This all adds up to mean scanning technology such as PET, EEG, or fMRI will read very different brain activations from one study to the next, despite the studies claiming similar situations. I will discuss the myth of relaxing music in the next post in this series.

Take another example, the ever-popular drum circle. It’s used in many forms of music therapy with proven positive results [1][2][3][4]. However, no current study exists (that I could find) that looks at the differences between music therapy featuring group drumming and, e.g., group theater therapy. Both have similar positive effects on stress levels, negative thought loops, and addictive or destructive behavior. But a classic drum therapy session has a leader, same as theater therapy, who is guiding everyone in their actions, cracking jokes, generally easing everyone’s mood and getting people working together. It’s a positive collaborative social engagement. So, what’s the difference between drumming and theater? Is there any such difference? Or does it simply need to be any affirming group activity?

Or, let’s take a new form of therapy called Guided Imagery and Music (GIM), which reliably reduces stress-related neurochemicals. The main study regarding this shows that the treatment is less effective when the images are shown in silence. However, the study fails to test with only music and no imagery. What if their results were simply the difference between boredom and non-boredom? If the subjects had been in silence, but asked to walk in a slow circle, would that have had similar results? Boredom, too, we will discuss in later posts.

The difficult question often boils down to this: What can music do for the brain that other forms of stimuli cannot?

One thing we can at least address is a more complete standard for describing the auditory stimuli in research environments when specifying the exact recording or including an MP3 (or similar) is, for whatever reason, not an option. Even when this is possible, the paper should include data in text form that is as specific as possible regarding the audio stimuli used. While this series will be quite dense in its subject matter, it should all relate, more or less, to better understanding and description of music and its key ingredients.
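To give a feel for what such a text description might hold, here is a hedged sketch of a record a methods section could fill in for each audio stimulus. The field names are hypothetical, loosely drawn from the structural features listed in the Chanda/Levitin quote above (tempo, tonality, pitch range, timbre, rhythmic structure) plus a few points raised in this article. It is not an established standard, and the example values are placeholders rather than data from any real study.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class StimulusDescription:
    """Hypothetical per-stimulus record; every field name is illustrative."""
    title: str
    performer: str
    style: str
    tempo_bpm: Optional[float]            # None if unmeasured or no clear pulse
    key_or_mode: Optional[str]
    pitch_range: Optional[str]
    timbre_notes: str
    rhythmic_structure: str
    has_vocals: bool
    has_lyrics: bool
    duration_seconds: float
    peak_level_db_spl: Optional[float]    # playback loudness, if measured
    selected_by: str                      # "experimenter" or "participant"

def missing_fields(desc):
    """List the fields left undescribed (set to None)."""
    return [name for name, value in asdict(desc).items() if value is None]

# Placeholder values only; not taken from any real recording or study.
example = StimulusDescription(
    title="Example Track", performer="Example Ensemble", style="ambient",
    tempo_bpm=None, key_or_mode="D dorian", pitch_range="C2-C5",
    timbre_notes="sustained synthesizer pads", rhythmic_structure="no clear pulse",
    has_vocals=False, has_lyrics=False, duration_seconds=240.0,
    peak_level_db_spl=None, selected_by="experimenter",
)
```

Calling missing_fields(example) lists whatever was left blank, which is a cheap way to flag an incomplete description before a paper goes out.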

The Shoulders of Giant Steps

While many of the papers sourced in this and subsequent parts come from a variety of sources, special mention must go to Michael H. Thaut and Daniel Levitin, two major proponents of rigorous, evidence-based neurological music research. Without their efforts, much of the source material in this humble article series simply would not exist, nor would I have had the examples on which to base the overall tone and attempts at best practices. I also plan to share quite a lot of other people’s music, mostly through YouTube or Spotify. If you hear something you like, please consider purchasing their work or subscribing to the relevant streaming channels.


How humans organize sound will inform our method of classification for use in a lab. We’ll see how historical music can often be categorized by relevant brain region or activity, and how de-emphasizing one aspect of music means other aspects move in to become the primary focus of attention (so to speak). Next, we will be discussing more in-depth the issue of establishing a scientific control in music research.

Thanks for reading! More cake to come. And the cake is not a lie.