Who’s More Full of Hot Air, Trump or Biden? A Scientific Analysis

William Jarrold
Oct 22, 2020

Computational analyses of politicians’ speech show marked differences between candidates.

Summary:

Neurologists have found that spontaneous speech with a high proportion of pronouns can be a sign of neurodegenerative disorders. Such speech is considered “empty” because when it contains too many pronouns, its content can be meandering, nebulous, ambiguous or indirect. Using simple computer algorithms to compare pronoun proportion in transcripts of interviews and debates of politicians such as Trump and Biden, I have found that Trump’s speech has a higher proportion of pronouns. This elevation is not a recent phenomenon. At the moment, I consider this finding an interesting curiosity that deserves more in-depth follow-up.

Introduction:

Imagine if there were a way to measure the mental fitness of a politician by computationally analyzing their speech, in a way that was objective, non-partisan and transparent.

Likewise, imagine if there were a way to measure the level of ‘bull’ in a politician’s speech in a way that was objective, non-partisan and transparent.

In this essay I explore a recent journey I have taken down this path.

To be sure, we’re a long way off from the future I imagined above. What I submit for your consideration is a glimmer of it, one piece of a mongo-sized puzzle.

The Story Begins

The story begins quite recently: a reporter asked me to comment on a recent research paper about the automatic detection of Alzheimer’s disease from speech. (I have been doing artificial intelligence research and development since the 1990s, building systems that listen (e.g. Hewlett Sanchez et al, 2011), understand (e.g. Jarrold et al, 2016) and communicate (e.g. Yeh et al, 2014). Looking at biomarkers in speech for different neuropsychiatric conditions is one of my research areas.)

I read the paper and gave the reporter my comments.

And then he asked me an intriguing question:

Can these techniques be applied to a politician’s speech?

Well, I thought, I certainly wouldn’t be the first to do so. For example, a number of researchers have looked for speech-based indicators of Ronald Reagan’s Alzheimer’s disease, going as far back as 1988 (Gottschalk et al 1988; Berisha et al 2015; Wang et al 2020).

But what did I have in my immediate toolbox to address the reporter’s question?

Pronoun proportion was the first thing I thought of. You will see why further below, but first, let me explain what pronoun proportion is.

What is Pronoun Proportion?

Pronoun proportion is a measure that scientists have been using for a while. Here is how you measure it: you take a sample of speech and get it transcribed. You read the text and identify all the nouns and pronouns. You count them. Now you are ready to compute pronoun proportion.

A refresher for those who have forgotten their grammar lessons: pronouns are words such as “I”, “her”, “his”, “me”, “my”, “your”, “that”, “this”, “it”, “thing”, “something”, “you”, etc. They stand in for a noun or noun phrase that is mentioned in the surrounding context. Consider, for example, this small language sample:

Jane saw the man with the ring. He gave it to her.

That sample has three nouns (“Jane”, “man”, “ring”) and three pronouns (“He”, “it”, “her”). Writing P for the number of pronouns and N for the number of nouns, the pronoun proportion is P / (P + N) = 3 / (3 + 3) = 3 / 6 = 0.5 = 50%.
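The arithmetic above is simple enough to express directly. The Python sketch below uses hand-made noun and pronoun counts and reproduces the 50% figure for the Jane example. The function name `pronoun_proportion` is only illustrative and does not come from any tool used in this analysis.

```python
def pronoun_proportion(num_pronouns: int, num_nouns: int) -> float:
    """Return P / (P + N): pronouns as a share of all nouns plus pronouns."""
    total = num_pronouns + num_nouns
    if total == 0:
        raise ValueError("sample contains no nouns or pronouns")
    return num_pronouns / total

# Hand counts for "Jane saw the man with the ring. He gave it to her."
nouns = ["Jane", "man", "ring"]
pronouns = ["He", "it", "her"]
print(f"{pronoun_proportion(len(pronouns), len(nouns)):.0%}")  # prints 50%
```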

(Detail: instead of pronoun proportion, some researchers use pronoun ratio — the frequency of pronouns divided by the frequency of nouns. I mildly prefer pronoun proportion because it gives you a number that can be understood as a percent).
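Both measures come from the same two counts, so each can be converted into the other. Dividing the numerator and denominator of P / (P + N) by N gives r / (1 + r), where r = P / N is the pronoun ratio. Going the other way, a proportion p gives the ratio p / (1 - p). A small sketch of these conversions follows, with illustrative function names:

```python
def pronoun_ratio(num_pronouns: int, num_nouns: int) -> float:
    """Pronoun ratio P / N: pronouns per noun (not bounded above by 1)."""
    return num_pronouns / num_nouns

def ratio_to_proportion(ratio: float) -> float:
    """P / (P + N) obtained from r = P / N by dividing top and bottom by N."""
    return ratio / (1 + ratio)

def proportion_to_ratio(proportion: float) -> float:
    """Inverse mapping p / (1 - p); undefined when p == 1 (no nouns at all)."""
    return proportion / (1 - proportion)

r = pronoun_ratio(3, 3)           # Jane example: 3 pronouns, 3 nouns
print(r, ratio_to_proportion(r))  # prints 1.0 0.5
```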

Does such counting sound laborious and slow? It absolutely is laborious and slow!! For humans doing it “by hand”, that is. However, thanks to advances in AI in the last 20 years, it is trivially simple for a computer to take a speech sample, convert it to text, and count the nouns and pronouns. As a result we can have a fairly accurate measure of pronoun proportion in about a second.
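Here is a sketch of the counting step. It assumes the transcript has already been tagged with Universal Dependencies part-of-speech tags: PRON for pronouns, and NOUN and PROPN for common and proper nouns. The tags below are supplied by hand so the example runs without an NLP library. In practice, a tagger such as spaCy, whose tokens expose this tag as `token.pos_`, could produce the (word, tag) pairs. This is not necessarily the pipeline behind the numbers in this essay. Also note that a standard tagger labels "thing" as a noun and typically labels "this" or "that" as determiners when they come before a noun. Its counts may therefore differ from a hand count based on the broader word list given earlier.

```python
from collections import Counter

NOUN_TAGS = {"NOUN", "PROPN"}  # common and proper nouns
PRONOUN_TAGS = {"PRON"}

def pronoun_proportion_from_tags(tagged_tokens):
    """Compute P / (P + N) from (word, universal_pos_tag) pairs.

    Returns NaN when the sample has neither nouns nor pronouns.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    p = sum(counts[t] for t in PRONOUN_TAGS)
    n = sum(counts[t] for t in NOUN_TAGS)
    return p / (p + n) if p + n else float("nan")

# Hand-tagged stand-in for a tagger's output on the example sample.
sample = [
    ("Jane", "PROPN"), ("saw", "VERB"), ("the", "DET"), ("man", "NOUN"),
    ("with", "ADP"), ("the", "DET"), ("ring", "NOUN"), (".", "PUNCT"),
    ("He", "PRON"), ("gave", "VERB"), ("it", "PRON"), ("to", "ADP"),
    ("her", "PRON"), (".", "PUNCT"),
]
print(pronoun_proportion_from_tags(sample))  # prints 0.5
```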

Why does Pronoun Proportion matter?

Like I said above, when the reporter asked me about how to apply this to political speech, pronoun proportion was the first thing I thought of.

Why? Well, prior experiments (by others as well as by my colleagues and me, e.g. Jarrold et al 2020, Wilson et al 2010, Fraser et al 2016, Grossman et al 2004, and more) have often found a relatively high pronoun ratio in the speech of patients with certain neurodegenerative conditions. For example, semantic dementia, a type of Primary Progressive Aphasia, is known for speech that is especially high in pronouns. In some cases, the speech of people with Alzheimer’s disease has also been found to have a high pronoun frequency.

Why, you may ask, are pronouns higher in these conditions? The best explanation I know of involves what is known as “lexical access”. In particular, as the brain degenerates, it loses the ability to retrieve certain nouns, especially nouns that we don’t say very often. For example, saying the word “kite” might be more difficult than saying a frequent word like “food”. In other words, lexical access deteriorates as the brain becomes impaired.

Next, because they have difficulty recalling such nouns, afflicted individuals compensate. They tend to talk “around” the specifics, using pronouns and vague words such as “this”, “that”, “thing” or “it” in place of naming a particular thing.

Neurologists have used a number of terms to describe such speech; some call it “empty” or “circumlocutory” speech. So, I thought, let’s try this. Pronoun proportion is relatively easy to compute thanks to the magic of automatic part-of-speech tagging systems, and so I fed a variety of speech samples from Biden and Trump into such a system.

One cannot make a diagnosis by looking at a single simple measure like pronoun ratio. Diagnosis is a complex decision based on many lines of evidence, and I am not a neurologist. Still, I thought it might be interesting to see if there were differences.

As I considered applying this metric to Biden versus Trump, or to any politician, I could see two potential interpretations IF one or the other had a higher pronoun proportion:

  • One possibility could be word-finding problems, possibly based on some sort of neurocognitive condition.
  • Another possibility could be speech that was vague, meandering, circumlocutory or “empty”.

With the above as background, let’s see what we find when we do the analyses.

Trump versus Biden: Who has more impairment? Who has more hot air?

So far (analyses are ongoing), I have looked at several samples, all from this year (2020): two interviews each for Biden and Trump, plus the two debates.

Biden 2020–05–01 Interview: Joe Biden went on MSNBC’s Morning Joe on May 1, 2020

Biden 2020–08–06 Interview: Joe Biden participated in an interview with the National Association of Black Journalists and the National Association of Hispanic Journalists that aired on August 6.

Trump 2020–07–19 Interview: Chris Wallace of Fox News Sunday interviewed President Donald Trump at the White House.

Trump 2020–08–31 Interview: Donald Trump sat down for an interview with Fox News’ Laura Ingraham.

First Debate 2020–09–29: Trump and Biden participated in a debate at Case Western Reserve University in Cleveland moderated by Chris Wallace of Fox News.

Last Debate 2020–10–22: Trump and Biden had their last debate at Belmont University in Nashville moderated by NBC’s Kristen Welker.
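In debate transcripts, the two candidates and the moderator take turns. A per-candidate pronoun proportion therefore presumably means keeping only that candidate's turns before counting. The essay does not say how this was done. The sketch below assumes a hypothetical plain-text format in which each turn begins with "Name:" and lines without a known name prefix continue the previous turn. The speaker labels and lines in the example are placeholders, not quotes.

```python
SPEAKERS = {"Moderator", "Speaker A", "Speaker B"}  # hypothetical labels

def turns_for(transcript: str, speaker: str, speakers: set) -> str:
    """Join the text of every turn spoken by `speaker`.

    A line starts a new turn only if the text before its first colon is a
    known speaker name; any other line continues the current turn, so colons
    inside the speech itself do not break the parse.
    """
    kept = []
    current = None
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, rest = line.partition(":")
        if sep and name.strip() in speakers:
            current, text = name.strip(), rest.strip()
        else:
            text = line  # continuation of the current speaker's turn
        if current == speaker and text:
            kept.append(text)
    return " ".join(kept)

transcript = """Moderator: First question.
Speaker A: My answer starts here.
It continues on this line.
Speaker B: Here is the thing: a reply.
Moderator: Next question."""

print(turns_for(transcript, "Speaker A", SPEAKERS))
# prints: My answer starts here. It continues on this line.
```

The extracted text for each candidate would then go through the tagging and counting steps described earlier.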

So, finally, drumroll please, what did we find?

Pronoun proportion for each of the above interviews:

2020–05–01 Biden Interview on Morning Joe: 30.4%

2020–08–06 Biden Interview with Journalist Associations: 28.2%

2020–09–29 Biden in first debate: 33.3%

2020–10–22 Biden in last debate: 34.0%

2020–07–19 Trump Interview with Chris Wallace: 37.6%

2020–08–31 Trump Interview with Laura Ingraham: 49.5%

2020–09–29 Trump in first debate: 40.0%

2020–10–22 Trump in last debate: 38.5%
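The eight 2020 figures above fit in a few lines of Python. This sketch reuses only the numbers reported in this section. It computes each speaker's mean and checks whether Trump's lowest value is above Biden's highest.

```python
from statistics import mean

# Pronoun proportions (%) reported above for the 2020 samples.
biden = {"2020-05-01 interview": 30.4, "2020-08-06 interview": 28.2,
         "2020-09-29 debate": 33.3, "2020-10-22 debate": 34.0}
trump = {"2020-07-19 interview": 37.6, "2020-08-31 interview": 49.5,
         "2020-09-29 debate": 40.0, "2020-10-22 debate": 38.5}

print(f"Biden mean: {mean(biden.values()):.1f}%")  # prints Biden mean: 31.5%
print(f"Trump mean: {mean(trump.values()):.1f}%")  # prints Trump mean: 41.4%
# True when even Trump's lowest sample exceeds Biden's highest one.
print(min(trump.values()) > max(biden.values()))   # prints True
```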

As you can see, all of Trump’s speech samples had a larger pronoun proportion than Biden’s.

So then I was curious: did Trump’s pronoun proportion increase between decades ago and now? If so, this would be consistent with cognitive decline caused by a neurodegenerative condition.

Let’s look at two Howard Stern interviews, one from about 12 years ago (Jan 8, 2008) and another from about 27 years ago (1993):

Trump on Howard Stern 2008: 39.3%

Trump on Howard Stern 1993: 39.6%

As you can see, there is no clear increase in pronoun proportion between the much older Howard Stern samples and those from this year; the proportions across time are about the same. It seems very unlikely that he had a neurodegenerative condition going back that far (he was much younger), and so we are left with the suggestion that Donald Trump has a speaking style that is more “empty”, more circumlocutory or less substantive than Joe Biden’s.

Caveats

To be sure, this work is far from conclusive. I have not tested for any kind of statistical significance. The use of pronoun proportion as a measure of conceptual emptiness is just a hunch; I have not (yet) found any work mentioning this phenomenon in political speech. Although high pronoun proportion can be one sign of a neurodegenerative disease (when such disease hits the speech areas of the brain, as in Primary Progressive Aphasia and some but not all cases of Alzheimer’s), we are at this point miles away from making any kind of diagnostic claim. For example, while we do have norms for certain very specific clinical tasks (e.g. having the patient look at a picture of a picnic scene and describe it for a few minutes), I do not know of clinical norms for this kind of spontaneous speech sample. Still, it is a provocative curiosity.
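Here is a hypothetical illustration of what a first significance check could look like. The sketch runs an exact one-sided permutation test on the eight 2020 values reported above. There are 70 ways to split the pooled values into two groups of four, and it counts how many give a "Trump" group total at least as large as the observed one. The test treats the samples as exchangeable, which is questionable: the two debates pair the candidates at the same event, and four samples per speaker is very few. Its output is therefore a starting point for follow-up, not a finding.

```python
from fractions import Fraction
from itertools import combinations

# 2020 pronoun proportions reported above, in tenths of a percent
# (integers keep the sums exact, so ties are compared reliably).
biden = [304, 282, 333, 340]
trump = [376, 495, 400, 385]

def exact_permutation_p(group_a, group_b):
    """One-sided exact permutation p-value for 'group_a is larger'.

    Because the pooled total is fixed, ranking splits by the first group's
    total is equivalent to ranking them by the difference in group means.
    """
    pooled = group_a + group_b
    k = len(group_a)
    observed = sum(group_a)
    splits = list(combinations(range(len(pooled)), k))
    as_extreme = sum(1 for idx in splits if sum(pooled[i] for i in idx) >= observed)
    return Fraction(as_extreme, len(splits))

print(exact_permutation_p(trump, biden))  # prints 1/70
```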

Conclusion

In conclusion, I’m not asserting that anybody has a diagnosis of dementia. We don’t even know how stable these proportions are. Sure, we are beginning to see a trend: Trump’s speech has a higher pronoun proportion than Biden’s. This pattern seems to go back to the early 1990s. If it has any meaning, one likely explanation is that his personal style of speaking is less substantive: he talks around issues more, and his speech is more “empty”.

Some next steps:

Do we continue to find these patterns in other speech samples, e.g. other debates and interviews? What kinds of pronoun proportions do we find in other politicians? What kinds of differences do we see across different contexts, e.g. interviews, debates, prepared speeches, etc.? How much variability is there within a speech sample? Do we find statistically significant differences? What happens when we look at other measures of speech complexity? (Trust me, there are many additional measures, such as mean length of utterance, type/token ratio, idea density and more, but pronoun ratio seems to be one of the more novel ones.)
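Two of the measures mentioned above are easy to sketch. The code below computes type/token ratio (distinct words divided by total words) and mean length of utterance. As a simplification, it counts words rather than morphemes. The regular-expression tokenizer is an assumption chosen for brevity, not a standard.

```python
import re

def tokenize(text):
    """Hypothetical simple tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Distinct word types divided by total word tokens (NaN for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else float("nan")

def mean_length_of_utterance(utterances):
    """Mean words per utterance (words, not morphemes, as a simplification)."""
    lengths = [len(tokenize(u)) for u in utterances]
    return sum(lengths) / len(lengths) if lengths else float("nan")

sample = ["Jane saw the man with the ring.", "He gave it to her."]
print(round(type_token_ratio(" ".join(sample)), 3))  # prints 0.917
print(mean_length_of_utterance(sample))              # prints 6.0
```

One design caveat: type/token ratio tends to fall as a sample gets longer. Samples of very different lengths are hard to compare with it directly.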

Update: 2020–12–17

I have just discovered this interesting analysis of the first Trump vs Biden debate: https://www.oto.ai/blog/ai-analysis-of-voice-data-from-presidential-debate-reveals-secret-trump-advantage. After a quick scan, one of its main findings seems to be that Trump packed more “data” into his speech per second than Biden did. Based on my analysis, that “data” might merely be a higher volume of words, words that are more “empty”. So do you want quality or quantity?

Would you like to hear more?

If so, shoot me an email and I will put you on my email list: billjarrold.cl@gmail.com

Thanks,

William Jarrold, PhD

Mind and Brain AI Consulting

References

  • Hewlett Sanchez et al (2011)

Sanchez, M. H., Vergyri, D., Ferrer, L., Richey, C., Garcia, P., Knoth, B., & Jarrold, W. (2011). Using prosodic and spectral features in detecting depression in elderly males. In Twelfth Annual Conference of the International Speech Communication Association. https://www.sri.com/wp-content/uploads/pdf/using_prosodic_and_spectral_features_i.pdf

  • Jarrold et al (2016)

Jarrold, W., Yeh P.Z. (2016). The Social-Emotional Turing Challenge. AI Magazine. Vol 37, №1. pp. 31–38. https://www.aaai.org/ojs/index.php/aimagazine/article/view/2638

  • Yeh et al (2014)

Peter Z. Yeh, Deepak Ramachandran, Benjamin Douglas, Adwait Ratnaparkhi, William Jarrold, Ronald Provine, Peter F. Patel-Schneider, Stephen Laverty, Nirvana Tikku, Sean Brown, Jeremy Mendel, Adam Emfield. (2015). An End-to-End Conversational Second Screen Application for TV Program Discovery. AI Magazine. Vol 36, №3. pp. 73–89. https://www.aaai.org/ojs/index.php/aimagazine/article/download/2604/2499

  • Gottschalk et al (1988)

Gottschalk, L. A., Uliana, R., & Gilbert, R. (1988). Presidential candidates and cognitive impairment measured from behavior in campaign debates. Public Administration Review, 613–619. https://www.jstor.org/stable/975762?seq=1

  • Berisha et al (2015)

Berisha, V., Wang, S., LaCross, A., & Liss, J. (2015). Tracking discourse complexity preceding Alzheimer’s disease diagnosis: a case study comparing the press conferences of presidents Ronald Reagan and George Herbert Walker Bush. Journal of Alzheimer’s Disease, 45(3), 959–963. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922000/

  • Wang et al (2020)

Wang, N., Luo, F., Peddagangireddy, V., Subbalakshmi, K. P., & Chandramouli, R. (2020). Personalized Early Stage Alzheimer’s Disease Detection: A Case Study of President Reagan’s Speeches. arXiv preprint arXiv:2005.12385. https://www.aclweb.org/anthology/2020.bionlp-1.14.pdf

Edits: After the original publication on 2020–10–22, I added some references. In addition, on 2020–11–02 I added pronoun proportions for the debates, plus some expository writing edits.
