This is a Markiplier.


Markiplier has a YouTube Channel.

YT screenshot 1

Markiplier knows this many words.

Markiplier - 1 - Unique Word Count

Some words are just additions of other words.

a-hole, b-street, c-sharp.

This means that Markiplier actually knows fewer words than the chart shows. (Thanks, YouTube subtitles.)

Markiplier knows good words and bad words. Most of them are good words.

Markiplier - 2 - Unique Word Count, Colored By Sentiment

But Markiplier says bad words a lot.

4 - Unique, Bad Word Count

And Markiplier's bad words are meaner than his good words are nice.

Markiplier - 5 - Unique, Both Word Count1

Markiplier also knows more mean words than nice words.

6 - Unique, Both Word Count2

But that’s ok. We still enjoy Markiplier =)

What did I just read?

So this is the first post in the YouTube series. What’s going on is that I got interested in the repositories of personality that a few YouTubers have started to create. The most dedicated content creators put literally hundreds of hours of themselves online, all of which is rich with data about their personalities. So I made a pipeline to scrape the subtitles that YouTube automatically generates and parse them with my handy-dandy Python to see what insight we could gain about these people.
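The scraping half of the pipeline isn't shown here, but the parsing half can be sketched. This is a minimal example, assuming the auto-generated subtitles have already been downloaded as WebVTT text (the caption format YouTube serves). The sample caption text and the `caption_words` helper are hypothetical, made up for illustration; it skips the header and timing lines, strips inline tags, and pulls out lowercase words so they can be counted.

```python
import re

# Hypothetical auto-generated caption file in WebVTT format (not real Markiplier data).
SAMPLE_VTT = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.500
hello everybody my name is markiplier

00:00:02.500 --> 00:00:05.000
and welcome to my game <c>it's</c> good
"""

TAG = re.compile(r"<[^>]+>")   # inline markup such as <c>...</c> or <00:00:01.000>
WORD = re.compile(r"[a-z']+")  # lowercase letters plus apostrophes (it's, don't)


def caption_words(vtt_text: str) -> list[str]:
    """Return every lowercase word in the cue text of a WebVTT caption file."""
    words: list[str] = []
    in_cues = False
    for raw_line in vtt_text.splitlines():
        line = raw_line.strip()
        if "-->" in line:
            # Cue timing line: the caption text follows it.
            in_cues = True
            continue
        # Skip the header block (WEBVTT, Kind:, Language:), blank lines and cue numbers.
        if not in_cues or not line or line.isdigit():
            continue
        words.extend(WORD.findall(TAG.sub("", line).lower()))
    return words


words = caption_words(SAMPLE_VTT)
print(len(words), "words,", len(set(words)), "unique")
```

Here "my" appears twice, so the sample has 13 words but only 12 unique ones. The unique count is the number the first chart is built on.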

Obviously for this post I’m talking about Markiplier. =/

You know, this guy:


And while I’ll dig deeper into his language in specific situations later on, I thought it would be good to start with an overview of his language as a whole and what it suggests about who Markiplier is.

Markiplier - 1 - Unique Word Count

Markiplier talks to himself quite a bit. He generally has a solo show, but there are times when he has a guest on. Sadly, YouTube’s subtitle generator is pretty shitty, so the count of unique words is a bit inflated by garbage tokens like “it’sgood” and “yeahright”, where two words get glued into one. This is partly thanks to the guests, since two people talking over each other makes it seem like two words are happening at once. Luckily, if we compare this word count to other people’s in the future, the inflation should be about the same, because the same kind of word concatenation will show up in their subtitles too.
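One way to shrink that inflation is to check whether a token that isn't a real word splits cleanly into two real words. The sketch below is hypothetical and not necessarily what was done for these charts. It assumes a reference vocabulary is available (the tiny `VOCABULARY` set here is made up) and accepts the first split where both halves are known words.

```python
from typing import Optional

# Hypothetical reference vocabulary; a real run would need a full English word list.
VOCABULARY = {"it's", "good", "yeah", "right", "game", "is"}


def split_concatenation(token: str, vocabulary: set[str]) -> Optional[tuple[str, str]]:
    """Return (left, right) if token is two vocabulary words glued together, else None."""
    if token in vocabulary:
        return None  # already a real word, leave it alone
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in vocabulary and right in vocabulary:
            return left, right  # first (shortest-left) valid split wins
    return None


def unique_words(tokens: list[str], vocabulary: set[str]) -> set[str]:
    """Collect distinct words after splitting glued-together subtitle tokens."""
    unique: set[str] = set()
    for token in tokens:
        parts = split_concatenation(token, vocabulary)
        unique.update(parts if parts else (token,))
    return unique


tokens = ["it'sgood", "it's", "good", "yeahright", "yeah", "right"]
print(len(set(tokens)), "->", len(unique_words(tokens, VOCABULARY)))
```

With splitting, the six raw tokens collapse to four real words. Taking the first valid split is a simplification. A large vocabulary can offer several splits for one token, so a more careful version would rank the candidate splits, for example by word frequency.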

Markiplier - 2 - Unique Word Count, Colored By Sentiment

Of the words that Markiplier says, many have an emotion behind them. Talking about how good someone is or how pretty something looks has a positive connotation, while talking about how you want to kill somebody or how terrible you look has a negative connotation. This is called the Sentiment of the word: Negative Sentiment for a negative connotation, Positive Sentiment for a positive one. Sentiment scores typically range from 1 to 5 for positive words and from -1 to -5 for negative words.
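Scoring a word this way comes down to a dictionary lookup. The sketch below uses a hypothetical mini-lexicon on the same -5 to +5 integer scale that published word lists such as AFINN use. The post doesn't say which lexicon was used, and the scores below are illustrative only. Words missing from the lexicon get 0 and count as neutral.

```python
# Hypothetical mini-lexicon: word -> integer score from -5 (very negative) to +5 (very positive).
# Scores are illustrative only, not taken from any published word list.
LEXICON = {
    "good": 3,
    "pretty": 1,
    "love": 3,
    "terrible": -3,
    "kill": -3,
    "hate": -3,
}


def sentiment(word: str) -> int:
    """Return the word's sentiment score, or 0 if the lexicon has no opinion on it."""
    return LEXICON.get(word.lower(), 0)


def label(score: int) -> str:
    """Map a score to the positive / negative / neutral buckets used in the charts."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for word in ["good", "terrible", "game"]:
    print(word, sentiment(word), label(sentiment(word)))
```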

Markiplier - 6 - Unique, Both Word Count2
Number of Distinct Bad Words vs Distinct Good Words

Mark doesn’t know as many kind words as unkind words. He actually knows quite a few more unkind words than kind ones: somewhere in the ballpark of almost 50% more. If you know anything about his channel, this is probably because he mainly plays horror games, frustrating games, or games with friends where they are typically verbally roughhousing, which makes it easy to come up with just horrendous things to say to one another.
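"50% more" is a ratio of distinct-word counts. If P is the number of distinct positive words and N the number of distinct negative words, the excess is (N - P) / P × 100, so 50% more means N is about 1.5 × P. Here is a minimal sketch with a hypothetical lexicon and made-up spoken words. Each word counts once no matter how often it's said.

```python
from typing import Iterable

# Hypothetical lexicon with illustrative scores on a -5..+5 scale.
LEXICON = {"good": 3, "love": 3, "nice": 3, "fun": 4,
           "kill": -3, "hate": -3, "terrible": -3, "stupid": -2, "ugly": -3, "damn": -2}


def distinct_counts(words: Iterable[str]) -> tuple[int, int]:
    """Return (distinct positive, distinct negative) word counts; repeats count once."""
    vocab = {w.lower() for w in words}
    positive = sum(1 for w in vocab if LEXICON.get(w, 0) > 0)
    negative = sum(1 for w in vocab if LEXICON.get(w, 0) < 0)
    return positive, negative


# Made-up transcript tokens, not real Markiplier data.
spoken = ["good", "good", "fun", "love", "nice", "kill", "hate",
          "terrible", "stupid", "ugly", "damn", "the", "game"]
pos, neg = distinct_counts(spoken)
percent_more = (neg - pos) / pos * 100
print(pos, neg, f"{percent_more:.0f}% more unkind words")
```

With 4 distinct kind words and 6 distinct unkind ones, this prints `4 6 50% more unkind words`.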

Markiplier - 5 - Unique, Both Word Count1
Number of distinct words Markiplier has used. Left: distinct positive words. Right: distinct negative words.

The unkind words that he does know are typically more hurtful than his kind words are kind, but only barely.
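"More hurtful than his kind words are kind" compares strength, not direction. One plausible way to measure it, assumed here rather than taken from the post, is to average the scores of the distinct positive words and the absolute scores of the distinct negative words. The lexicon below is hypothetical.

```python
from statistics import mean

# Hypothetical distinct vocabulary with illustrative scores on a -5..+5 scale.
LEXICON = {
    "good": 3, "nice": 3, "fun": 4, "pretty": 1,
    "kill": -3, "hate": -3, "terrible": -3, "stupid": -2, "ugly": -3, "horrible": -3,
}


def mean_intensity(words, lexicon):
    """Return (mean positive score, mean absolute negative score) over distinct words."""
    vocab = set(words)
    positive = [lexicon[w] for w in vocab if lexicon.get(w, 0) > 0]
    negative = [-lexicon[w] for w in vocab if lexicon.get(w, 0) < 0]
    # statistics.mean raises StatisticsError on an empty list, so both groups must be non-empty.
    return mean(positive), mean(negative)


pos, neg = mean_intensity(LEXICON.keys(), LEXICON)
print(f"kind: {pos:.2f}  unkind: {neg:.2f}")
```

On this made-up data, the unkind words average 2.83 against 2.75 for the kind ones. That's a tiny gap, which is the "only barely" pattern described above.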

Markiplier - 5.5 Word Count
Total count of the words Markiplier has used. Left: total number of words with positive sentiment. Right: total number of words with negative sentiment.

But when you count every word Mark actually says, he generally uses words with kinder Sentiment, despite the abundance of unkind words he knows. I’m guessing he saves those for his friends. The bad words… not the good ones. That’s probably why he has such a large following, with almost 9 million subscribers: in general, Markiplier is a positive guy, with a few (or many) moments of anger.
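Knowing more unkind words and saying kind words more often aren't contradictory, because the totals count every repetition while the distinct counts don't. This sketch uses a hypothetical lexicon and made-up tokens, with `collections.Counter` tallying how often each word is said.

```python
from collections import Counter

# Hypothetical lexicon with illustrative scores on a -5..+5 scale.
LEXICON = {"good": 3, "nice": 3, "kill": -3, "hate": -3, "stupid": -2, "terrible": -3}

# Made-up transcript tokens: few distinct kind words, but said over and over.
spoken = ["good", "good", "nice", "good", "nice", "kill", "stupid", "hate", "terrible", "good"]

counts = Counter(spoken)
total_positive = sum(n for w, n in counts.items() if LEXICON.get(w, 0) > 0)
total_negative = sum(n for w, n in counts.items() if LEXICON.get(w, 0) < 0)
distinct_positive = sum(1 for w in counts if LEXICON.get(w, 0) > 0)
distinct_negative = sum(1 for w in counts if LEXICON.get(w, 0) < 0)

print("distinct:", distinct_positive, "kind vs", distinct_negative, "unkind")
print("total:   ", total_positive, "kind vs", total_negative, "unkind")
```

Here there are only 2 distinct kind words against 4 unkind ones, yet kind words are said 6 times and unkind words only 4.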


