Predicting Viral Tweets

Predicting the popularity or coolness of a social networking item such as a video, article, or tweet before it actually becomes popular is a very active area of research in both academia and industry (it is the focus of a number of startups right now). For example, here is a paper that came out in the spring of 2013 (one of the authors is an MIT Sloan person) that looks into ways to predict the ultimate popularity of a tweet on the microblogging site Twitter using information from early in the tweet's history. (A tweet is a short message of at most 140 characters.) The authors looked at popular Twitter users like Kim Kardashian and Newt Gingrich (?) in order to find patterns and build a probability model for predicting tweet popularity. We won't look at their work in this problem, but it makes good leisure reading if you feel like it.

In Twitter Land, when somebody or something (a bot) posts a new tweet, it can be retweeted (re-posted) by other Twitter users. This retweeted tweet can itself be retweeted by other users. In this way a tweet can act like a virus: as it gets more retweets, it shows up on more users' feeds, increasing its visibility and resulting in even more retweets.
Let's generate a crude model of retweet probability: We will treat the initial tweet and all daughter retweets as one entity (or meme).
Let's say the probability of a tweet gaining an additional retweet (RT=Y) is based on its previous retweet history like so:
| Outcome | Probability |
|---|---|
| $\Pr(\text{RT}=Y \mid x \text{ retweets})$ | $1 - 0.97^{x+1}$ |
| $\Pr(\text{RT}=N \mid x \text{ retweets})$ | $0.97^{x+1}$ |
In the table above, x is the number of prior retweets that a tweet meme has experienced. As a result, a fresh, brand new tweet would have a 0.03 probability of being retweeted. A tweet that has already been retweeted once, however, will have a $1 - 0.97^2 = 1 - 0.9409 = 0.0591$ probability of being retweeted again.
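The two worked values above can be reproduced with a few lines of Python. This is only a minimal sketch of the table's formula; the function name `retweet_probability` and the `base` parameter are made up here for illustration.

```python
def retweet_probability(x, base=0.97):
    """Pr(RT=Y | x retweets) = 1 - base**(x + 1), per the table above.

    x is the number of prior retweets; base=0.97 is the constant from the model.
    """
    return 1 - base ** (x + 1)


# A brand new tweet (x = 0) and a tweet retweeted once (x = 1).
print(round(retweet_probability(0), 4))
print(round(retweet_probability(1), 4))
```

Python writes the exponent as `**`, which is also the form the solution boxes should accept.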
Answer a few questions about this popularity model of tweets and retweets below. All answers should be accurate to three significant figures. The solution boxes accept Python numerical expressions, so feel free to enter your computations that way.
Popularity
- What is the probability that a brand new tweet will receive only three retweets?
- What is the probability that a brand new tweet will receive at least five retweets?
- Given that a tweet has already been retweeted five times, what is the probability that it will be retweeted a sixth time?
- How many retweets must a tweet have received before it becomes more likely than not that it will continue to be retweeted?
- Given that a tweet has already been retweeted 20 times, what is the probability that it will be retweeted at least ten more times?
- Given that a tweet has already been retweeted 100 times, what is the probability that it will be retweeted at least ten more times?
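One way to organize these computations is sketched below. It assumes an interpretation the problem does not spell out: retweets happen one at a time, and the meme stops spreading the first time a retweet fails to occur. Under that assumption, "at least k more retweets" is a product of k consecutive success probabilities, and "exactly k more" multiplies that product by one failure. The helper names (`p_retweet`, `prob_at_least`, `prob_exactly`) are hypothetical.

```python
def p_retweet(x):
    """Pr(RT=Y | x retweets) from the model's table."""
    return 1 - 0.97 ** (x + 1)


def prob_at_least(k, start=0):
    """Probability of at least k more retweets, given `start` retweets so far.

    Assumes the chain ends at the first missed retweet, so all k
    consecutive steps must succeed.
    """
    prob = 1.0
    for x in range(start, start + k):
        prob *= p_retweet(x)
    return prob


def prob_exactly(k, start=0):
    """Probability of exactly k more retweets: k successes, then one failure."""
    return prob_at_least(k, start) * (1 - p_retweet(start + k))


# Example calls; the printed values are left for you to interpret.
print(prob_exactly(3))            # brand new tweet, exactly three retweets
print(prob_at_least(10, start=20))
```

The conditional questions only change the `start` argument, because the model's probabilities depend on the current retweet count and nothing else.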
The model above clearly has its shortcomings. It puts the likelihood of a brand new tweet receiving 100 retweets on the order of $10^{-22}$, which is way too small. For tweets that already have many retweets, however, the model may be more useful and offer more accurate probabilities. This is always a good thing to keep in mind: all models are wrong, but some are useful.
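The order-of-magnitude figure quoted above can be checked by summing logarithms rather than multiplying 100 small numbers. The snippet below is a rough sketch under the same assumption as the model (each retweet must succeed in turn); `log10_prob_reaching` is a name chosen here for illustration.

```python
import math


def log10_prob_reaching(n, base=0.97):
    """log10 of the probability that a brand new tweet reaches n retweets.

    Sums log10(1 - base**(x + 1)) for x = 0 .. n-1, i.e. the log of the
    product of the n consecutive retweet probabilities.
    """
    return sum(math.log10(1 - base ** (x + 1)) for x in range(n))


# The exponent of the probability of a fresh tweet reaching 100 retweets.
print(log10_prob_reaching(100))
```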