Imagining a Better Facebook

Today is February 4, 2014, exactly one decade after a nineteen-year-old Mark Zuckerberg launched thefacebook from his dorm room. The company is now worth over $150 billion and has over 1.2 billion active users. I joined Facebook in late 2007. Back then, Facebook was fun. I’d waste hours on the site writing statuses, scrolling through my newsfeed, and messaging friends. But these days, I don’t enjoy using Facebook. Many of my friends don’t like using it either. Although they’re still active Facebook users, most of them find Instagram and Snapchat more enjoyable.

The problem isn’t inherent in the concept of an online social network, but rather in Facebook itself. What would the ideal social network look like? To answer this, I dissected all of the good and bad things about Facebook.

The Good

Everybody is on Facebook

The most compelling reason to use Facebook is that everybody else is on it. It’s rare for a Facebook search to come up empty. These “network effects” make it difficult for any competing social network to displace Facebook.

Chat/Messaging

Email is slow and can get lost in a crowded inbox. Texting isn’t ideal for longer back and forth conversations, has poor support for group conversations, and can be invasive if you don’t know the other person well. Facebook messaging fits nicely between the two.

Facebook groups

I keep in touch with high school friends through a Facebook group, which we use to plan events and let each other know when we’re in town. In college I joined Facebook groups for my majors and various campus organizations, and they were useful for announcements, asking questions, and connecting with like-minded people.

Events

Planning and coordinating events is easy with Facebook. The downside is that people are flaky online (only 30-50 percent of RSVP’d event guests show up), but that’s probably not a Facebook-specific issue.

Photos

Before Facebook, if you wanted to share photos online, you’d have to upload them somewhere and email the link to a bunch of people. Facebook lets you share photos with your friends and family and have them view the photos in one centralized place.

Profiles

You can gain a basic understanding of a person by looking through their profile and the pages they follow. I’ll browse through friends’ profiles to see if we have similar intellectual interests or watch the same TV shows. I even learn new things about people I know well from their profiles.

Facebook Pages

Facebook pages, if used correctly, can build communities around shared interests. However, I follow dozens of Facebook pages and can only think of three or four that actually update me with things I care about.

The Bad

Privacy

Facebook’s privacy issues are well documented, so I won’t repeat them here. What bothers me most is that Facebook tracks you around the internet even when you aren’t logged in. And if I forget to log out and accidentally click one of the ubiquitous “like” buttons around the internet, the action shows up on my profile and my friends’ newsfeed.

Irrelevance

The most unpleasant aspect of Facebook is how irrelevant my newsfeed is. I randomly selected posts from my newsfeed and categorized them as “relevant” or “irrelevant,” and only 20 percent fell into the first category. This is abysmally low considering Facebook’s sophisticated algorithms for newsfeed content, my selectivity in accepting friend requests, and my consistent efforts to hide posts from or unfriend people who share things I don’t care about. There’s just too much junk on Facebook, which leads to…

Sharing ad nauseam

Facebook has built up a pervasive culture of sharing. I get spammed with invites from stupid third-party apps on the Facebook platform, invites to events that the sender knows I’m not interested in, and Facebook messages from people I don’t want to talk to. My newsfeed is polluted with uninteresting updates from pages I follow and a torrent of photos of pets, food, and babies[1]. This culture of endless sharing is why Facebook needs newsfeed curation algorithms in the first place.

I had to jump through hoops to make myself invisible to certain people on Facebook chat so they would stop spamming me with messages[2]. I’ve had to manually unfollow people who post annoying things. I shouldn’t have to keep fighting against my social network like this.

Friendship Expectations

I’ve learned firsthand that people get very upset when you don’t accept their friend requests. Most people I know are Facebook friends with their relatives, coworkers, bosses, and random acquaintances out of social obligation.

Bloated

Facebook is overloaded with useless stuff: trending topics, hashtags, Facebook gifts, location check-ins, third-party apps, graph search, timeline, etc.

The core features are: profiles (though these have become bloated too), newsfeed, photos, chat, events, and groups. Everything else is extraneous.

Notification Overload

These days I hesitate to comment on posts because I’ll be flooded with notifications about unrelated five-word comments from people I don’t know. Facebook has tried to fix this by consolidating notifications and allowing you to unfollow posts, but I’d rather not receive those notifications at all. I no longer stay logged in on the Facebook app because of its push notifications.

The Ideal Social Network

Using this list, we can imagine what the ideal social network (IDS) would look like.

Simple

IDS would be built around a clean, minimal interface, similar to Facebook in the early days. It would contain only the essentials: profiles, a newsfeed, photos, chat/messaging, events, and groups. Privacy and security settings would be simple and streamlined.

Structured Around Friend Circles

Google+ got it right when it created friendship “circles.” In IDS, every friend must be assigned to a circle, such as close friends, family, or co-workers, and posts can be targeted to individual circles. Circles give better privacy controls and more accurately represent how we manage our social connections in real life. Facebook awkwardly tried to copy the circle idea with Facebook lists, but few people use them because they aren’t central to the network, as circles are in Google+.

Different Business Model

I’m not opposed to personalized ads. If I have to see ads, I’d rather they be relevant. What is unacceptable is Facebook tracking me around the internet, even when I’m logged out, and giving that data to advertisers. IDS users would be able to opt out of personalized ads or upgrade to a premium, ad-free version. I wouldn’t mind paying a few dollars a month for an ad-free, privacy-respecting social network.

Discourages Extraneous Sharing

To prevent newsfeed pollution, IDS would do the following:

  1. Allow downvoting. The downvotes would be hidden from the original poster, and would be used to make newsfeed curation algorithms more accurate.
  2. Make it easy to unfollow someone or reduce their posts’ prevalence in your newsfeed.
  3. Gamify posts by showing engagement statistics. If people see their posts are consistently not getting likes/upvotes or comments, they may change their posting behavior.

Discourages Meaningless Connections

I see three possible solutions:

  1. Hard limit on the number of friends. Dunbar’s number suggests we can maintain only about 150 stable relationships at a time, so IDS could cap friends at 300 to leave some breathing room.
  2. Don’t have a suggested friends feature. Instead, require each user to make a conscious, specific effort to add someone as a friend. Most users wouldn’t make that effort for people they barely know.
  3. Have a “suggested unfriend” feature, based on frequency of interactions and downvotes.

IDS is the kind of social network I want to build and use. Facebook’s enormous user base means IDS is probably doomed from the start, but it’s something to think about as Facebook enters its second decade.

 

1- Okay, the baby photos haven’t started yet. Give it a few years.

2- To be invisible on chat for a subset of your Facebook friends, create a separate friend list, add the appropriate people to that list, and change the list settings so they can’t see when you’re logged on.

How to Eat Dinner with Barack Obama

When I was a kid, my favorite TV show was The Simpsons[1]. During the commercial breaks I often saw ads for contests associated with some product, e.g., “Mail in the box top from your bran flakes to be entered in our $10,000 sweepstakes!” The ads avoided running afoul of illegal lottery rules by including a “no purchase necessary” clause in the fine print at the end, but the hope was that you would buy more of the product to be entered in the sweepstakes.

The Obama campaign did something similar during the 2012 election. They ran several “Dinner with Barack” sweepstakes, where entrants could win a personal dinner with President Obama. For every donation you made to the campaign, you were automatically entered into the sweepstakes. At the bottom of the donation page was the familiar “no purchase necessary” fine print, along with a link to enter the competition without donating. I entered myself a few times (who wouldn’t want a chance to speak directly to the President?), tediously reentering the information and submitting the form each time. As I did so I thought, “Why don’t I write a program to do this for me?”

Once I had the idea, I immediately noticed that there was no CAPTCHA or any other method of preventing a bot from submitting the form. This meant I could write a 20-line Python script to fill out all of the form fields and submit it with a POST request[2]. The Obama campaign would obviously be suspicious if the same person or IP address submitted 20 million forms in one second, but the script could be throttled to submit less often, perhaps once every ten seconds. Since these sweepstakes were usually announced about two weeks before the deadline, I could generate:

1 submission per 10 seconds = 6 submissions per minute = 360 submissions per hour = 8,640 submissions per day = 120,960 submissions in two weeks.

There were several of these competitions, so it’s possible that among them all I could have entered a million times (and perhaps even more, depending on how much script throttling is necessary to avoid detection). And if I could get several computers with different IP addresses submitting entries at the same time, my total entries could be on the order of tens of millions[3].
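The back-of-the-envelope arithmetic above is easy to check with a few lines of Python. `submissions` is a hypothetical helper: it counts how many fixed throttle intervals fit into the window and multiplies by the number of machines, ignoring downtime, network errors, or any detection.

```python
def submissions(interval_seconds: float, days: float, machines: int = 1) -> int:
    """Count entries for one submission every interval_seconds over `days` days."""
    seconds = days * 24 * 60 * 60
    # Whole intervals that fit in the window, times the number of machines
    return int(seconds // interval_seconds) * machines


print(submissions(10, 1))   # 8640 per day on one machine
print(submissions(10, 14))  # 120960 over a two-week window
```

Scaling is linear in the number of machines: `submissions(10, 14, machines=10)` gives 1,209,600 entries for a single two-week contest.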

Would this have worked? Although it would have been a lot of fun to try this out, I never implemented the script because of ethical concerns[4], and I kept it to myself so others wouldn’t try to use it as well. When I read through the official rules and all the fine print, I found no prohibition against using an automated system. Given that the campaign staff had far more pressing concerns, they may have overlooked this simple exploit. But that is still surprising considering the Obama campaign’s reputation for technical prowess. Perhaps I should tell the President about this over dinner.

 

1 The Simpsons isn’t designed with kids in mind, but the show’s clever wit and gentle satire really resonated with me. As a longtime fan it saddens me to see the show steadily go downhill. The best season was undoubtedly season five, which came out in 1993 (the year I was born!).

2 Writing the script would have taken no more than an hour.

3 To further avoid detection, each instance of the script could enter slightly different information, like “Gautam Narula,” “Gautam R Narula,” and “G. Narula,” along with multiple (valid) email and mailing addresses.

4 It probably wasn’t illegal, but this exploit definitely exists in the gray area which hackers (in the programming sense of the word, not the steal-your-identity Hollywood usage of the word) spend much of their time in. I felt it was against the spirit of the competition to use a bot to enter myself hundreds of thousands of times in the competition. Then again, maybe that’s why I never ate dinner with President Obama.

 

Building a Multiplayer Elo Rating System

Most games today, even well-funded and established ones, have terrible ranking systems. For example, tennis’s ATP ranking system is based on “points,” which are awarded for results in select tournaments over the previous 52 weeks. The system is needlessly complicated and somewhat arbitrary, and it does not directly take strength of opposition into account. The upper echelons of Halo and Call of Duty online rankings are, according to a friend, often populated with mediocre players who just play a lot of games, a result of a system that rewards sheer quantity of play over quality of results[1]. I played high school quizbowl, and the system for ranking teams was subjective and fairly arbitrary. The rankings of the top 20 or so teams were fairly accurate because people knew who the best teams were, but beyond that it was guesswork based on unreliable metrics[2]. This is where Elo ratings come in handy.

The underlying premise of the Elo rating system, originally developed to rank competitive chess players, is that the purpose of a rating is to predict the outcome of future games. Philosophically, it took a big data approach before big data existed: the only thing that should determine the ratings is the corpus of games played. With enough prior games, Elo contends, you should be able to calculate the probability of any player beating any other player. The Elo rating system is better than many other ranking systems for several reasons:

  1. It rewards the quality and consistency of results over sheer frequency of playing. Three good results will gain more points than ten mediocre results.
  2. It provides a way to accurately predict the probability of one player beating another. In a two-player Elo system, a player rated 100 points higher than his or her opponent has a 64% probability of winning[3].
  3. The ratings directly account for opposition strength. Beating strong opponents gains more points. Losing to weak opponents loses more points.
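The 64% figure can be checked directly. This is a minimal sketch assuming the standard logistic expected-score formula with a 400-point scale, which is the version this post uses; `expected_score` is a hypothetical name.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Two-player Elo expected score: the probability of winning (a draw counts as half)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


print(round(expected_score(1100, 1000), 3))  # 0.64
print(expected_score(1000, 1000))            # 0.5
```

Because the formula depends only on the rating difference, any 100-point gap gives the same 64%, whether the players are rated 1100 vs 1000 or 2100 vs 2000.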

While some chess organizations and online servers now use refinements of Elo such as Glicko and Glicko-2, the basic principles still hold.

Developing the Algorithm

My friends and I started getting into the board game Settlers of Catan towards the end of my senior year of high school. We played a few games every week and would get pretty competitive about it. We’d argue over who was the best but had no metric to determine that other than, “well, he seems to win all the time.” After our first semester of college, we all came back home for the break and resumed playing. Inspired by chess ratings, I sought to create a rating system to put an end to the debate once and for all. I spent a day thinking about the various algorithms to use and a night coding an app that I later called the “Game Rating Calculator.” The rest of this post explains the process I went through in creating a rating algorithm, and the results after nearly a year of using it.

When developing the rating algorithm for the Game Rating Calculator, I wanted a system modeled after Elo. I couldn’t port the Elo formula directly, since it was designed for two-player games only. I looked at a few approaches before settling (pun intended) on the final formula.

There are many variations of the Elo formula[4]. For the sake of clarity, all references in this post will be to the following formula:

$$R_{\text{new}} = R_{\text{pre}} + K(S - E)$$

Where Rnew is the new rating, Rpre is the pregame rating, K is the “k factor”, an arbitrary multiplier (a higher k factor means higher rating volatility), S is the total score in a rated event, and E is the total expected score in a rated event, which is calculated by:

$$E = \frac{1}{1 + 10^{(R_{\text{opp}} - R_{\text{pre}})/400}}$$

where $R_{\text{opp}}$ is the opponent’s rating. For an event with several games, E is the sum of this term over every game.
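A minimal Python sketch of the update rule, assuming the standard logistic expected score with a 400-point scale. `expected_score` and `elo_update` are hypothetical names; S and E are totals over every game in the event, as in the definitions above, and the default K of 16 follows the assumptions stated below.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score E for one game against a single opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(rating: float, results: list[tuple[float, float]], k: float = 16) -> float:
    """Apply R_new = R_pre + K(S - E) for one rated event.

    results holds (opponent_rating, score) pairs, with score 1 for a win,
    0.5 for a draw and 0 for a loss. S and E are totals over the event,
    and every expected score uses the pregame rating.
    """
    total_score = sum(score for _, score in results)
    total_expected = sum(expected_score(rating, opp) for opp, _ in results)
    return rating + k * (total_score - total_expected)


print(elo_update(1000, [(1000, 1.0)]))            # 1008.0
print(elo_update(1000, [(1000, 1.0)] * 3, k=32))  # 1048.0
```

Computing every expected score from the pregame rating, rather than updating after each game, is what makes the result independent of the order in which an event’s games are entered.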

Assumptions:

I took my base scenario to be a four player game of Settlers of Catan, where all players were of equal strength. Intuitively, each player’s probability of winning the game is 25%. Assume that A came in first, and, if it is possible to have a second, third, and fourth place, then B came in second, C came in third, and D came in fourth. To determine a player’s opposition strength, I take the average rating of all of the opponents. All calculations using an Elo formula are done with a K factor of 16.

Approach #1 – The Pairwise Approach

The pairwise approach tries to get around Elo’s inability to work for more than two players by treating multiplayer games with one winner as a series of pairwise matches. In a four player game with players A, B, C, and D, if A won the game, you would rate the game as the result of three individual matches: A vs B, A vs C, and A vs D. But that leaves us with a problem: A has “played” three games, while B, C, and D have played one. The official online Settlers of Catan website and ranking system attempts to rectify this by taking into account relative positions in the game. That is, the second place player (determined by the number of points at the end of the game) will have “played” three games, losing against the first place player and winning against the third and fourth place players, and so on.

There are several problems with this approach. In our hypothetical case of four equally rated players, each player should achieve each possible position (first, second, third, and fourth) 25% of the time. But the winning player’s rating is calculated as if he won three games in a row. Since each of his opponents is equally rated, the probability of him winning any individual game is 50%, and the probability of winning three games in a row is .5 × .5 × .5 = .125 = 12.5%. This means the formula acts as if the winner has accomplished a task twice as difficult as it really was, and therefore awards too many points. Similarly, the last place player will lose too many points, because even though his odds of coming in last were 25%, his rating will be calculated as if he lost three games in a row, which has only a 12.5% chance of occurring.

We can use the binomial distribution to calculate the implied odds for the second and third place players. The probability of winning two out of three games against equally rated opponents (second place) is $\binom{3}{2}(0.5)^2(0.5)^1 = 3 \times 0.125 = 0.375 = 37.5\%$. Similarly, for winning one out of three games (third place): $\binom{3}{1}(0.5)^1(0.5)^2 = 3 \times 0.125 = 0.375 = 37.5\%$[5].

We can summarize the results in the following table:

Position | Actual probability of achieving position | Implied probability through pairwise ratings | Result
--- | --- | --- | ---
First | 0.25 | 0.125 | Overrated
Second | 0.25 | 0.375 | Underrated
Third | 0.25 | 0.375 | Overrated
Fourth | 0.25 | 0.125 | Underrated
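The implied probabilities in the table can be reproduced with a short binomial calculation. This sketch assumes, as above, that finishing in position k among four equally rated players is scored as winning 4 − k of three independent pairwise matches, each won with probability 0.5; `implied_position_probs` is a hypothetical name.

```python
from math import comb


def implied_position_probs(opponents: int = 3, p: float = 0.5) -> dict[int, float]:
    """Implied chance of each finishing position when a game is scored as
    independent pairwise matches: position k means winning (opponents - k + 1)."""
    probs = {}
    for position in range(1, opponents + 2):
        wins = opponents - (position - 1)
        probs[position] = comb(opponents, wins) * p**wins * (1 - p) ** (opponents - wins)
    return probs


print(implied_position_probs())  # {1: 0.125, 2: 0.375, 3: 0.375, 4: 0.125}
```

Every position actually occurs with probability 1/4 = 0.25, so the gap between these values and 0.25 is exactly the mismatch the table records.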

We can determine the results using a standard Elo calculator with a K factor of 32. Let all four players be rated 1000. Their new ratings would be as follows:

First place 1000 → 1048

Second place 1000 → 1016

Third place 1000 → 984

Fourth Place 1000 → 952
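These numbers can be reproduced with a sketch of the position-based pairwise approach. It assumes each player is scored as beating everyone who finished below them and losing to everyone above, with every pairing evaluated against pregame ratings and the K factor of 32 used in this example. The function name and the finishing-order input are illustrative assumptions, not the Catan site’s actual implementation.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Two-player Elo expected score (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def pairwise_update(ratings: list[float], k: float = 32) -> list[float]:
    """ratings are listed in finishing order, first place first."""
    new_ratings = []
    for i, rating in enumerate(ratings):
        delta = 0.0
        for j, opponent in enumerate(ratings):
            if i == j:
                continue
            score = 1.0 if i < j else 0.0  # beat everyone who finished below you
            delta += k * (score - expected_score(rating, opponent))
        new_ratings.append(rating + delta)
    return new_ratings


print(pairwise_update([1000, 1000, 1000, 1000]))  # [1048.0, 1016.0, 984.0, 952.0]
```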

One could argue that even though the implied expected scores are off, since the players are all equal in strength, things should “balance out”. That is, in this example, the last place player is expected to come in first as many times as he comes in last, so things should even out. And if he doesn’t then he really deserves the lower rating.

The problem is that this badly distorts the predictive power of Elo ratings. In this one example, there is now a 96-point difference between the first place player and the last place player, which is so large that, after just one game, A’s predicted probability of beating D head-to-head has jumped from 50% to 63.5%. Even with a lower K factor dampening the rating change, the probability would still be overly optimistic for A. And since Elo uses the predicted probability (expected score) to calculate rating changes, the error carries over into subsequent rating changes. Even if all the kinks could be worked out, it seems sloppy to use incorrect probabilities.

From a practical standpoint, this system also has two major problems.

  1. In a game where there is only a winner and multiple losers, this system doesn’t work. In Settlers of Catan, the player with the second most points at the end of the game is not necessarily the second most likely to win. Games where a player in third or even fourth “leapfrogs” to first and wins are not uncommon.
  2. Having ratings depend on final position is ripe for manipulation. For instance, if you were in dead last in a game of Settlers, you could offer to throw the game in favor of another player in exchange for help moving up from last place to third. This system incentivizes that kind of manipulation.

Approach #2 – The Proportional Rating Approach

Elo operates on differences between actual performance and expected performance. As I studied potential solutions to a multiplayer Elo algorithm, I realized that if I could develop a way to determine expected performance in a multiplayer game, I could just plug it into the two player Elo formula. One way of doing this would be to give each player an expected performance equal to the proportion of their rating to the sum of the total ratings. If there were four players with ratings R1, R2, R3, R4, each player’s individual expected performance would be their rating divided by (R1+R2+R3+R4). In the case of our hypothetical example, each player’s expected score would be 1000/4000=.25, which is what our intuition expects.
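A sketch of the proportional approach, assuming a single winner (S = 1 for the winner and 0 for everyone else) and a K factor of 16; the function names are hypothetical.

```python
def proportional_expected(ratings: list[float]) -> list[float]:
    """Expected score = a player's rating divided by the sum of all ratings in the game."""
    total = sum(ratings)
    return [r / total for r in ratings]


def proportional_update(ratings: list[float], winner: int, k: float = 16) -> list[float]:
    """Plug the proportional expected score into R_new = R_pre + K(S - E)."""
    return [
        r + k * ((1.0 if i == winner else 0.0) - e)
        for i, (r, e) in enumerate(zip(ratings, proportional_expected(ratings)))
    ]


print(proportional_expected([1000, 1000, 1000, 1000]))  # [0.25, 0.25, 0.25, 0.25]
print(proportional_expected([2000, 1000, 1000, 1000]))  # [0.4, 0.2, 0.2, 0.2]
```

The expected scores always sum to 1, which matches a game with exactly one winner; the weakness is that a rating gap of a few dozen points barely moves a player’s share of the total.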

The one issue with this approach is that it implies fairly slow improvement. Winning five games in a row is a pretty big deal (after all, in four-player games between equals the odds of that happening are 0.25^5 ≈ 0.1%), but the rating gain would only be 65 points, which means that in a sixth game against 1000-rated opponents the expected probability of winning would only be 1065/4065 = 26.2%, when it clearly should be higher.

As for evaluating its accuracy, there was really no way to test if the expected score was reasonable other than to eyeball it for completely inaccurate values. In a four player game with players rated 2000, 1000, 1000, and 1000, the first player has a 40% chance of winning, which seems reasonable for a player who is twice as good as each of his individual opponents. The only problem is that it would have taken so many wins to get to 2000 that the player is almost certainly underrated to begin with.

Approach #3 – The Modified Expected Score Approach

There was one final approach to examine, and it was the simplest of them all. Take our base scenario of four players, each rated 1000. The expected score of each player is .25, while the expected score in a one-game pairwise comparison is .5. In a five-player game, the expected score is .20, while the pairwise comparison score is still .5. In a six-player game, the expected score is .17, while the pairwise comparison score is again .5. The expected score for a game of N equally rated players can be generalized as:

$$E = \frac{2E_p}{N}$$

where $E_p$ is the expected score in a pairwise comparison. The key question was whether I could extend this formula beyond the case of all players being equally rated. I added this multiplier to the Elo formula and tried it out with a few hypothetical games, and the rating adjustments seemed to make sense. But the only true way to validate it was to actually use the system in production. After all, Elo was revised over 40 years of real-world usage (though others have since found better algorithms). There were two things I really liked about this formula:

  1. It’s point neutral among established players (not counting rounding errors)[6].
  2. It gives a reasonable point adjustment for different size games.

This seemed to be the most promising approach and the algorithm I ended up using. The only issue that might arise is how the ratings would operate in the case of a player rated far above the rest. In a four player game, the theoretical maximum expected score is .5, which means that even if a player were rated one million points higher than each of his opponents, he’d still gain a sizable 16 points from winning. This issue only came up in such unrealistic scenarios, so I wasn’t too worried about it.

Provisional Ratings

The United States Chess Federation has a concept of provisional ratings: for a player’s first 25 rated games, ratings are more volatile[7]. The idea is to get players to their “true rating” more quickly. It also accounts for the fact that moving from casual chess to tournament chess is a disruptive process with a learning curve, and that a player’s strength will be more volatile as he moves up that curve.

I didn’t copy the 25-game system in the rating calculator for a few reasons. First, chess players can easily play a large number of rated games, so reaching 25 is fairly simple. In a game like Settlers of Catan, where you need to gather at least three (usually more) people for 90+ minutes, this isn’t practical. In the rating pool I set up, only five people had played 25 or more games after 11 months.

There is also a real risk of rating deflation, depending on the number of provisional games. For example, suppose a provisional rating is twice as volatile as a non-provisional rating; that is, you gain or lose twice as many points while you are provisionally rated. If there were just one provisional game and an average game of Settlers had four players, 75% of new players would lose that first game at double volatility. That 75% would then earn points back at only half that rate when they win later, so their ratings would end up deflated.

The idea is to tailor the number of provisional games to the point where players are equally likely to get their wins and losses in that set of provisional games as they are to win and lose in general.

The provisional period in my rating pool is five games. At the time I set it, the average game seemed to have around five players. We’ve played many more games since then, and the average has dropped to four players per game. In the first four games, an average player is expected to win one game. But in the fifth game, 75% of players will lose, so 75% of incoming players end up with a deflated rating. Unfortunately, due to a desire for consistency in the rating system, I can’t dynamically adjust the number of provisional games each time. I may retroactively recalculate the ratings based on an average game size of four, since that has held stable for a while.

Results

It was time to put this rating algorithm into production, and see how it held up. As of this writing, 80 rated games have been played, and a total of 28 players have played at least one rated game. Below are all the results for the players who’ve played enough games to not be provisionally rated (6+).

Player | Rating | Games Played | Average Game Size | Winning Percentage (%) | Expected Winning Percentage (%)* | Net Difference**
--- | --- | --- | --- | --- | --- | ---
K.S. | 1073 | 44 | 3.82 | 40.9 | 26.2 | 14.7
W.A. | 1052 | 13 | 4.85 | 38.5 | 20.6 | 17.8
G.N. | 1052 | 38 | 4.37 | 31.6 | 22.9 | 8.7
A.S. | 1028 | 60 | 3.72 | 25 | 26.9 | -1.9
K.S. | 1012 | 41 | 3.88 | 31.7 | 25.8 | 5.9
I.M. | 984 | 16 | 4.5 | 25 | 22.2 | 2.8
M.R. | 977 | 7 | 5 | 14.3 | 20 | -5.7
A.S. | 965 | 10 | 4.9 | 10 | 20.4 | -10.4
G.M. | 937 | 9 | 4.44 | 11.1 | 22.5 | -11.4
C.W. | 934 | 6 | 4.5 | 0 | 22.2 | -22.2
S.S. | 889 | 29 | 3.66 | 13.8 | 27.4 | -13.6
B.S. | 859 | 12 | 4.25 | 0 | 23.5 | -23.5

Average: 980.2

Median: 980.5

Average game size of all games played: 3.93

* The Expected Winning Percentage is based on the average game size, assuming all other opponents were equally rated

** The Net Difference may not match Winning Percentage- Expected Winning Percentage due to rounding
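The derived columns can be recomputed from raw counts. This sketch assumes the expected winning percentage is 100 divided by the average game size (per the first footnote) and rounds only at the end, which is why a row’s net difference can differ from the difference of its rounded columns. The win counts passed in (5 wins in 13 games for W.A., 18 in 44 for the 1073-rated K.S.) are inferred from the listed percentages, and `table_row` is a hypothetical name.

```python
def table_row(wins: int, games: int, avg_game_size: float) -> tuple[float, float, float]:
    """Return (winning %, expected winning %, net difference), each rounded to 0.1.

    Expected winning % assumes every opponent is equally rated, so it is just
    100 / average game size. Rounding happens last, on the unrounded values.
    """
    win_pct = 100.0 * wins / games
    expected_pct = 100.0 / avg_game_size
    return round(win_pct, 1), round(expected_pct, 1), round(win_pct - expected_pct, 1)


print(table_row(5, 13, 4.85))   # (38.5, 20.6, 17.8)
print(table_row(18, 44, 3.82))  # (40.9, 26.2, 14.7)
```

For W.A., the rounded columns differ by 17.9, but the unrounded values (38.46 and 20.62) differ by 17.84, which rounds to the 17.8 shown in the table.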

At first, I arbitrarily started off all players at 1000. Once there was a decent sized pool, I started off all new players at the median rating of non-provisional players. The median rating has dropped over time, probably because of the deflationary effects of the provisional ratings I mentioned earlier.

Overall, the results look pretty good. The ratings intuitively match how I feel most people should be rated. Importantly, the ratings meet one of the main criteria I wanted in a rating algorithm: they reward the quality and consistency of results, not just the sheer quantity of games. For instance, W.A. and I are rated exactly the same. He has a much better net winning percentage than I do, but that is offset by the fact that I’ve maintained a decent percentage over far more games. His strong results are balanced by my consistency. There is one obvious outlier here: A.S. (rated 1028). He used to be rated much lower (below 900) but has recently surged with a lot of strong results. One possible explanation for his high rating and low winning percentage is that he played a lot of games against high-rated players. I haven’t calculated each player’s average opponent rating, so I can’t say for sure.

I’ll eventually put this all in a SQL database to automate some of the data gathering. But the best way to improve the rating system would be to run data analysis algorithms on the results. How accurately do the ratings predict results? Unfortunately there is still not enough data. Heuristically, I’d think there would have to be a rating pool on the order of at least 25 people who have played at least 25 games to accurately assess the system.

But for now, this algorithm will do. If you’re interested in learning more about the Elo rating system in order to develop your own, check out the book by Elo himself, The Rating of Chess Players: Past and Present.

Tl;dr: Download the desktop version and Android version of the app.

 

1 But then again, that’s what the game makers want. These days games usually have some sort of in-game store or revenue stream, so more play usually means more money.

2 There have been some attempts to institute a chess-style Elo rating for quizbowl, but the ratings aren’t very accurate since data is sporadic.

3 Assuming you are using the Elo formula mentioned in this post.

4 If you’re interested in learning more about the Elo formula, here is a good treatise about it by Mark Glickman, the chief statistician in charge of the United States Chess Federation implementation of the Elo formula.

5 You could have also intuitively determined this result by noting that the probability distribution for the four possible outcomes were symmetrical, and since winning three in a row and losing three in a row are each .125, the other two possibilities must add to .75, hence .375.

6 It’s interesting to note that the Elo systems used by the United States Chess Federation and FIDE (the world chess federation) are not point neutral, because they have a K factor that declines as rating goes up. This was apparently used as a way to stop rating inflation at the top of the rating pool. Not all chess organizations agree with this: the Internet Chess Club, the premier online chess server, does not use variable K factors outside of provisional games.

7 The volatility decreases within the set of 25 games.

