Eight hundred million Facebook profiles will soon get Timeline.
Over the coming week, Facebook will officially roll out the latest addition to its world-spanning social network: a Timeline that maps out each profile as a series of chronological events. And for the average user, it will appear as if by magic.
But Timeline, Facebook’s biggest interface change in recent memory, is the end result of a six-month effort not only to create a new piece of software, but to find a way of quickly serving that software to an audience of 800 million.
Serkan Piantino — the man who will run Facebook’s New York engineering office, set to open in 2012 — oversaw this sweeping project, and as the company prepared to launch the new interface, he sat down with Wired at Facebook’s Palo Alto, California, headquarters to give us the timeline for his Timeline.
It all started with a hack called Memories. This spring, Facebook held one of its famous hackathons — a caffeine-fueled, all-night coding bender where the only rule is that you can’t work on anything you’d normally work on during the day — and Memories was one of the slap-dash software creations that appeared the next morning.
The goal of these hackathons is to get developers thinking about a new type of application that can really hook Facebook’s users, and Memories had potential. The idea was simple: It called up Facebook photos from a particular year or other date range. The company tested it quietly among a few members, and it quickly caught on. People enjoyed going back in time to peruse, well, themselves.
New applications tend to evolve organically inside the company. Like Google, Facebook still clings to a kind of startup mentality. Piantino and a handful of engineers started to develop the Memories idea on their own, and eventually, the project gained enough steam that CEO Mark Zuckerberg kicked it up the priority list and management gave Piantino 15 to 20 of Facebook’s top engineers, pulled from various teams across the company. “There are things that Zuck always thinks are core to Facebook and things that we have to get right. News Feed was one of them,” Piantino says. “A new profile always gets priority.”
The goal was to provide each Facebook user with a rapid-fire summary of all the best things that happened to them over the course of a year. On a site of Facebook’s size, this sort of thing quickly gets complicated. The company stores over a trillion rows of indexed data, covering status updates and other “events” — and this doesn’t include photos and additional data that turns up on the site. Depending on your number of friends and their activity, a click to your News Feed can call up as many as 10,000 stories that account for about 8 megabytes of space. And according to Piantino, all this occurs in about seven-tenths of a second.
He and his team knew that to keep up with Facebook’s meteoric growth and the success of News Feed, they had to provide similar speeds with Timeline.
Memories Doesn’t Use Memory
The trouble is that Timeline must dig far deeper for information. News Feed is primarily a game of memory management. It’s looking for recent information. But Timeline looks into the past.
Timeline data is stored on disk, not in memory. The average hard disk spins at about 10,000 rpm, and moving the head across the disk takes about 5 milliseconds. But when you have hundreds of millions of users accessing data across tens of thousands of machines, management gets very complicated, very quickly. Those milliseconds add up, and so do the costs.
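To see how those milliseconds add up, here is a back-of-the-envelope sketch in Python. It uses only the two figures quoted above (10,000 rpm and a 5-millisecond head move). The assumption that each random read also waits, on average, half a rotation is a standard rule of thumb, and the count of 50 scattered reads is purely illustrative. Neither comes from Facebook.

```python
# Back-of-the-envelope disk latency, using the figures quoted in the article.
RPM = 10_000      # quoted spindle speed
SEEK_MS = 5.0     # quoted time to move the head across the disk


def rotational_latency_ms(rpm: float) -> float:
    """Average wait for the platter: half of one rotation (an assumption)."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def random_read_ms(rpm: float = RPM, seek_ms: float = SEEK_MS) -> float:
    """Approximate cost of one random read: head move plus rotational wait."""
    return seek_ms + rotational_latency_ms(rpm)


def positioning_ms(num_seeks: int) -> float:
    """Time spent just positioning the disk for num_seeks random reads."""
    return num_seeks * random_read_ms()


if __name__ == "__main__":
    print(f"one random read:    {random_read_ms():.1f} ms")
    print(f"50 scattered reads: {positioning_ms(50):.1f} ms")  # illustrative count
    print(f"1 contiguous read:  {positioning_ms(1):.1f} ms")
```

Under these assumptions, a page that needs dozens of scattered reads spends hundreds of milliseconds positioning the disk before any data is transferred, while a page served from one contiguous region pays that cost once.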
To keep the service fast, Facebook moved to a system that allows it to fetch the data for each Timeline reload with a single seek of the disk. In other words, all Timeline information for a particular user is stored on one disk “stripe” — or at least that’s the goal. “We had to go out on a limb with the idea of how we were going to build this,” Piantino said, remembering his initial meeting with Facebook’s capacity team — the employees who actually buy and deploy the servers.
Facebook management wanted Timeline rolled out quickly enough that Piantino and team had to make some sweeping assumptions about its requirements, and they couldn’t be wrong. Piantino deemed this meeting the “forcing function” of Timeline’s creation. Piantino likens Timeline’s hardware setup to a jet engine, not necessarily because of its speed, but because it’s designed to do one task extremely well.
In describing Timeline’s software infrastructure, Piantino will only go so far. But he does say that one of the keys to the system was that Facebook locates the aggregation code — the stuff that sorts through a user’s Timeline information — on the same machine as the data. “If you can ship your aggregation code to the box itself, that’s easier than using a network link,” he says. “We’re using the CPU for aggregation and the disks and input-output system for MySQL.”
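Piantino does not describe the aggregation code itself, so the following Python sketch is only an illustration of the general idea of running aggregation next to the data. Every name in it (`Event`, `StorageNode`, `aggregate_top`, the `score` field) is hypothetical. It compares how many payload bytes would cross the network if a storage machine shipped every row versus ranking the rows locally and shipping only the highlights.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    user_id: int
    timestamp: int
    score: float      # hypothetical "importance" used to pick highlights
    payload: bytes


class StorageNode:
    """Hypothetical machine that holds event rows on its local disks."""

    def __init__(self, events: List[Event]) -> None:
        self._events = list(events)

    def fetch_all(self, user_id: int) -> List[Event]:
        """Remote-aggregation style: ship every row to the caller."""
        return [e for e in self._events if e.user_id == user_id]

    def aggregate_top(self, user_id: int, limit: int) -> List[Event]:
        """Local-aggregation style: rank on this box, ship only the winners."""
        rows = self.fetch_all(user_id)
        return sorted(rows, key=lambda e: e.score, reverse=True)[:limit]


def wire_bytes(events: List[Event]) -> int:
    """Payload bytes that would cross the network for these events."""
    return sum(len(e.payload) for e in events)


if __name__ == "__main__":
    node = StorageNode(
        [Event(1, t, float(t), b"x" * 1000) for t in range(100)]
    )
    print("ship everything:", wire_bytes(node.fetch_all(1)), "bytes")
    print("ship top 5 only:", wire_bytes(node.aggregate_top(1, 5)), "bytes")
```

In this toy setup the local version moves a twentieth of the data. The real ratio would depend on how selective the aggregation is, which the article does not say.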
Yes, Timeline uses MySQL, not Hadoop HBase (which Facebook uses with other parts of its site) or some other NoSQL database. Whereas NoSQL databases are meant to spread vast amounts of unstructured data across a vast array of machines, MySQL is a relational database designed to organize data in neat rows and columns on a single machine. But MySQL can be “sharded” across many machines, and that’s what Facebook does.
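The article does not say how Facebook decides which shard holds which user, so the sketch below is a minimal illustration of sharding in general, not Facebook's scheme. It assumes simple modulo routing on a user ID, and it uses Python's built-in `sqlite3` module as a stand-in for separate MySQL servers so that it runs anywhere. The table layout and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

NUM_SHARDS = 4  # hypothetical; each in-memory database stands in for one MySQL box


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard by user ID (modulo routing is an assumption)."""
    return user_id % num_shards


def make_shards(num_shards: int = NUM_SHARDS) -> List[sqlite3.Connection]:
    """Create one independent database per shard, all with the same schema."""
    shards = []
    for _ in range(num_shards):
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE events ("
            " user_id INTEGER, ts INTEGER, body TEXT,"
            " PRIMARY KEY (user_id, ts))"
        )
        shards.append(conn)
    return shards


def write_event(shards: List[sqlite3.Connection],
                user_id: int, ts: int, body: str) -> None:
    """Send the row to the single shard that owns this user."""
    conn = shards[shard_for(user_id, len(shards))]
    conn.execute(
        "INSERT INTO events (user_id, ts, body) VALUES (?, ?, ?)",
        (user_id, ts, body),
    )


def read_timeline(shards: List[sqlite3.Connection],
                  user_id: int) -> List[Tuple[int, str]]:
    """Read one user's events, newest first, from that user's shard only."""
    conn = shards[shard_for(user_id, len(shards))]
    cursor = conn.execute(
        "SELECT ts, body FROM events WHERE user_id = ? ORDER BY ts DESC",
        (user_id,),
    )
    return cursor.fetchall()
```

Because every row for a given user lands on the same shard, a Timeline read in this sketch touches exactly one database, which is the property that lets a single-machine engine serve a very large audience.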
“A lot of people are surprised that for this shiny new thing for Facebook, we’re using MySQL,” Piantino says. “We treat [MySQL] as a generic engine for data manipulation. We use it as a storage engine. And it’s really efficient.”
In 2008, Piantino saw a presentation by an engineer from InnoDB — an outfit that does storage engines for MySQL. He remembers thinking that if he was ever trying to solve the problem of finding data on disk, there “wasn’t a chance” he’d come up with a better way than the engine InnoDB had built for MySQL.
Piantino points out that Timeline fundamentally deals with ordered data — where ordering is its most important quality. The connection to other Facebook “events” is secondary. This is different from “graphed data,” which lets you quickly traverse different kinds of information — from comments on a picture to geo-location. News Feed is a graphed product. Timeline is a log product.
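What makes a "log product" cheap to read is that a date range becomes one contiguous slice of an ordered store. The Python sketch below imitates that idea with a sorted list of `(user_id, timestamp)` keys, loosely in the spirit of how InnoDB keeps rows ordered by primary key. The `OrderedLog` class and its key layout are assumptions made for illustration and say nothing about Facebook's actual schema.

```python
import bisect
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (user_id, timestamp), a hypothetical key layout


class OrderedLog:
    """Toy ordered store: keys are kept sorted so ranges are contiguous."""

    def __init__(self) -> None:
        self._keys: List[Key] = []
        self._values: Dict[Key, str] = {}

    def append(self, user_id: int, ts: int, body: str) -> None:
        """Insert an event, keeping the key list in sorted order."""
        key = (user_id, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = body

    def range(self, user_id: int, start_ts: int, end_ts: int) -> List[Tuple[int, str]]:
        """Return one user's events with start_ts <= ts <= end_ts, oldest first."""
        lo = bisect.bisect_left(self._keys, (user_id, start_ts))
        hi = bisect.bisect_right(self._keys, (user_id, end_ts))
        return [(k[1], self._values[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    log = OrderedLog()
    log.append(1, 2011, "moved to New York")
    log.append(2, 2010, "someone else's event")
    log.append(1, 2009, "graduated")
    log.append(1, 2010, "first job")
    print(log.range(1, 2010, 2011))
```

Two binary searches find the ends of the range, and everything between them belongs to the requested user and period. A graph-style lookup, by contrast, would follow links from one kind of object to another rather than reading a single ordered run.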
The proof is in the pudding. According to Piantino, Timeline streams 5 megabytes off disk and aggregates in about 120 milliseconds.
Timeline “dark launched” in July. This means that for the last five months, every time someone clicked on their profile, Facebook not only accessed its existing databases, it opened the Timeline databases for writes as well. In essence, Timeline was running under the covers, so the team could monitor loads, update code, check for bugs, and, yes, start storing the data.
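A dark launch of this kind can be sketched in a few lines of Python. The version below is an assumption-laden illustration, not Facebook's code: the class name, the in-memory dictionary standing in for the existing databases, and the `shadow_write` callback are all hypothetical. The point it shows is that the user is always served from the existing store, while the new store is written to on the side and its failures are counted rather than surfaced.

```python
from typing import Callable, Dict, List


class DarkLaunchedProfile:
    """Serve from the existing store while quietly exercising a new one."""

    def __init__(self,
                 legacy: Dict[int, List[str]],
                 shadow_write: Callable[[int, List[str]], None]) -> None:
        self.legacy = legacy              # stand-in for the existing databases
        self.shadow_write = shadow_write  # stand-in for a write to the new store
        self.shadow_ok = 0                # counters the team could monitor
        self.shadow_errors = 0

    def view_profile(self, user_id: int) -> List[str]:
        """Return the profile from the legacy store; mirror it to the new one."""
        events = self.legacy.get(user_id, [])
        try:
            self.shadow_write(user_id, events)
            self.shadow_ok += 1
        except Exception:
            # A bug in the new system must never break the page the user sees.
            self.shadow_errors += 1
        return events
```

Usage would look like `DarkLaunchedProfile(existing_rows, new_store_writer).view_profile(42)`. The counters give a rough view of load and error rates on the new path before any user depends on it.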
To release Timeline, they decided to buck Facebook’s usual practice of releasing half-baked features or betas. At Facebook, the profile is sacred, so they ignored one of the company’s unofficial mottos: “Done is Better Than Good.” That said, the company released the new look to developers at its user conference in September, and with so many other savvy netizens pretending to be developers, Facebook already has over a million users with a new Timeline.