Dave Anderson looked into a desk drawer filled with tiny computers. Each was no bigger than a hardback novel, and their chips ran no faster than 600 MHz. Built by a little-known company called Soekris Engineering, they were meant to be wireless access points or network firewalls, and that’s how Anderson — a computer science professor at Carnegie Mellon — used them in a previous research project. But that project was over, and he thought: “They’ve got to be good for something else.”
At first, he decided these tiny machines could be super-low-power DNS (domain name system) servers — servers that take site names and translate them to a numeric internet address — and he asked some Ph.D. students to make it happen. “I wondered,” he remembers, “if we could do this on a wimpy platform that consumed only about 5 watts of power rather than 500.” Those students proved they could. But they also told Anderson he was thinking too small.
After tinkering with his tiny machines, they realized that if you strung a bunch of them together, you could run a massive application each machine could never execute on its own. The trick was to split the application’s duties into tiny pieces and spread them evenly across the network. “They were right,” Anderson says of his students. “We could use these boxes to run high-performance large-scale key-value stores — the kind of [databases] you would run behind the scenes at Facebook or Twitter. And the rest is publication history.”
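The idea of splitting a key-value store into small pieces and spreading them evenly across many machines can be sketched in a few lines. This is only an illustration, not FAWN's actual design: the class name `TinyCluster`, the helper `node_for_key`, and the simple hash-modulo placement are all assumptions made for the example, and each "node" is just an in-memory dictionary.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the key (hypothetical placement rule)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap it
    # around the number of nodes so every key lands on exactly one node.
    return int.from_bytes(digest[:8], "big") % num_nodes


class TinyCluster:
    """A toy key-value store spread across several small 'nodes'."""

    def __init__(self, num_nodes: int) -> None:
        # Each node stands in for one low-power machine.
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, value) -> None:
        self.nodes[node_for_key(key, len(self.nodes))][key] = value

    def get(self, key: str):
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)


cluster = TinyCluster(num_nodes=8)
for i in range(10_000):
    cluster.put(f"user:{i}", i)

# How many keys ended up on each node.
sizes = [len(node) for node in cluster.nodes]
print(sizes)
```

Because the hash scatters keys more or less uniformly, no single node has to hold, or serve, more than a small share of the data, which is what lets a group of small machines stand in for one large one.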
The year was 2008, and as it turns out, Anderson and his students were at the forefront of a movement that could reinvent the way the world uses its servers, making them significantly more efficient — and cramming them into much smaller spaces. Startups such as SeaMicro and Calxeda are now building servers using hundreds of low-power processor cores originally designed for cell phones and other mobile devices. HP is set to resell Calxeda machines as it explores similar systems with a research effort called Project Moonshot. And the giants of the internet — including Google, Amazon, and Facebook — are seriously considering the possibility of running their operations atop the sort of “wimpy” processors Anderson found in his desk drawer.
“Wimpy” is the official term. Now into its fourth year, Anderson’s project is known as the Fast Array of Wimpy Nodes, or FAWN. He regrets the name. “No manufacturer wants to advertise their products as wimpy,” he says. But the name certainly suits his research, and despite the negative connotation, the project has attracted the interest of the largest chip maker on earth. Intel sponsors Anderson’s research, and he works closely with researchers at the Pittsburgh lab Intel runs on the Carnegie Mellon campus.
The rub is that the Fast Array of Wimpy Nodes isn’t always fast. In some cases, software must be significantly rewritten to achieve high speeds on a collection of low-power processors, and other applications aren’t suited to the setup at all.
Like so many others across the server world, Intel is approaching the wimpy-node idea with skepticism — and not just because it makes an awful lot of money selling the far-from-wimpy processors that power today’s servers. “Intel is trying to walk a difficult line,” Anderson says. “Yes, a lot of their profit is from big brawny processors — and they don’t want to undercut that. But they also don’t want their customers to get inappropriately excited about wimpy processors and then be disappointed.”
Dave Anderson says that skepticism is healthy. But only up to a point. His research shows that many applications can be far more efficient on wimpy nodes, including not only ordinary web serving but, yes, large databases. “Intel realizes this too,” he says. “And they don’t want to get blindsided.”
Google Slaps Wimps
Google is a search and advertising company. But it’s also the company the world looks to for the latest thinking on hardware and software infrastructure. Google uses custom-built software platforms to distribute enormous applications across a worldwide network of custom-built servers, and this do-it-yourself approach to parallel computing has inspired everything from Hadoop, the increasingly popular open source platform for crunching data with vast server clusters, to Facebook’s Open Compute Project, a collective effort to improve the efficiency of the world’s servers.
So when Urs Hölzle, the man who oversees Google’s infrastructure, weighed in on the wimpy node idea, the server world sat up and took notice. If anyone believes in wimpy nodes, the world assumed, it’s Hölzle. But with a paper published in chip design magazine IEEE Micro, Google’s parallel computing guru actually took the hype down a notch. “Brawny cores still beat wimpy cores, most of the time,” read the paper’s title.
The problem, Hölzle said, was something called Amdahl’s law: If you parallelize only part of a system, there’s a limit to the performance improvement. “Slower but energy efficient ‘wimpy’ cores only win for general workloads if their single-core speed is reasonably close to that of mid-range ‘brawny’ cores,” he wrote. “In many corners of the real world, [wimpy core systems] are prohibited by law — Amdahl’s law.”
In short, he argued that moving information between so many cores can bog down the entire system. But he also complained that if you install a wimpy node array, you may have to rewrite your applications. “Cost numbers used by wimpy-core evangelists always exclude software development costs,” he said. “Unfortunately, wimpy-core systems can require applications to be explicitly parallelized or otherwise optimized for acceptable performance.”
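A rough sketch of the Amdahl's law argument may help here. The model below assumes a job whose serial portion must run on a single core and whose parallel portion divides perfectly across all cores. The function name `job_time` and every number in it (4 brawny cores, 32 wimpy cores at a quarter of the speed, a job that is 95 percent parallel) are made up for illustration and do not come from Hölzle's paper.

```python
def job_time(parallel_fraction: float, cores: int, core_speed: float) -> float:
    """Time to finish one unit of work under a simple Amdahl-style model.

    The serial part runs on one core; the parallel part is shared by all
    cores. core_speed is relative to a brawny core (1.0 = brawny).
    """
    serial_time = (1.0 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (cores * core_speed)
    return serial_time + parallel_time


# Hypothetical machines: the wimpy box has twice the total throughput
# (32 * 0.25 = 8 versus 4 * 1.0 = 4) but each core is four times slower.
brawny = {"cores": 4, "core_speed": 1.0}
wimpy = {"cores": 32, "core_speed": 0.25}

# A job that is 95 percent parallel: the slow serial part dominates.
brawny_mostly_parallel = job_time(0.95, **brawny)
wimpy_mostly_parallel = job_time(0.95, **wimpy)

# A job that is perfectly parallel: the extra cores win.
brawny_fully_parallel = job_time(1.0, **brawny)
wimpy_fully_parallel = job_time(1.0, **wimpy)

print(brawny_mostly_parallel, wimpy_mostly_parallel)
print(brawny_fully_parallel, wimpy_fully_parallel)
```

Under these assumed numbers, the wimpy machine loses on the 95-percent-parallel job (roughly 0.32 time units against 0.29) even though it has twice the aggregate capacity, and it wins only when the job is perfectly parallel. That is the sense in which single-core speed has to stay "reasonably close" for wimpy cores to come out ahead on general workloads.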
Many “wimpy-core evangelists” took issue with Hölzle’s paper. But Dave Anderson calls it “reasonably balanced,” and he urges readers to consider the source. “I think you should also realize that this is written from the perspective of a company that doesn’t want to change too much of its software,” he says.
Anderson’s research has shown that some applications do require a significant rewrite, including virus scanning and other tasks that look for patterns in large amounts of data. “We actually locked our entire cluster because the [pattern recognition] algorithms we used allocated more memory than our individual cores had,” he remembers. “If you’re using wimpy cores, they probably don’t have as much memory per processor as the brawny cores. This can be a big limiter.”
But not all applications use as much memory. And in some cases, software can run on a wimpy core system with relatively few changes. Mozilla is using SeaMicro servers — based on Intel’s Atom mobile processor — to facilitate downloads of its Firefox browser, saying the cluster draws about one-fifth the power and uses about a fourth of the space of its previous cluster. Anderson points to this as an example of a wimpy core system that can be rolled out with relatively little effort.
Anderson’s stance echoes that of Intel. This summer, when we asked Jason Waxman — the general manager of high-density computing in Intel’s data center group — about the company’s stance on wimpy nodes, he said that many applications — including those run by Google — are unsuited to the setup, but that others — including basic web serving — work just fine.
In other words, Google’s needs may not be your needs. Even if your applications are similar to Google’s, you may be more willing to rewrite your code. “I’m a researcher,” Anderson says. “I’m completely happy — and actually enjoy — reinventing the software. But there are others who would never ever want to rewrite their software. The question should be: As a company, where do you fit on that spectrum?”
Wimps Get Brawny
At the same time, wimpy nodes are evolving. Although low-power processors such as the Intel Atom and the ARM chips used by Calxeda can’t handle as much memory as “brawny” server chips from Intel and AMD, newer versions are on the way — and these will shrink the memory gap. Facebook has said it can’t move to ARM chips because of the memory limitations, but it has also indicated it could move to wimpy cores once those limitations are resolved.
As the chips evolve, the rest of the system is evolving around them. Dave Anderson’s array uses flash storage rather than hard disks, and similar research from Steve Swanson — a professor of computer science and engineering at the University of California, San Diego — has shown wimpy nodes and flash go hand-in-hand. If you move to flash — the same solid-state storage used by smartphones — in place of spinning hard drives, you can use chips with lower clock speeds.
An old-fashioned hard drive burns about 10 watts of power even when it’s doing nothing. In order to get the most out of the drive, you need a fast processor. But flash storage doesn’t burn that much power when idle, and that means you can use slower chips. “Adding solid state drives lets you use wimpier cores without giving up as much energy efficiency as you would if you were using a hard drive,” Swanson says. “With a hard drive, you want to use a faster core because it can access the hard drive and then race ahead as quickly as possible for the next access. With a solid state drive, it’s less critical that the processor race ahead to save power while the drive is idle.”
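Swanson's point comes down to simple arithmetic: the storage device keeps drawing power for as long as the job runs, so a slow processor pays that cost for longer. The sketch below uses the 10-watt idle figure for a hard drive quoted above; everything else (the function name `job_energy`, a 20-watt brawny core, a 2-watt wimpy core running at a quarter of the speed, and a 0.5-watt idle draw for flash) is a hypothetical number chosen only to show the shape of the trade-off.

```python
def job_energy(cpu_watts: float, storage_idle_watts: float,
               core_speed: float, work_units: float = 1.0) -> float:
    """Energy in joules for a job, in a deliberately simple model.

    One unit of work takes one second on a core of speed 1.0. The
    processor and the storage device both draw power for the whole run.
    """
    seconds = work_units / core_speed
    return (cpu_watts + storage_idle_watts) * seconds


HARD_DRIVE_IDLE_WATTS = 10.0  # figure quoted in the article
FLASH_IDLE_WATTS = 0.5        # assumed for illustration

# Hypothetical processors: the wimpy core draws a tenth of the power
# but takes four times as long to finish the same job.
brawny = {"cpu_watts": 20.0, "core_speed": 1.0}
wimpy = {"cpu_watts": 2.0, "core_speed": 0.25}

brawny_with_disk = job_energy(storage_idle_watts=HARD_DRIVE_IDLE_WATTS, **brawny)
wimpy_with_disk = job_energy(storage_idle_watts=HARD_DRIVE_IDLE_WATTS, **wimpy)
brawny_with_flash = job_energy(storage_idle_watts=FLASH_IDLE_WATTS, **brawny)
wimpy_with_flash = job_energy(storage_idle_watts=FLASH_IDLE_WATTS, **wimpy)

print(brawny_with_disk, wimpy_with_disk)
print(brawny_with_flash, wimpy_with_flash)
```

With these assumed figures, the brawny core uses less total energy when paired with the hard drive, because it finishes before the drive's idle draw adds up, while the wimpy core uses less when paired with flash. Real systems have many more moving parts, but the direction of the effect matches what Swanson describes.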
Anderson is also looking at ways to better balance workloads across wimpy node systems — an issue Urs Hölzle alludes to in his paper. “It is a problem,” he says, “but it’s a solvable problem. It just takes research and programmer effort to solve it.” What Hölzle identifies as difficulties, Anderson prefers to think of as research opportunities.
This includes software rewrites. In the short term, many companies — including Google — will frown on the idea. But in the long term, this changes. Since Hölzle published his paper, Google has resolved to rewrite its backend software — which is now being stretched into its second decade — and the new platform may very well move closer to the wimpy end of the spectrum.
Dave Anderson isn’t just looking at how wimpy core systems can be used today. He’s looking at how they can be used tomorrow. “If you came to me and you said: ‘Hey, Dave, how should I build my data center?’, I would not tell you to go and use the wimpiest cores you could find. That’s how I built mine, but I’m trying to push the limit and understand how to make these things practical.”