The cloud has a big problem on its hands: Cloud storage is failure-prone, slow, and infinitely quirky. Anyone whose platform has been taken offline by one of Amazon’s periodic elastic block storage (EBS) outages can vouch for the fact that reliably storing data in the cloud is a very hard problem, and one that we’re only just beginning to solve.
Recently, solid-state disks (SSDs) have arisen as an answer to the performance part of the cloud storage challenge, but there are deeper problems with scalability and synchronization that merely moving the same databases from hard disk to SSD doesn’t necessarily solve. That’s why a group at Stanford has a radical suggestion: Datacenters should just put everything in RAM.
The proposed system, which the researchers are calling RAMcloud, would be a one-size-fits-all solution to the cloud storage problem that would replace the mosaic of SQL, NoSQL, object, and block storage solutions that cloud providers currently use to address different storage niches. And if it achieves that goal, then RAMcloud could be the badly needed breakthrough that does for cloud databases what Microsoft Access and Visual Basic did for relational databases by bringing the technology firmly within the grasp of ordinary programmers and analysts.
The datacenter as a giant RAM disk
At first glance, the idea of moving entire datacenters’ worth of storage to what is essentially a giant RAM disk might seem totally infeasible from a cost perspective. After all, DRAM is far more expensive in terms of cost per bit than magnetic storage, so how could a datacenter possibly afford to ditch disks for RAM? It turns out that at cloud scale, the cost-per-bit issue begins to move in DRAM’s favor.
The Stanford team points out that between its memcached implementation and the RAM that’s on its actual database servers, Facebook was already storing 75% of its data in RAM as of August 2009. All of this data ends up cached in RAM, either explicitly via memcached (a RAM-based key-value store that can greatly speed up database access times) or implicitly via system memory, because hard disks are just too slow and too large. So when you load a page from Facebook, the vast majority of data from that page, if not all of it, is already being fetched from RAM. What the authors are proposing therefore isn’t a giant leap; it’s more like the final, incremental step to an all-RAM storage solution.
The paper also cites Jim Gray’s famous five-minute rule for trading disk accesses for memory, pointing out that, “with today’s technologies, if a 1KB record is accessed at least once every 30 hours, it is not only faster to store it in memory than on disk, but also cheaper (to enable this access rate only 2 percent of the disk space can be utilized).” So as disk densities go up, RAM actually gets cheaper for random accesses. (The situation is different for sequential accesses; see this classic interview with Gray on the topic of disk densities and random access frequency.)
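To make the five-minute rule concrete, here is a minimal sketch of the break-even formula as Gray and his co-authors stated it: the break-even interval in seconds is (pages per MB of RAM / random accesses per second per disk) × (price per disk drive / price per MB of RAM). The function name and every input value below are illustrative assumptions, not figures taken from the RAMcloud paper.

```python
def break_even_interval_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_ram):
    """Access interval (seconds) below which RAM is cheaper than disk.

    A record touched more often than once per this interval is cheaper
    to keep in RAM than to fetch from disk on every access.
    """
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (
        price_per_disk / price_per_mb_ram)


# Illustrative inputs only (assumptions, not numbers from the paper).
interval = break_even_interval_seconds(
    pages_per_mb_ram=1024,          # 1 KB records, so 1024 fit in 1 MB
    accesses_per_sec_per_disk=100,  # random accesses one disk can serve
    price_per_disk=100.0,           # dollars per drive
    price_per_mb_ram=0.01,          # dollars per MB of DRAM
)
print(f"break-even interval: {interval / 3600:.1f} hours")
```

With these assumed prices the break-even interval comes out at a bit over a day, the same order of magnitude as the 30 hours the paper quotes. The interval grows as RAM gets cheaper relative to a disk's fixed supply of random accesses per second.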
The upshot of all of this is that moving everything into RAM would not only be faster than disk, but it would also be cheaper under certain common circumstances.
The de-ninjafication of cloud storage
The benefits of RAMcloud would go beyond mere speed and cost, though. There’s a growing consensus in the cloud storage community that there will ultimately be no “one-size-fits-all” solution for cloud storage, and that the current patchwork of different storage solutions that fall into different quadrants of the scalability vs. latency vs. consistency vs. complexity map represents the new normal. No one storage technology, the thinking goes, can possibly be all things to all customers, the way that the relational database system was in a previous era.
This heterogeneity would probably be fine for cloud storage, if it weren’t for the CIO confusion and loss of programmer productivity that the complexity engenders. By now, everyone sort of knows what Hadoop is (or if they don’t, then they aren’t admitting it), but Hadoop is just one member of a large and growing number of ways to store, retrieve, and transform bits in the cloud. It takes time and deep expertise to acquire and maintain a thorough grasp of each one of the myriad cloud storage options, and technical people who can do this are in short supply.
(Related to this point, I’ve talked in various places about the need for the cloud in general to become “de-ninjafied”, because as it currently stands the complexity associated with getting maximum productivity out of many PaaS and IaaS platforms is so great that few coders have the necessary skills to do so. See my recent piece on the cloud talent crunch, and this followup exchange with Felix Salmon. In the latter, I elaborate on my point about the need to bring cloud programming down to a level where more casual programmers can take advantage of the full range of storage and compute resources that cloud offers.)
The folks behind RAMcloud think that they can cut the Gordian knot of storage solutions in one mighty stroke, and offer a single storage abstraction that satisfies everyone’s needs for consistency, scalability, and speed while being easy to use at the same time. The key lies in DRAM’s greatest strength versus every other form of storage, including flash: lightning-fast latency.
Latency, consistency, NoSQL
The RAMcloud team hopes to get access latencies all the way down to 5-10 microseconds (compare around 200 milliseconds with current technologies). That matters because the lower latencies go, the less time each individual read or write instruction spends in the system. And as instructions spend less time in flight, the critical window during which two accesses to the same byte could accidentally overwrite each other shrinks, so it gets easier to maintain data consistency across ever larger storage pools.
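A rough way to see why a shorter time in flight shrinks the conflict window: assume, purely for illustration, that competing accesses to a given record arrive at random (a Poisson process) at some fixed rate. The chance that at least one of them lands while an operation is still in flight is then 1 - exp(-rate × latency). The Poisson model, the function name, and the rate of one competing access per second are assumptions of this sketch, not part of the RAMcloud proposal.

```python
import math


def overlap_probability(conflicting_ops_per_sec, latency_sec):
    """Chance that at least one competing access arrives during one operation.

    Assumes competing accesses form a Poisson process (an illustrative
    modelling choice). -expm1(-x) equals 1 - exp(-x) but stays accurate
    when x is tiny.
    """
    return -math.expm1(-conflicting_ops_per_sec * latency_sec)


rate = 1.0  # hypothetical: one competing access per second to the same record
p_slow = overlap_probability(rate, 200e-3)  # 200 milliseconds in flight
p_fast = overlap_probability(rate, 10e-6)   # 10 microseconds in flight

print(f"200 ms in flight: {p_slow:.4f}")
print(f"10 us in flight:  {p_fast:.2e}")
```

Under these assumptions the overlap probability falls from roughly 18 percent to roughly one in a hundred thousand, which is the sense in which lower latency makes consistency cheaper to maintain.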
For example, read-after-write is a common occurrence in any storage pool, whether it’s a hard disk or a database. In this scenario, a write to a particular file or record is followed immediately by a read from that same file or record. For a database to guarantee its users that every read will return only the most up-to-date data, it must first confirm for every read that there are no prior, pending writes that are working their way through the system from somewhere else to modify the target record; and if there are such pending writes, then the system must wait for them to complete before returning the results of the read command. So the longer it takes for writes to work their way through the system, the longer it takes for reads to return up-to-date data.
In order to get around this problem, many NoSQL storage offerings simply dispense with the guarantee that reads will return up-to-date records. Because the vast majority of reads are not read-after-write, a database without this guarantee will perform identically to a database with the guarantee over 90 percent of the time, only much, much faster. But every now and then, the NoSQL database will return out-of-date data to a user, because it’s reading from an address that has yet to be updated by a prior inbound write that is coming in from a node that’s further away.
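The difference between the two kinds of read can be sketched with a toy in-memory store. This is only an illustration: the class and method names are hypothetical, and "waiting" for in-flight writes is modelled by draining a queue rather than by real network delays.

```python
from collections import deque


class ToyStore:
    """Toy key-value store whose writes spend time 'in flight' before landing."""

    def __init__(self):
        self.data = {}          # records that have actually been applied
        self.pending = deque()  # writes accepted but not yet applied

    def write(self, key, value):
        # The write is queued; it is not visible until it is applied.
        self.pending.append((key, value))

    def apply_pending(self):
        # Stand-in for in-flight writes finally arriving, in order.
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value

    def read_relaxed(self, key):
        # No guarantee: return whatever has landed so far, possibly stale.
        return self.data.get(key)

    def read_strict(self, key):
        # Guarantee: if a write to this key is still in flight, wait for it.
        if any(k == key for k, _ in self.pending):
            self.apply_pending()
        return self.data.get(key)


store = ToyStore()
store.write("profile", "old status")
store.apply_pending()
store.write("profile", "new status")   # still in flight

stale = store.read_relaxed("profile")  # returns at once, but out of date
fresh = store.read_strict("profile")   # waits, then returns current data
print(stale, "|", fresh)
```

The relaxed read never waits, which is where its speed comes from. The strict read pays for its guarantee only when a write to the same key is actually pending, so its cost tracks how long writes stay in flight.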
For a platform like Facebook, there are many places where stale data is just fine, hence the widespread use of NoSQL there. If I live three miles away from the Facebook datacenter that houses my friend’s profile, and my friend lives 1,000 miles away, then neither of us really cares if my browser’s copy of his profile is out-of-date because he clicked “update” some 600ms before I clicked his profile link, and my read beat his write to the datacenter. Facebook would rather give me much faster profile loads in exchange for a slightly stale bit of data every now and then.
Many business-oriented database applications, however, could never tolerate this kind of read-after-write sloppiness, even though they would desperately like to get their read latencies as low as possible. High-speed finance is the perfect example of a market for cloud storage that will pay top dollar for maximum performance but has low tolerance for inaccuracies arising from the kinds of consistency issues described above. But traditional relational databases just can’t scale the way that many of these customers would like.
Relational databases scale poorly because the length of time it takes for writes to travel through the system to their target grows as the storage pool grows to encompass data stored on multiple networked machines, which means that for a classic relational database the read latency also grows as the system grows. So a relational database’s overall performance rapidly deteriorates as it scales outward across multiple systems.
In the current storage context, where users are forced to choose between scalability and consistency, more and more of them are choosing scalability. But safely using these inconsistent databases in critical business applications takes a ton of programmer effort and expertise. If RAMcloud is successful in offering both consistency and scalability, then a number of users can ditch NoSQL and go back to a traditional, easier-to-use RDBMS.
Ultimately, though, what RAMcloud offers isn’t an either-or proposition. Rather, the promise is that RAMcloud’s low latency will let a database with whatever level of consistency—from a fully ACID-compliant RDBMS to a NoSQL offering with fewer consistency guarantees—scale out much further horizontally than would otherwise be possible. This will bring the ACID guarantees back within the reach of some database users whose scalability needs had grown past the point where they could use an RDBMS; these shops can cut significant cost and complexity out of the IT-facing side of their data storage solution by just going back to good old relational databases on RAMcloud.
The major implementation challenges to the RAMcloud idea are obvious to anyone who has ever lost some work to an unexpected power outage or reboot. Because DRAM is volatile, the RAMcloud will have to use some combination of hard disk writes and node-to-node replication to keep data durable. The problem with the former is that it can easily put you right back into the read-after-write latency trap, while the latter solution will massively boost RAMcloud’s cost per bit (i.e. if you have to keep a copy of every byte of data in RAM on three nodes for redundancy’s sake, then it takes three times the amount of RAM to store each byte).
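The replication overhead is simple multiplication, sketched below. The function names, the 1,000 GB dataset, and the price per GB are hypothetical values chosen for illustration. Note that `copies` counts every in-RAM copy, including the primary, so a primary plus three other nodes would be `copies=4`.

```python
def ram_required_gb(data_gb, copies):
    """Total RAM needed when every byte is held in RAM on `copies` nodes."""
    if copies < 1:
        raise ValueError("need at least one copy of the data")
    return data_gb * copies


def cost_per_usable_gb(price_per_gb, copies):
    """Effective price of one GB of stored data once replication is counted."""
    if copies < 1:
        raise ValueError("need at least one copy of the data")
    return price_per_gb * copies


data_gb = 1000        # hypothetical dataset size
price_per_gb = 10.0   # hypothetical DRAM price in dollars

for copies in (1, 3, 4):
    total = ram_required_gb(data_gb, copies)
    price = cost_per_usable_gb(price_per_gb, copies)
    print(f"{copies} copies: {total} GB of RAM, ${price:.2f} per usable GB")
```

Because the cost per usable byte scales linearly with the number of in-RAM copies, pure RAM replication is the expensive route to durability, which is why some mix of replication and disk writes is on the table.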
Then there’s the problem of scaling this solution across datacenters. Even if RAMcloud can get its internal latencies down into the microsecond regime using commodity hardware alone, it will be very hard to retain the solution’s latency-related advantages when scaling across multiple, geographically dispersed datacenters.
So the challenges facing RAMcloud are huge, but so is the potential upside. If RAMcloud can put database consistency back on the table for storage clusters with hundreds or thousands of nodes, then it could go a long way toward simplifying storage to the point that nonspecialists can build productive database solutions on top of the cloud the way they once did on top of Microsoft Access.