Just four months ago, Oracle released a very official-looking corporate white paper intent on “debunking the hype” surrounding the NoSQL movement – a widespread effort to build a new breed of database that can juggle vast amounts of “unstructured” information in ways a traditional Oracle database can’t.
“The NoSQL databases are beginning to feel like an ice cream store that entices you with a new flavor of the month,” the white paper read. “[But] you shouldn’t get too attached to any of the flavors because it may not be around for too long.”
Oracle’s extended diatribe against the NoSQL crowd – including Cassandra, MongoDB, CouchDB, and Redis – sought to expose their limitations and sow some serious doubt over their open source roots. But the white paper has now vanished from Oracle’s website, surviving only through Google’s search cache, and Oracle has launched a new attack on the NoSQL movement. On Monday, at its massive Oracle OpenWorld conference in downtown San Francisco, Oracle unveiled its own NoSQL database.
Last week, a few words sprinkled onto the OpenWorld website indicated that such a database was on the way, and with his Monday morning keynote, Oracle executive vice-president of product development Thomas Kurian officially acknowledged the un-kept secret, announcing that the Oracle NoSQL Database will be included with a new hardware system known as the Oracle Big Data Appliance. Big Data is the moniker du jour for the epic amounts of unstructured web data facing many of today’s businesses, and with the new appliance, Oracle is embracing not only NoSQL, but Hadoop, the other open source movement so often associated with the term.
The new appliance shows that while Oracle has no intention of undermining its existing database business, it’s willing to change with the times. NoSQL and Hadoop – the open source data-crunching platform based on Google’s back-end infrastructure – arose as alternatives to Oracle’s existing database and analytics products, and now Oracle has rolled both into a single hardware product.
More than one database to rule them all
For Max Schireson – the president of 10gen, the company behind the open source NoSQL database MongoDB – Oracle’s march into his territory is no surprise. “Ten or fifteen years ago, Oracle was very much of there-should-be-only-one-database mindset, but this has long since fallen by the wayside,” he told Wired, pointing to Oracle’s purchase of alternative databases such as TimesTen, BerkeleyDB, and MySQL. “It makes all the sense in the world for them to have a NoSQL too.”
A traditional relational database stores data in neat rows and columns, and it’s designed to run on a single machine – though in recent years engineers have learned to stretch such databases across multiple servers with varying success. By contrast, a NoSQL database provides a more flexible data model, and it’s specifically built to scale across a vast number of machines.
“With NoSQL, you get greater agility and scalability. That’s what’s attracting so many people to the space,” says Schireson. “All the products are designed for horizontal scalability and all of them have alternative data models for rapidly changing data and heterogeneous data.”
The rub is that you can’t always slice and dice the data as easily as you can with a relational database. Generally, the “transactional semantics” of a NoSQL database are somewhat limited, and you can’t do database “joins”, where you merge data from two or more database tables.
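To make the point about joins concrete, here is a minimal Python sketch of the merge a relational database would perform for you, done by hand in application code. The two "tables", their field names, and the `join_orders_to_customers` helper are all hypothetical, invented for illustration; they are not drawn from any product mentioned here.

```python
# Hypothetical data: two "tables" held as plain lists of dicts.
customers = [
    {"id": 1, "name": "Bob"},
    {"id": 2, "name": "Alice"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "disk"},
    {"order_id": 11, "customer_id": 2, "item": "rack"},
    {"order_id": 12, "customer_id": 1, "item": "cable"},
]


def join_orders_to_customers(orders, customers):
    """Merge each order with its customer's name (an inner join done by hand)."""
    by_id = {c["id"]: c for c in customers}  # index customers by their key
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is not None:  # drop orders with no matching customer
            joined.append(
                {
                    "order_id": order["order_id"],
                    "item": order["item"],
                    "name": customer["name"],
                }
            )
    return joined


rows = join_orders_to_customers(orders, customers)
for row in rows:
    print(row)
```

When a database offers no join, this kind of lookup-and-merge logic typically moves into the application, or the data is stored already merged.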
NoSQL is a broad term, and by some counts, over 120 different outfits offer a database along these lines. Some, such as MongoDB, store data as “objects” – essentially documents – of varying sizes, while others, such as the open source Cassandra database, developed at Facebook, store data as “key-value” pairs – i.e. “color” and “red” or “name” and “Bob.” But most of these databases are open source, and all are designed to run across a large number of low-cost machines.
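The difference between the two data models can be sketched in a few lines of Python. This is only an illustration using built-in dictionaries as stand-ins; it is not the API of MongoDB, Cassandra, or any other product, and the `find` helper and sample records are hypothetical.

```python
import json

# Key-value style: the store maps a key to a value and knows nothing more.
kv_store = {}
kv_store["color"] = "red"
kv_store["name"] = "Bob"

# Document style: each record is a self-describing object,
# and records do not have to share the same fields.
documents = [
    {"name": "Bob", "color": "red"},
    {
        "name": "Alice",
        "languages": ["Python", "Java"],
        "address": {"city": "San Francisco"},
    },
]


def find(docs, field, value):
    """Return documents whose top-level field equals value (hypothetical helper)."""
    return [d for d in docs if d.get(field) == value]


bob = find(documents, "name", "Bob")
serialized = json.dumps(documents[1])  # documents are often stored in a JSON-like form

print(kv_store["color"])
print(bob)
print(serialized)
```

In the key-value case, the only question you can ask is "what is stored under this key?" In the document case, the store can look inside each record, which is what makes queries such as `find` possible.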
During Monday’s keynote, Kurian said that the Oracle NoSQL database would use a key-value store, but that was the extent of his description. “If you have a large data set that you’re processing – that’s, for example, web logs off a high-performance web form – you can take those web forms and store them in the Oracle NoSQL database as key-value pairs,” was his one-sentence description.
At a press event later in the day, Andy Mendelson, senior vice president for Oracle database server technologies, said the company’s new NoSQL platform is based on the open source BerkeleyDB database.
In addition to this NoSQL database, the Oracle Big Data appliance will also include the Apache open source distribution of Hadoop and various Oracle-designed tools meant for use with the platform, including a “Loader for Hadoop” that moves Hadoop data into Oracle’s standard database warehouse.
Essentially, Hadoop is a means of processing large amounts of data across clusters of low-cost servers. Based on Google’s GFS distributed file system and MapReduce distributed number-cruncher, the platform “maps” tasks across machines, splitting them into tiny sub-tasks, before “reducing” the results into a master calculation. It provides analytics for the sort of data you shuttle into a NoSQL database.
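The map-and-reduce pattern described above can be sketched in a single Python process. This is a toy illustration of the idea, not Hadoop's API: the log lines are invented, the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

# Hypothetical web log lines: client address, method, requested path.
log_lines = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /about.html",
    "10.0.0.1 GET /index.html",
]


def map_phase(line):
    """Emit one (path, 1) pair per log line."""
    _ip, _method, path = line.split()
    yield (path, 1)


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


pairs = [pair for line in log_lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each call to `map_phase` looks at one line in isolation and each call to `reduce_phase` looks at one key in isolation, the work can be spread over as many servers as there are lines and keys to process.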
Predictably, 10gen’s Max Schireson paints Oracle’s move into the NoSQL realm as a good thing for the existing market. “It will add one more competitor to the fray, but I think [Oracle’s] presence will expand the market,” he said. “I don’t expect they’ll wind up dominating the space.”
Yes, Oracle has existing relationships with beaucoup businesses. And yes, its sales staff dwarfs anything you’ll find at a NoSQL startup. But Schireson believes that open source databases like MongoDB can compete in other ways, pointing out that over 100,000 developers download the open source Mongo code each month. “They just find it on the Internet, and they start doing something with it. It’s not through an interaction with our sales team that most people become acquainted with the technology.
“Even thousands of Oracle sales people aren’t going to generate hundreds of thousands of developers trying the technology. That comes through word of mouth, developers talking to other developers about what they use. If Oracle’s technology is strong and they get a following, then they could have that type of usage, but they can’t just manufacture it.”
Oracle’s Mendelson told Wired that the Oracle NoSQL database will be open sourced, but that there will also be a closed source version that customers must pay for. Today, the company treats BerkeleyDB in much the same way.
Similarly, John Schroeder – co-founder and CEO of MapR, an outfit that has commercialized Hadoop – welcomes Oracle’s embrace of the open source number-crunching platform. “It’s just another indication of how important Hadoop has become as a Big Data analytics platform,” he told Wired, pointing out that EMC has also introduced a Hadoop appliance and that IBM is providing Hadoop-related services.
But other competitors question whether Oracle’s move really makes sense. Brad Peters – the CEO of Birst, a business analytics company that focuses on structured data stored in traditional relational databases – believes that Oracle is simply tossing some existing software onto the server, and that the company will have trouble supporting it because much of the code is open source. He also argues that the market for NoSQL is tiny relative to the relational database, and he doesn’t understand why the company would shift valuable resources onto a product that few businesses will actually need.
EMC chief marketing officer Jeremy Burton characterizes Oracle’s Big Data Appliance as “pretty much” a direct competitor to EMC’s appliance, but he’s not sure how much Larry Ellison and company really want to sell the thing. “It could be more of a defensive product than an offensive product,” he told Wired. “If customers really want NoSQL and Hadoop, they’ll have something to offer. But they definitely don’t want to cannibalize their current business.”
All that said, the need for NoSQL and Hadoop will only grow as businesses struggle to cope with more and more data – a clear trend in today’s web-centric world. For Schroeder, the move makes perfect sense. Oracle is giving many businesses something they have an obvious need for, and it’s putting a familiar name behind it. “With so many NoSQL products out there, it makes it difficult for organizations to make a bet on one,” he says. “Oracle [NoSQL] changes that.” Whether it really wants to sell the thing or not, the database giant is providing a conspicuous alternative to the NoSQL “flavor of the month”.