Like so many others, EMC’s Greenplum unit is dressing up Hadoop for a go at big business.
On Wednesday, EMC announced a data analytics platform that starts with a structured database, adds the Hadoop Big Data software, and wraps them in a social network. Known as the Greenplum Unified Analytics Platform, it handles both structured and unstructured data, incorporating the company’s SQL database and Hadoop implementation and allowing data to flow from one to the other. Organizations can now use data stored in Hadoop from the Greenplum database much more easily, said Luke Lonergan, Greenplum’s CTO, vice president and co-founder.
Named after the yellow stuffed elephant that belonged to the son of its founder, Hadoop is open-source software that dices huge amounts of data and spreads the pieces across thousands of processors. The software is widely used to analyze the massive clickstreams that flow through the likes of Facebook, Twitter, eBay, and Yahoo.
Hadoop has been a tool for big Internet companies, for the most part. But that’s changing in a hurry as enterprises increasingly deploy the software. Every major vendor that peddles databases is adopting the technology and proclaiming itself a player in the emerging Big Data market.
The idea is to make working with Big Data as similar to working with structured data as possible. The platform is the beginning of a 12- to 18-month process that will tie structured and unstructured data analysis more closely together and allow a wide variety of tools to access the data, said Lonergan. “Our objective is to get to store once, use many,” he said.
EMC’s platform also includes a Facebook-ish social network dubbed Chorus. The network allows dispersed teams of data scientists and analysts to work together. It also allows data scientists to make their work public. “Searching for what others are doing is one aspect of being able to learn how to do these kinds of data science things,” said Lonergan. You can create data sets using other people’s work within Chorus, he said.
Chorus also allows business people to keep tabs on projects and have input in the process, said Lonergan. Data scientists usually go off in a corner and work by themselves for months at a time, he said. “What we’re doing is providing a social app for perhaps some of the most introverted people in the world: PhD statisticians,” he said.
Greenplum is boosting support for Hadoop in the platform’s administration module, the Greenplum Command Center. Hadoop administration still “requires a lot of manual work and high level of expertise from systems administrators,” said Dan Vesset, vice president of business analytics at market research firm IDC. “Unless you are one of the few Internet companies, this expertise is relatively hard to come by.”
Administration tools that help automate command and control of systems that include Hadoop are an IT productivity enhancement, said Vesset.
Greenplum’s Unified Analytics Platform is likely to be among the best at integrating the analysis of structured and semi-structured data, said Vesset. “I say semi-structured because the vast majority of use cases for Hadoop involve analysis of web log or clickstream data that has some structure.”
The big question is whether EMC will succeed in the data analytics market, regardless of the quality of technology, said Vesset. “Selling an analytics platform is not like selling storage solutions, and in this market EMC is competing against much bigger incumbents.”
HP, IBM, Microsoft, Oracle, SAP and Teradata are all vying for big pieces of the enterprise Big Data pie. EMC might have the sharpest knife, but will it be able to elbow its way to the table?