This past week was a busy one for the Hadoop community, with two Hadoop events in Silicon Valley. The first was “What Role Will Hadoop Play in the Enterprise” by the Churchill Club, which attracted about 300 attendees to a Palo Alto hotel. The second was the much bigger Hadoop Summit conference at the San Jose Convention Center. I will write a separate article on the second event soon.
The Churchill Club event was a panel discussion among thought leaders: Michael Driscoll, CEO, Metamarkets; Andrew Mendelsohn, SVP, Oracle Server Technologies; Mike Olson, CEO, Cloudera; Jay Parikh, VP Infrastructure Engineering, Facebook; John Schroeder, CEO, MapR. It was moderated by Cade Metz, an editor at Wired Enterprise.
As the event name suggests, the discussion focused on Hadoop in enterprises. Hadoop is known to have been deployed successfully at big websites, for which it was initially designed. Most enterprises have different environments, different requirements, and different people with different mindsets and skills. Therefore, what works for Web 2.0 companies may not work for enterprises, at least not as it is. There have to be some changes to make it work.
Given their backgrounds and whom they represent (important too!), the panelists shared many common thoughts as well as different opinions on various issues. Below I try to summarize some of the topics and points from the discussion. It’s based wholly on my memory and therefore could be wrong.
- Complementary or disruptive. Hadoop is mostly a complementary technology that addresses technical challenges that are impossible to handle with traditional data warehousing. With its open source model and price advantage, it may move into the traditional business intelligence market, with some BI applications rewritten on Hadoop.
- Data security is still a big concern for Hadoop going into enterprises. Michael made a good analogy with the shipping industry, where it took a long time for customers to trust companies to move their goods. Data is like goods, so it will take a long time to gain trust. Maybe it will take a generation.
- Enterprises will eventually have the same volume of data that big websites have today. Therefore enterprises are better off preparing today rather than later.
- 100% open source or not. When asked about Cloudera keeping its management component private, Mike Olson made an interesting comment. He basically said that Red Hat, with $2B annual revenue, is one of a kind in leveraging open source as a business model, and he doesn’t see any others lucky enough to come even close. Based on his experience with other open source businesses, he thinks it’s a good model to keep the core open source and enterprise features like management private. The moderator also put the question to John, whose company MapR has a proprietary file system that is written in C/C++ and outperforms the typical Java-based implementation several times over. John joked that it’s not the only thing they do, and that they are not a file system company but work on many other things with Hadoop. Cloudera doesn’t agree on the necessity of a proprietary file system.
- Hardware acceleration. So far it’s not really needed, or at least not a priority, because there is plenty of other low-hanging fruit on the software side.
- Commodity hardware or packaged premium appliance. There was discussion of the Open Compute Project, started by Facebook to standardize commodity hardware. It’s not really a Hadoop question, but since most Hadoop runs on commodity hardware, it was also discussed a bit. Oracle talked about its premium appliance with InfiniBand networking.
- Ecosystem is the key to success. The awareness of Hadoop is already there in enterprises. Just ask a typical CIO, who may tell you, “I don’t know about Hadoop, but I think we need it.” For massive adoption, however, there has to be a rich portfolio of applications and tools. Unlike big websites, typical enterprises do not have deep technical expertise in Hadoop, and it doesn’t make sense for them to invest as much as web companies do.