What Hadoop Community Can Learn From VMware Virtualization
As I mentioned in a previous article, Hadoop is at a similar stage today as virtualization was 10 years ago – the technology is mostly ready for wider adoption. There were certain secret sauces behind virtualization's stellar success, especially VMware's in the enterprise space. Here I examine some of these success factors that the Hadoop community could learn from.
Strive For Out Of Box Experience
No one wants a long and tedious installation and configuration process, no matter how powerful a product is. This is particularly true for enterprises that want to use Hadoop as a tool. Anything that can be hidden should be hidden, even if it might be interesting to some geeks.
VMware has done a great job on user experience in its Virtual Infrastructure, later rebranded as vSphere. Not only is installation and configuration simple, but physical-to-virtual migration is also mostly painless with tools like vCenter Converter.
Build Strong Ecosystem Of Applications
When ESX was first introduced, the challenge was the drivers that hook the hypervisor to the underlying hardware. At that time, existing drivers didn't work as-is with ESX, but VMware successfully convinced hardware vendors to provide drivers. That turned out to be essential for the wide adoption of ESX across hardware platforms and for its dominance of the virtualization market.
With Hadoop, the problem is on the other side of the stack – applications. You can't do much unless you already have an application built. It's generally expected that enterprises will build customized applications themselves, which is doable but not ideal given the investment and expertise involved. Ideally, there should be packaged Hadoop applications that enterprises can buy and run without development.
A Hadoop app store would be valuable for facilitating this model. Typical adoption of a new technology in the enterprise is driven mostly by real applications, not by its potential.
For more techniques on building an ecosystem, check out my article on the CO2 formula.
Offer Commercial Support
Everyone loves free software like open source. But when an enterprise decides to depend on Hadoop for its mission-critical business, it always needs commercial support. Most of the time it may not actually need anything; like insurance, the support is there for peace of mind. This has to be addressed before Hadoop can be adopted in production systems.
Avoid Distribution Fragmentation
If there is anything that could prevent Hadoop from growing big in the future, it's probably the proliferation of distributions that confuse the community. This is a typical problem for an open source project. Consider the Linux community, which has many distributions. None is substantially different from the others, yet the differences are enough to confuse users, who have to think hard about "what are the differences?" and "which one should I use?" As a result, Linux has held itself back from growing bigger in enterprises. As far as I can see, it's not an easy problem to solve given the nature of the Apache license.