Hadoop has recently gained a lot of attention from enterprises; just consider the rapid growth in attendance at the Hadoop Summit. There are many ways to leverage Hadoop in an enterprise, but in general there are three major usage patterns, detailed below.
As a Framework
This is what Hadoop was initially intended to be, and it continues to be one of the major approaches in the short term. It means that an enterprise needs to invest in custom application development, which normally costs more than buying off-the-shelf applications.
In the long term, I expect usage will slowly move to the next two approaches. But this approach will persist at a certain level, because you simply cannot buy every application you need on the market. Also, you want to control an application if it’s part of your core competence against your competitors.
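To give a sense of what custom development on the framework involves, here is a minimal sketch of a word count written in the MapReduce style. It is plain Python rather than the Hadoop API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are made up for illustration, and the `shuffle` function only stands in for the grouping step that Hadoop would perform across the cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (a local stand-in for the framework's shuffle phase)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop as a framework", "Hadoop as a platform"]))
```

Even in this toy form, the map and reduce logic is code the enterprise has to write, test, and maintain itself, which is where the extra cost of the framework approach comes from.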
As a Platform
When an enterprise can buy a Hadoop application from an app store, it can run that application on its own Hadoop clusters. Some configuration will be needed, but no software development is involved.
To get there, there has to be a certain amount of standardization of data formats, covering both input and output data, as well as stable Hadoop interfaces. Without these prerequisites, it’s hard to use Hadoop this way.
As an Application
For certain big data applications, it’s quite possible to embed Hadoop in the application itself, much as a Web application can include Tomcat. You don’t even notice the existence of Hadoop; every detail is hidden inside the application.
This approach offers the best encapsulation and simplicity, but may not be as efficient as the second approach, where applications can share the same clusters. It is justified when, for example, the underlying cluster is small and the cost of a dedicated cluster is relatively low, or when the application is so demanding that it uses pretty much all the resources of a cluster.