VMware Serengeti: A Perfect Match of Hadoop and vSphere

During the Hadoop Summit 2012 last month, I learned of the release of the open source (Apache-licensed) Serengeti project from VMware. The week after, I downloaded the OVA file from the VMware site and gave it a first try in a development environment after browsing through the user guide, which describes a fairly easy process for getting a Hadoop cluster running on vSphere.

It wasn’t successful the first time, and then I was distracted by other things until last weekend, when I finally had time for another try. To my pleasant surprise, it worked pretty smoothly the second time. I could create a default Hadoop cluster with 3 worker virtual machines, and then add a new worker into the existing cluster.

I don’t know what magic worked the second time, but my guess is that the first environment I tried did not have a DHCP server while the second one did. Per the user guide and the appliance installation wizard, DHCP may not be required, but it definitely smooths out the experience. At least you don’t need to type commands to specify the IP addresses for Serengeti to allocate to the different virtual machines it creates.
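For reference, here is roughly what that looks like in the Serengeti CLI when you don’t have DHCP. The network names, port group, and address ranges below are made up for illustration, and the exact flags may differ in your Serengeti build, so treat this as a sketch rather than verified syntax:

serengeti> network add --name dhcpNetwork --portGroup "VM Network" --dhcp

serengeti> network add --name staticNetwork --portGroup "VM Network" --ip 192.168.1.100-192.168.1.120 --dns 192.168.1.1 --gateway 192.168.1.1 --mask 255.255.255.0

With DHCP you only need the first form; without it, the second form hands Serengeti a pool of static addresses to assign to the virtual machines it creates.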

The user guide is pretty easy to read and follow, with many screenshots and samples. It also highlights important information and hints in callouts. It deserves credit there. As always, you cannot trust a user guide 100%, and quite often you need to do things slightly differently.

Virtual Appliance Installation

The following is a screenshot I captured while installing the OVA onto vSphere 5. Can you spot the problem there? The error message is very clear, and even better, it points out how to solve the problem, which let me fix it quickly. After that, the installation went through smoothly.

[Screenshot: error message during the OVA deployment]

The problem is really about timing. I wish I had been stopped at step one, or at least warned in the document, so that I didn’t have to run through the wizard again. Anyway, it’s not a big deal because there weren’t too many steps and I could mostly accept the defaults. Even better would be a button that prompts a dialog box for me to change the setting on the fly. Am I expecting too much? :-)

To be fair, it’s not an issue with the Serengeti project but more related to the vSphere product itself.

Updating VMware Tools

Because I selected DHCP in the wizard, I tried to check the IP address of the Serengeti server but failed because VMware Tools was “Running (Out-of-date).”

You can choose to upgrade VMware Tools. It’s fairly easy to do: just right-click the Serengeti virtual machine in the vSphere Client, and you will find the context menu item that does it for you. No complaints here, because VMware ships the appliance with the lowest workable version of VMware Tools so that it can be used across a wide range of vSphere versions. For one thing, you can always upgrade VMware Tools but not downgrade it.

If you don’t upgrade, it’s still OK. You can still use the console tab of the virtual machine and log in from there. Either way works fine.
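Once logged in at the console, the IP address can be checked with a standard Linux command; the interface name below is my assumption and may be different on your appliance:

ifconfig eth0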

Deploying Hadoop Cluster

Following the instructions, I entered the Serengeti CLI and typed in the command to create a new cluster:

serengeti> cluster create --name myHadoop

It worked pretty well, except that it took a little while to create several new virtual machines at the same time. Here is a screenshot of the Serengeti server after the cluster was successfully deployed.
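To verify the result and then add a new worker to the existing cluster, as mentioned at the beginning, I used the cluster commands along the lines of the following. The node group name and instance count are from my own setup and may vary, so take this as a rough sketch of the workflow:

serengeti> cluster list --name myHadoop

serengeti> cluster resize --name myHadoop --nodeGroup worker --instanceNum 4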
