VMware Serengeti: A Perfect Match of Hadoop and vSphere
During the Hadoop Summit 2012 last month, I learned of the release of the open source (Apache license) Serengeti project from VMware. The week after, I downloaded the OVA file from the VMware site and made my first attempt in a development environment, after browsing through the user guide, which describes a fairly easy process for getting a Hadoop cluster to run on vSphere.
It wasn’t successful the first time, and then I was distracted by other things until last weekend, when I finally had time for another try. To my pleasant surprise, it worked pretty smoothly the second time. I could create a default Hadoop cluster with 3 worker virtual machines, and then add a new worker to the existing cluster.
I don’t know what magic worked the second time, but my guess is that the first environment I tried did not have a DHCP server while the second one did. Per the user guide and the appliance installation wizard, DHCP may not be required, but it definitely smooths out the experience. At the very least, you don’t need to type commands to specify the IP addresses that Serengeti allocates to the virtual machines it creates.
The user guide is pretty easy to read and follow, with many screenshots and samples. It also highlights important information and hints in callouts, and it deserves credit for that. As always, though, you cannot trust a user guide 100%, and quite often you need to do things slightly differently.
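To make the non-DHCP case concrete, here is a minimal sketch of what handing out addresses from a static range amounts to. This is not how Serengeti does it internally; the function name, the address range, and the virtual machine names are all hypothetical and chosen only for illustration.

```python
import ipaddress


def allocate_static_ips(start, end, vm_names):
    """Assign one address from an inclusive IPv4 range to each VM, in order.

    Hypothetical helper for illustration only; not part of Serengeti.
    """
    first = ipaddress.IPv4Address(start)
    last = ipaddress.IPv4Address(end)
    if last < first:
        raise ValueError("end of range precedes start")
    pool_size = int(last) - int(first) + 1
    if len(vm_names) > pool_size:
        raise ValueError("not enough addresses in the pool")
    # IPv4Address supports integer addition, so first + i is the i-th address.
    return {name: str(first + i) for i, name in enumerate(vm_names)}


# Assumed example values: a made-up range and made-up VM names.
vms = ["master-0", "worker-0", "worker-1", "worker-2", "client-0"]
assignments = allocate_static_ips("192.168.1.10", "192.168.1.20", vms)
for name, ip in assignments.items():
    print(name, ip)
```

With DHCP available, none of this bookkeeping is needed, which is probably why the second attempt felt so much smoother.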
Virtual Appliance Installation
The following is a screenshot I captured while installing the OVA on vSphere 5. Can you spot the problem there? The error message is very clear and, even better, it points out how to solve the problem, so I was able to fix it quickly. After that, the installation went through smoothly.
The problem is really about timing. I wish I had been stopped at step one, or at least warned in the documentation, so that I didn’t have to run the wizard again. Anyway, it’s not a big deal, because there weren’t too many steps and I could mostly accept the defaults. Even better would be a button that opens a dialog box so I could change the setting on the fly. Am I expecting too much?
To be fair, it’s not an issue with the Serengeti project but with the vSphere product itself.
Updating VMware Tools
Because I selected DHCP in the wizard, I tried to look up the IP address of the Serengeti server in vSphere Client, but failed because the VMware Tools status was “Running (Out-of-date).”
You can choose to upgrade VMware Tools. It’s fairly easy to do: just right-click the Serengeti virtual machine in vSphere Client, and you will find an option in the context menu to do it for you. No complaints here, because VMware presumably wants to ship the lowest version of VMware Tools so that the appliance can be used across a wide range of vSphere versions. For one thing, you can always upgrade VMware Tools but not downgrade it.
If you don’t upgrade, that’s still OK. You can use the console tab of the virtual machine and log in from there. Either way works fine.
Deploying Hadoop Cluster
Following the instructions, I entered the Serengeti shell and typed the command to create a new cluster:
serengeti> cluster create --name myHadoop
It worked pretty well, although it took a little while to create several new virtual machines at the same time. Here is a screenshot of the Serengeti server after the cluster was successfully deployed.