My First Try of Hadoop Azure
During breaks in my vacation last week, I tried the Technology Preview of the Apache Hadoop-based Service on Windows Azure. The service is not yet publicly available and requires Microsoft approval. Here is the link that I used to file my application. It took several days for me to get the email with the invitation code. Sorry that I cannot include the code here.
As with most Microsoft products, the sign-in process is pretty smooth. Not much of a surprise there. After signing in, the first thing to do is create a new Hadoop cluster. For that, you have to pick a DNS name for the management virtual machine of the new cluster. The suffix of the DNS name is always cloudapp.net. As you can imagine, I picked http://doublecloud.cloudapp.net, which was still available.
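To make the naming step concrete, here is a minimal Python sketch of how a chosen name maps to the full host name. Only the cloudapp.net suffix comes from the sign-up form; the helper name `cluster_hostname` is hypothetical, and the check applies the generic DNS label rule (letters, digits, and hyphens, at most 63 characters), which is my assumption and may not match what the preview portal actually enforces.

```python
import re

# Suffix taken from the sign-up form: every cluster DNS name ends in cloudapp.net.
CLOUDAPP_SUFFIX = "cloudapp.net"

# Generic DNS label rule: 1 to 63 letters, digits, or hyphens, and no hyphen
# at the start or end. Assumption: the preview portal may be stricter.
_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def cluster_hostname(name: str) -> str:
    """Return the full host name for a chosen cluster DNS prefix (hypothetical helper)."""
    label = name.strip().lower()
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"not a valid DNS label: {name!r}")
    return f"{label}.{CLOUDAPP_SUFFIX}"


if __name__ == "__main__":
    print(cluster_hostname("doublecloud"))
```

This only validates the shape of the name locally. Whether a name is still available is something the sign-up page decides, and the sketch does not attempt to check that.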
The cluster size defaults to 2 nodes and 1 TB of disk space, and cannot be changed. For evaluation purposes that’s pretty good, not to mention that it’s free for 5 days. I assume you can change the cluster size in the real service, provided that you also supply your credit card info.
Below the cluster size, you enter a username and password. (The password rule is a little strange: you cannot include any symbols, even though symbols are frequently required elsewhere.) You need these credentials to RDP into the management virtual machine later on. Optionally, you can select SQL Azure. I didn’t choose that because I wanted to keep it simple. It took a while, probably more than 10 minutes, to create the new cluster.
When the cluster is created, the GUI looks like the following. Note that after years of doing private betas under NDA with VMware, I wondered whether the same restriction would apply here, especially since this is an invitation-only preview. Then I found an article on TechNet that shared even more, and figured I should be fine.
The GUI uses the new Metro style, consistent with that of Windows 8. Using it is quite simple and straightforward: just click on one of the blocks. The primary one is the blue one, Create Job. After a new Hadoop job is created, a new block appears after it. If you don’t have a Hadoop application of your own, you can just deploy any of the 9 samples, as I did.
The management virtual machine is a Windows-based machine with a unique IP address that is accessible from outside. After RDPing into it, you can run Hadoop commands and do the things you would do on a typical Windows machine, except that administrative privileges are restricted.
After playing with the cluster and the management VM, I released the cluster. The overall experience was pretty good, with a nice balance of simplicity and features. After I tweeted about it, I got follow-ups from Matt Winkler, who is the program manager for Hadoop Azure at Microsoft. Please feel free to ping him on Twitter.