As its name suggests, Hadoop MapReduce includes Map and Reduce phases in its data processing flow. At the highest level, MapReduce follows the traditional wisdom of "divide and conquer": dividing big data into small pieces that can be processed by commodity computers, and then pulling the results together.
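The divide-and-conquer flow can be illustrated with a toy word count in plain Java. This is deliberately not the Hadoop API, just a sketch of the map, shuffle/group, and reduce idea that the framework performs at scale:

```java
import java.util.*;
import java.util.stream.*;

public class MapReduceToy {
    // "Map" phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // "Reduce" phase: sum all the counts collected for one word.
    static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("divide and conquer", "divide big data");

        // Shuffle/sort stage: group the mapped pairs by key,
        // as the framework would do between Map and Reduce.
        Map<String, List<Integer>> grouped = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        grouped.forEach((word, counts) ->
                System.out.println(word + "=" + reduce(counts)));
    }
}
```

In real Hadoop, the map and reduce functions run on many machines in parallel and the grouping step is done by the framework; this single-process version only mirrors the shape of the computation.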
In my previous article, I talked about why the Web is not a good choice as the primary GUI for vSphere. I also mentioned that I was working on a small app to enhance the user experience of the vSphere Web Client.
Today I am happy to announce a small application I developed recently using the latest Visual Studio 2012 Express, which is free from Microsoft. Although known for my work on Java in the community, I am pretty open to any programming language or tool that best gets the work done. This time it happened to be C# and .NET.
I recently started to use the new Flex-based vSphere Web Client while working on the open source vijava project to support vSphere 5.1. Overall I like the look and feel, and particularly the extensibility story around the new architecture. However, I am not impressed by the performance: I saw far more "loading…" messages and clock cursors than I expected. Technically, I don't think this is the direction VMware wants to bet on as the primary user interface for its flagship product vSphere.
After VMware touted the new term "software defined data center," I suddenly saw many vendors at VMworld claiming they support the software defined data center. Days ago I read a news story about Joe Tucci, the CEO of VMware's parent company EMC, explaining what "software defined data center" means.
As mentioned in my previous post on Hadoop File System commands, the commands are built on top of the HDFS APIs. These APIs are defined in the org.apache.hadoop.fs package, which includes several interfaces and over 20 classes, enums, and exceptions (the exact numbers vary from release to release).
As always, it's best to start with sample code when learning new APIs. The following sample copies a file from the local file system to HDFS.
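A minimal sketch of such a copy using the FileSystem API from org.apache.hadoop.fs might look like this. It assumes the Hadoop client libraries are on the classpath, and the NameNode URI and both paths below are placeholders you would replace with your own:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at your NameNode; host and port are assumptions.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("/tmp/sample.txt");       // hypothetical local file
        Path dst = new Path("/user/demo/sample.txt"); // hypothetical HDFS target

        // Copy from the local file system into HDFS.
        fs.copyFromLocalFile(src, dst);
        fs.close();
    }
}
```

The same FileSystem abstraction backs the shell commands from the earlier post, which is why the command set and the API map so closely to each other. Running this requires a reachable HDFS cluster, so it is a sketch rather than a standalone demo.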
If you've been following my blog, you may remember I wrote about the Cisco Nexus 1000V in the VMware vSphere API about half a year ago. The Cisco Nexus 1000V actually has another API based on XML. Interestingly, it's implemented over SSH rather than HTTP or HTTPS.
The Nexus 1000V API follows two IETF standards: RFC 4741, NETCONF Configuration Protocol, and RFC 4742, Using the NETCONF Configuration Protocol over Secure Shell (SSH). The first is pretty long at close to 100 pages, but fortunately Wikipedia has a much shorter introduction. RFC 4742 is just 8 pages and easy to browse through.
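To give a feel for the wire format: a NETCONF session opens with each side sending a &lt;hello&gt; message listing its capabilities, and RFC 4742 specifies that every message over SSH ends with the ]]&gt;]]&gt; delimiter. A small sketch that builds such a hello message (the actual SSH transport to the switch is omitted here):

```java
public class NetconfHello {
    // End-of-message delimiter defined by RFC 4742 for NETCONF over SSH.
    static final String EOM = "]]>]]>";

    // Build the client <hello>, advertising the base NETCONF capability
    // defined in RFC 4741.
    static String hello() {
        return "<hello xmlns=\"urn:ietf:params:xml:ns:netconf:base:1.0\">\n"
             + "  <capabilities>\n"
             + "    <capability>urn:ietf:params:netconf:base:1.0</capability>\n"
             + "  </capabilities>\n"
             + "</hello>\n"
             + EOM;
    }

    public static void main(String[] args) {
        System.out.println(hello());
    }
}
```

In a real session you would write this string to an SSH channel connected to the device's NETCONF subsystem, read the server's hello back, and then exchange &lt;rpc&gt; messages framed by the same delimiter.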
I took a Hadoop developer training course the week of September 10. Hadoop is not totally new to me, as I've tried the HelloWorld sample and the Serengeti project. Still, it was nice to get away from the daily job and go through a series of lectures and hands-on labs in a training setting. Believe it or not, I felt more tired after training than after a typical working day. This post contains little that's new, but it will help me find the commands when I need them later.
After VMware released vSphere 5.1 on the night of September 10, I finally got a chance to look at the new vSphere API, including the API reference and, more importantly to me, the WSDL files.
I was relieved to find that there weren't many changes. Not a single managed object was added to the vSphere 5.1 API, meaning a lot less work than I expected for the vijava API to support the latest vSphere 5.1.
At first sight, these two technologies are totally different, and you wouldn't normally discuss them together. But looking closely at the philosophies behind them, I find they are surprisingly similar, and I hope you will agree with me after reading through this article.
A Quick Overview
Before getting into the detailed analysis, let’s take a quick look at the concepts and histories of both technologies.
vRAM was the license model VMware used in vSphere 5.0. It basically limits the usage of virtual memory, as opposed to physical memory, per license. When first announced last year, it created a lot of angry customers overnight, even though VMware estimated that the license scheme wouldn't affect most existing customers. Later on, VMware doubled the amount of virtual memory and implemented a cap per license, and insisted on rolling out the modified license model despite strong objections from customers.
During breaks in my vacation last week, I tried the Technology Preview of the Apache Hadoop-based Service on Windows Azure. The service is not yet publicly available and requires Microsoft approval. Here is the link I used to file my application. It took several days for me to get the email with the invitation code. Sorry that I cannot include the code here.
About two weeks ago, CRN published an article about VMware's Zephyr project. According to the article, VMware plans to launch a public IaaS cloud to compete with Amazon EC2, Microsoft Azure, and, more directly, with existing VMware vCloud service providers. The reason for the move is "because none of its service provider partners are moving fast enough. Look at the adoption rate of vCloud Director with service providers — it is non-existent."
I came across a video on YouTube over the past weekend: Big Ideas: How Big is Big Data. Although it comes with several mentions of EMC, it is very well prepared and demonstrated with white-boarding, and therefore worthwhile to share here.
Some of the key points made from the video include:
- The growth is accelerating. By 2020, there will be 50x more data than today.
In my last article, I analyzed the real motivation behind VMware's recent intention to acquire Nicira. In this article, I am going to review VMware's past strategies and predict its long-term strategy. In short, VMware's past growth strategy was "vertical," and its future growth strategy should be "horizontal."
Past Strategy Review
VMware's acquisition of Nicira posed a big risk to Cisco's future control of the networking market. The risk was in fact there from day one of VMware ESX with virtual switches, and later distributed virtual switches, which reduce the need for customers to buy physical gear from Cisco because virtual machines use "free" virtual ports. For inter-physical-server communication, customers still need Cisco and other vendors, even though the volume is not as high as it would be otherwise. That is why Cisco quickly came up with its own distributed virtual switch, the Nexus 1000V, to stay relevant in the virtualization market.
This past Monday, VMware announced it would buy Nicira for $1.26 billion. Congratulations to the many of my former VMware colleagues who joined Nicira and will soon return to VMware.
Overall this deal aligns well with VMware's newfound vision of the software defined data center. You must have read many similar explanations and comments from various sources, including this one from VMware CTO Steve Herrod, and this one by Nicira cofounder and CTO Martin Casado.
In my previous article, I talked about three different ways enterprises use Hadoop. Thinking about it a bit more, you may have come to realize that the three usage patterns are very similar to how we use Tomcat. I will compare the two for commonalities and differences.
First of all, both Hadoop and Tomcat are Java-based open source projects from the Apache Software Foundation, and thus covered by the same Apache license. As a result, in terms of license compliance, you can use Hadoop as freely as you have used Tomcat.
BusinessWeek recently published an article, "In Silicon Valley, Hardware is Hot Again." Almost all the big names have started to sell hardware now: Microsoft, Google, and of course Apple. Apple's stellar success with the iPhone and iPad disrupted the conventional wisdom that software carries higher margins than hardware. Also, Apple's combined hardware-and-software devices pose a real risk to Microsoft and Google. To be exact, the "hardware" in the article title should really be software-bundled hardware. That is why Google and Microsoft had to get into the hardware business, competing directly against Apple.
At the Hadoop Summit 2012 last month, I learned about the release of the open source (Apache licensed) Serengeti project from VMware. The week after, I downloaded the OVA file from the VMware site and gave it a first try in a development environment after browsing through the user guide, which describes a fairly easy process for getting a Hadoop cluster running on vSphere.
As a long-time Eclipse user, I like its workspace concept and the ease of switching workspaces, among many other things. A workspace provides a simple yet powerful way to isolate groups of projects under different folders, so you're not distracted by other, unrelated projects.
This feature is, however, not available in the NetBeans IDE, which is not a big deal most of the time. By default, NetBeans creates a folder under the current user's home directory as follows (yours could be different):