Author Archives: Steve Jin

Three Ways Enterprises Can Use Hadoop

Hadoop has recently gained a lot of attention from enterprises. Just think about the rapid growth in attendance at Hadoop Summit. There are many different ways to leverage Hadoop in enterprises, but in general there are three major usage patterns, as detailed below.

As a Framework

This is what Hadoop was initially intended to be, and it remains one of the major approaches in the short term. It means that an enterprise needs to invest in custom application development, which normally costs more than buying off-the-shelf applications.

Posted in Big Data | Tagged , | 1 Response

What Hadoop Community Can Learn From VMware Virtualization

As I mentioned in a previous article, Hadoop is at a stage similar to where virtualization was 10 years ago: the technology is mostly ready for wider adoption. Certain secret sauces led to virtualization’s stellar success, especially VMware’s in the enterprise space. Here I examine some of the success factors that the Hadoop community could learn from.

Strive For Out Of Box Experience

Posted in Big Data | Tagged , | Leave a comment

Is MapReduce A Major Step Backwards?

While learning Hadoop, I wondered whether the MapReduce processing model can handle all the Big Data challenges. David DeWitt and Michael Stonebraker went a step further by arguing in their blog article that MapReduce is a major step backwards. I found it a very good read, though I do not necessarily agree with the authors. It’s always good to know different opinions and the contexts they come from. I also found that the authors wrote the best introduction to MapReduce in several short paragraphs. I quote them at the end, so read on.

Posted in Big Data | Tagged , | 3 Responses

MapReduce: The Theory Behind Hadoop

As most of us know, Hadoop is a Java implementation of the MapReduce processing model, which originated at Google with Jeffrey Dean and Sanjay Ghemawat. After studying Hadoop and attending several related events (Hadoop Summit, Hadoop for Enterprise by Churchill Club), I felt I should dig deeper by reading the original paper.

The paper is titled “MapReduce: Simplified Data Processing on Large Clusters.” Unlike most research papers I’ve read before, it’s written in plain English and is fairly easy to read and follow. I find it well worth reading and strongly recommend that you spend an hour reading through it.
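
To make the model a bit more concrete, here is a minimal single-process sketch in Python of the word-count example the paper uses. The function names (map_phase, shuffle, reduce_phase, word_count) are my own, chosen for illustration; they are not Hadoop APIs, and a real cluster would spread each phase across many machines rather than run it in one loop.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

The point of the sketch is only the shape of the computation: the map and reduce functions are the parts a developer writes, while the grouping in between is what the framework takes care of.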

Posted in Big Data | Tagged , | 2 Responses

Review Board Virtual Machine for Code Review: The Missing Manual

Code review is important for the quality of a software product. It used to be a meeting activity in which a small group of engineers walked through changes and gave the author feedback. This is highly effective but not flexible enough, especially when there are frequent code changes.

Posted in Software Development | Tagged , , | 1 Response

GPU for Big Data Processing

When talking about data processing, we naturally take the CPU for granted. However, the latest GPUs (Graphics Processing Unit, also known as Visual Processing Unit, or VPU) come with hundreds of cores and can calculate much faster than CPUs on highly parallel work. The question is how practical it is to use GPUs in processing big data.

Posted in Big Data | Tagged , , , | 5 Responses

Support Next vSphere Release in VI Java API: The Plan and Work Around

Recently I got several questions, and even a bug report, about supporting the next release of vSphere in the open source VI Java API. The questions are mostly from VMware partners who have early access to the private beta of the next release of vSphere and want to ship their own products at the same time as vSphere GA. I figure more partners may have the same question, so I decided to answer it here with a possible workaround.

Posted in vSphere API | Tagged , | Leave a comment

GUI Front End for Hadoop

I went to LinkedIn last Wednesday for a tech talk by UC Berkeley professor Joseph Hellerstein on Programming for Distributed Consistency: CALM and Bloom. This is indeed a highly specialized topic, so I am not going to talk about the details. Should you be interested in the new programming language Bloom, you can check the web site (http://bloom-lang.org).

Posted in Big Data | Tagged , | 1 Response

Hadoop Summit 2012: A Quick Summary

After the Churchill event on Hadoop for enterprises, I attended the Hadoop Summit at the San Jose Convention Center. One of the benefits of living in Silicon Valley is that I can attend various tech events without flying away from family for days.

Posted in Big Data, News & Events | Tagged , | 7 Responses

Getting started with Hadoop: My First Try

Given the growing popularity of Hadoop, I decided to give it a try myself. As usual, I searched for a tutorial first and found one by Yahoo, which is based on a Hadoop 0.18.0 virtual machine. I knew the current stable version was 1.x, but that was OK because I just wanted to get the big picture and didn’t want to pass up the convenience of a ready-to-use Hadoop virtual machine.

Posted in Big Data, Software Development | Tagged , , | 5 Responses

Hadoop For Enterprises: Event By Churchill Club

This past week was a busy one for the Hadoop community, with two Hadoop events in Silicon Valley. The first one was “What Role Will Hadoop Play in the Enterprise” by Churchill Club, which attracted about 300 attendees to a Palo Alto hotel. The second one was the much bigger conference, Hadoop Summit, at the San Jose Convention Center. I will write a separate article on the second event soon.

Posted in Big Data | Tagged , , , | 1 Response

Best Tool to Compress Virtual Machines

While working in virtualized environments, we need to pass around virtual machines (a.k.a. virtual appliances) from time to time. Most of the virtual machines I’ve seen for downloading are compressed to save storage and network bandwidth.

Not all compression algorithms are created equal in terms of compression ratio, compression speed, and decompression speed. For documents and small programs, the difference rarely matters much. But it matters a lot for virtual machines, whose virtual disk files are much larger than normal files. Even a small percentage improvement can result in significant savings in storage and bandwidth.
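
As a rough illustration of how those three measures can be compared, here is a small Python sketch. It uses the zlib, bz2, and lzma modules from the Python standard library as stand-ins; they are not necessarily the tools evaluated in this article. The helper name compare_compressors and the synthetic sample data are my own assumptions, and real virtual disk files will behave differently.

```python
import bz2
import lzma
import time
import zlib


def compare_compressors(data):
    """Measure ratio, compression time, and decompression time per codec."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        compress_seconds = time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(packed)
        decompress_seconds = time.perf_counter() - start

        # A codec that does not round-trip the data is useless here.
        if restored != data:
            raise ValueError(f"{name} did not round-trip the input")

        results[name] = {
            "ratio": len(data) / len(packed),
            "compress_seconds": compress_seconds,
            "decompress_seconds": decompress_seconds,
        }
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; replace with bytes read from a real file.
    sample = b"virtual disk block " * 50000
    for name, stats in compare_compressors(sample).items():
        print(name, stats)
```

To try it on a real file, read the file in binary mode and pass the bytes in. For a multi-gigabyte virtual disk you would want to stream the data in chunks instead of loading it all into memory, which this sketch does not attempt.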

Posted in Software Development, Virtualization | Tagged , , | 12 Responses

The Data Stack: The Next Focus of Cloud Computing?

Many of us have already heard the term “software stack.” It shows the software layers as boxes stacked on top of each other, all the way from the operating system, to middleware, to applications. When these layers are offered as services, we have IaaS (Infrastructure As A Service), PaaS (Platform As A Service), and SaaS (Software As A Service) respectively, forming the so-called cloud service stack. These two stacks are essentially similar, if not the same.

Posted in Cloud Computing | Tagged , , , | Leave a comment

The Data is the Cloud

Once upon a time, there was a famous vision: “The network is the computer.” If you have been in the IT industry long enough, you know which company was behind that vision. Inspired by that vision for the computer, I am inventing yet another one for the cloud: “The data is the cloud.”

Posted in Cloud Computing | Tagged , , | Leave a comment

What Does Oracle-Google Case Mean For Cloud Computing?

As a software professional who has used Java since its very beginning, I have been following the case regarding Google’s use of Java APIs in its Android OS. I don’t want to repeat what has happened so far because you can find these updates by searching the Internet. All I want to say is that the case is pretty educational, not only on the technology itself but also on the legal side, such as patents and copyright.

Posted in Cloud Computing | Tagged , , , , , , | Leave a comment

Redefining Software in Cloud Age

As software professionals, we may still use the same programming languages and tools as 10 years ago. But there has been a fundamental shift in how we think about, make, and consume software.

Static blueprints

Traditionally, software really means blueprints, which are used to construct running software instances. The blueprints include binary code, an installer, and related documentation guiding the installation and configuration of the software. Software vendors make the software packages and sell them to customers, who then deploy and run them.

Posted in Cloud Computing, Software Development | Tagged , | 2 Responses

What Is Missing in Current Software?

If we look closely at software today, we will find some important pieces missing. For example, the software code defines the logical behavior of a system, but not its performance and scalability. In other words, the operational aspects of the software are not clear even if you have the software product in hand.

Posted in Software Development | Tagged , , | Leave a comment

Effective Strategies to Simplify

Having read my articles on vSphere APIs and software design, you may feel a bit bored. Today I will write about something different and more generic: how to simplify things.

By nature, the world is complicated, as it should be, and it will remain so, or become even more so, forever. Simplification does not change that fact; it changes your perception of the world. Unless you are writing research papers, you want to simplify the things you work on.

Posted in Others | Tagged , , | Leave a comment

Open Source VI Java API 5.0.1 Released

While preparing this announcement, I realized that on the same day last year we had a very successful community event, with several tech talks, to celebrate the 3rd year of the vijava open source project. Today it’s the 4th year of this project!

Since VI Java API 5.0 went GA last October, there have been some changes, one of which is that I left VMware and joined VCE the same month. On the project side, several new bugs have been reported on the forum. These bugs do not affect most developers, but I still fixed them quickly in the code repository so that anyone who was affected could get the fixes from there.

Posted in vSphere API | Tagged | 21 Responses

Recent Interview with BizTech Magazine

I just did an interview with Ricky Ribeiro, the online content manager of BizTech Magazine. It was published last week as part of the Q&A series of Must-Read IT Blogs. In response to Ricky’s great questions, I shared thoughts on a broad range of topics, including blogging, cloud computing, and technical innovation in general.

The following is part of the article. For full coverage, please check out here, where you can also find links to interviews with other top IT bloggers.

Posted in News & Events | Tagged , , | Leave a comment
  • NEED HELP?


    My company has created products like vSearch ("Super vCenter"), vijavaNG APIs, EAM APIs, and the ICE tool. We also help clients with virtualization and cloud computing through customized development and training. Should you, or someone you know, need these products and services, please feel free to contact me: steve __AT__ doublecloud.org.

    Me: Steve Jin, VMware vExpert who authored VMware VI and vSphere SDK (Prentice Hall) and created the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed, applying them in their products, internal IT orchestration, and test automation.