When talking about data processing, we naturally take the CPU for granted. However, the latest GPUs (Graphics Processing Units, also known as Visual Processing Units, or VPUs) come with hundreds of cores and compute much faster than CPUs. The question is how practical it is to use GPUs for processing big data.
Recently I received several questions, and even a bug report, about support for the next release of vSphere in the open source VI Java API. The questions are mostly from VMware partners who have early access to the private beta of the next vSphere release and want to ship their own products at the same time as the vSphere GA. I figure more partners may have the same question, so I've decided to answer it all here along with a possible workaround.
I went to LinkedIn last Wednesday for a tech talk by UC Berkeley professor Joseph Hellerstein on Programming for Distributed Consistency: CALM and Bloom. This is indeed a highly specialized topic, so I am not going to go into the details. Should you be interested in the new programming language Bloom, you can check out the website (http://bloom-lang.org).
After the Churchill event on Hadoop for enterprises, I attended the Hadoop Summit at the San Jose Convention Center. It's one of the benefits of living in Silicon Valley that I can attend various tech events without flying away from my family for days.
Given the growing popularity of Hadoop, I decided to give it a try myself. As usual, I searched for a tutorial first and found one by Yahoo, which is based on a Hadoop 0.18.0 virtual machine. I knew the current stable version was 1.x, but that was OK because I just wanted to get the big picture, and I didn't want to pass up the convenience of a ready-to-use Hadoop virtual machine.
This past week was a busy one for the Hadoop community, with two Hadoop events in Silicon Valley. The first was "What Role Will Hadoop Play in the Enterprise" by the Churchill Club, which attracted about 300 attendees at a Palo Alto hotel. The second was the much bigger Hadoop Summit conference at the San Jose Convention Center. I will write a separate article on the second event soon.
While working in virtualized environments, we need to pass around virtual machines (a.k.a. virtual appliances) from time to time. Most of the virtual machines I've seen available for download are compressed to save storage and network bandwidth.
Not all compression algorithms are created equal in terms of compression ratio, compression speed, and decompression speed. In most cases, the differences don't matter much for documents and small programs. But they matter a lot for virtual machines, whose virtual disk files are much larger than normal files. Even a small percentage improvement can result in significant savings in storage and bandwidth.
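To get a feel for these trade-offs yourself, here is a minimal sketch using Python's standard-library codecs (zlib, bz2, and lzma). The sample payload is a made-up, repetitive byte pattern standing in for a sparse virtual disk file; it is purely illustrative, not real VM data.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical payload: repetitive, pattern-heavy bytes, loosely mimicking
# the sparse content of a virtual disk file (illustration only).
data = (b"\x00" * 4096 + b"disk block header" + b"\xff" * 1024) * 500

for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(packed)  # higher ratio = smaller output
    print(f"{name:5s} ratio={ratio:10.1f}x  time={elapsed:.3f}s")
```

On real disk images the ranking can differ: lzma typically compresses tightest but slowest, while zlib is fastest with a lower ratio, so the right choice depends on whether you pay more for storage/bandwidth or for CPU time.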
Many of us have already heard the term "software stack." It shows software layers in boxes stacked on top of each other, all the way from the operating system, to middleware, to applications. When these layers are offered as services, we get IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service), respectively, forming the so-called cloud service stack. These two stacks are essentially similar, if not the same.
Once upon a time, there was a famous vision – "The network is the computer." If you have been in the IT industry long enough, you would know which company was behind that vision. Inspired by that vision for the computer, I am inventing yet another one for the cloud – "The data is the cloud."