GPU for Big Data Processing
When we talk about data processing, we naturally take the CPU for granted. However, the latest GPUs (Graphics Processing Units, also known as Visual Processing Units, or VPUs) come with hundreds of cores and can calculate much faster than CPUs on highly parallel workloads. The question is how practical it is to use GPUs for processing big data.
As its name suggests, the GPU comes from graphics processing. According to Wikipedia, “The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as ‘the world’s first GPU,’ a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second. Rival ATI Technologies coined the term visual processing unit or VPU with the release of the Radeon 9700 in 2002.”
Despite its graphics roots, the GPU has evolved into the general-purpose GPU (GPGPU) and now plays a role in high-performance computing by turning its massive computational power to general-purpose work. Three of the five most powerful supercomputers in the world as of 2010 take advantage of GPUs. Given this, no one can deny the potential of GPUs in big data processing.
When searching for related research, I found a very interesting blog article, “Scaling-up GPUs for ‘Big Data’ Analytics – MapReduce and Fat Nodes.” According to the author, “These small – medium size hybrid CPU-GPU clusters will be available at 1/10 the hardware cost and 1/20 the power consumption costs, and, deliver processing speed-ups of up to 500x or more when compared with regular CPU clusters running Hadoop MapReduce.”
The cost is as disruptive as the technology itself, because it allows more organizations, especially small and medium-sized businesses (SMBs), to afford big data analytics. So much for the technology and the benefits of using GPUs in big data processing.
From a system point of view, data processing is only one part of the whole process of importing, processing, and exporting data. No matter how fast you can process data with a GPU, you are still limited by the rest. That is what this research paper is all about: “For some applications with large data sets, the memory-transfer overhead combined with the kernel time took longer than 50x the GPU processing time itself. However, the amount of overhead can vary drastically depending on how a GPU kernel will be used in an application, or by a scheduler.”
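A back-of-the-envelope sketch in Python makes the point concrete. It is only an illustrative model, not code from the paper or the blog article: the function name end_to_end_speedup and the sample numbers (a 100-second CPU job, a 500x kernel speed-up, and memory transfers costing 49 times the kernel time, so that transfer plus kernel comes to 50 times the kernel time) are assumptions chosen for the example. The model also assumes that transfers and computation do not overlap.

```python
def end_to_end_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Estimate the speed-up of a whole job once host<->device copies are counted.

    cpu_seconds      -- time the job takes on the CPU alone
    kernel_speedup   -- how many times faster the GPU kernel is than the CPU
    transfer_seconds -- time spent copying data to and from the GPU
    (All three are hypothetical inputs; transfers are assumed not to
    overlap with computation.)
    """
    if cpu_seconds <= 0 or kernel_speedup <= 0 or transfer_seconds < 0:
        raise ValueError("times must be positive and transfer time non-negative")
    gpu_kernel_seconds = cpu_seconds / kernel_speedup
    gpu_total_seconds = gpu_kernel_seconds + transfer_seconds
    return cpu_seconds / gpu_total_seconds


if __name__ == "__main__":
    cpu_seconds = 100.0                      # assumed CPU-only run time
    kernel_speedup = 500.0                   # assumed raw kernel speed-up
    kernel_seconds = cpu_seconds / kernel_speedup
    transfer_seconds = 49 * kernel_seconds   # transfer + kernel = 50x kernel time

    no_overhead = end_to_end_speedup(cpu_seconds, kernel_speedup, 0.0)
    with_overhead = end_to_end_speedup(cpu_seconds, kernel_speedup, transfer_seconds)
    print(f"kernel-only speed-up: {no_overhead:.1f}x")
    print(f"end-to-end speed-up:  {with_overhead:.1f}x")
```

Under these assumed numbers, a 500x kernel speed-up shrinks to roughly 10x end to end, which is why the data-movement path deserves as much attention as the kernel itself.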