GPU for Big Data Processing

When we talk about data processing, we naturally take the CPU for granted. However, the latest GPUs (Graphics Processing Units, also known as Visual Processing Units, or VPUs) come with hundreds of cores and calculate much faster than CPUs. The question is how practical it is to use GPUs for processing big data.

As its name suggests, the GPU comes from graphics processing. According to this Wikipedia page, “The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as ‘the world’s first GPU’, a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second. Rival ATI Technologies coined the term visual processing unit or VPU with the release of the Radeon 9700 in 2002.”


Despite its graphics roots, the GPU has evolved into the general-purpose GPU (GPGPU) and plays a role in high performance computing by turning its massive graphics horsepower into general purpose computing power. Three of the five most powerful supercomputers in the world as of 2010 take advantage of GPUs. Given all this, no one can deny the potential of GPUs in big data processing.

When searching for related research, I found a very interesting blog article, “Scaling-up GPUs for ‘Big Data’ Analytics – MapReduce and Fat Nodes.” According to the author, “These small – medium size hybrid CPU-GPU clusters will be available at 1/10 the hardware cost and 1/20 the power consumption costs, and, deliver processing speed-ups of up to 500x or more when compared with regular CPU clusters running Hadoop MapReduce.”

As disruptive as the technology is the cost, which puts big data analytics within reach of more organizations, especially SMBs. Enough said about the technology and the benefits of using GPUs in big data processing.

From a system point of view, data processing is only part of the whole pipeline of importing, processing, and exporting data. No matter how fast you can process data with a GPU, you are still limited by the rest. That is what this research paper is all about: “For some applications with large data sets, the memory-transfer overhead combined with the kernel time took longer than 50x the GPU processing time itself. However, the amount of overhead can vary drastically depending on how a GPU kernel will be used in an application, or by a scheduler.”
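A back-of-the-envelope model shows how a ratio in the paper's 50x range can arise for a memory-bound kernel. The PCIe bandwidth and GPU memory throughput figures below are illustrative assumptions for the sketch, not measurements from the paper:

```python
# Rough model: a memory-bound GPU kernel that must copy its data
# across PCIe to the device and copy the result back again.
# Bandwidth numbers are assumptions chosen for illustration.

def offload_times(data_bytes, pcie_bps=8e9, kernel_bps=200e9):
    """Return (transfer_seconds, kernel_seconds) for a memory-bound kernel."""
    transfer = 2 * data_bytes / pcie_bps   # copy in + copy out over PCIe
    kernel = data_bytes / kernel_bps       # time the GPU spends on the data
    return transfer, kernel

transfer, kernel = offload_times(1 << 30)  # 1 GiB working set
print(f"transfer/kernel ratio: {transfer / kernel:.0f}x")  # prints 50x
```

With these assumed numbers the two host-device copies alone cost 50 times the kernel time, which is why keeping data resident on the GPU (or overlapping transfers with computation) matters so much in practice.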



  1. Tayyab
    Posted June 27, 2012 at 3:28 am | Permalink

    The paper (mentioned in the post) missed one more thing: in the case of CUDA, the GPU can only be used for a fixed number of seconds. After that, the system hangs. In later versions of the CUDA environment they controlled the hanging, but the system still aborts a running program after the time expires, and it is an uncontrolled environment. My master's thesis was developing an STM for graphics cards using CUDA in 2009. At that time there was no debugger for CUDA. Later they provided a debugger, but it runs in emulation mode, which cannot imitate the concurrent environment of a graphics card.

  2. Posted June 27, 2012 at 12:03 pm | Permalink

    Tayyab, thanks for sharing your experience with us. I think there is still a lot of work to do, especially on the tooling side.

  3. Posted September 17, 2012 at 11:04 am | Permalink

    It is much more than that. CUDA is a parallel computing platform and programming model that makes using a GPU for general-purpose computing simple and elegant. The developer still programs in the familiar C, C++, Fortran, or an ever-expanding list of supported languages, and incorporates extensions of these languages in the form of a few basic keywords.





    Me: Steve Jin, VMware vExpert, who authored the VMware VI and vSphere SDK by Prentice Hall, and created the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed for their products, internal IT orchestration, and test automation.