
GPU for Big Data Processing

When we talk about data processing, we naturally take the CPU for granted. However, the latest GPUs (graphics processing units, also known as visual processing units, or VPUs) come with hundreds of cores and can run highly parallel calculations much faster than a CPU. The question is how practical it is to use GPUs for processing big data.

As its name suggests, the GPU comes from graphics processing. According to this Wikipedia page, “The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as ‘the world’s first GPU, a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second.’ Rival ATI Technologies coined the term visual processing unit or VPU with the release of the Radeon 9700 in 2002.”

Despite its graphics roots, the GPU has evolved into the general-purpose GPU and now plays a role in high-performance computing by turning its massive computational power to general-purpose work. Three of the five most powerful supercomputers in the world as of 2010 take advantage of GPUs. Given this, no one can deny the potential of GPUs in big data processing.

While searching for related research, I found a very interesting blog article, “Scaling-up GPUs for ‘Big Data’ Analytics – MapReduce and Fat Nodes.” According to the author, “These small – medium size hybrid CPU-GPU clusters will be available at 1/10 the hardware cost and 1/20 the power consumption costs, and, deliver processing speed-ups of up to 500x or more when compared with regular CPU clusters running Hadoop MapReduce.”

The cost is as disruptive as the technology itself, because it lets more organizations, especially small and medium-sized businesses (SMBs), afford big data analytics. So much for the technology and the benefits of using GPUs in big data processing.

From a system point of view, data processing is only one part of the whole process of importing, processing, and exporting data. No matter how fast you can process data with a GPU, you are still limited by the rest. That is what this research paper is all about: “For some applications with large data sets, the memory-transfer overhead combined with the kernel time took longer than 50x the GPU processing time itself. However, the amount of overhead can vary drastically depending on how a GPU kernel will be used in an application, or by a scheduler.”
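To make the point concrete, here is a minimal back-of-the-envelope sketch in Python. All the numbers are hypothetical and chosen only to echo the two quotes above (a 500x kernel speed-up, and transfer plus kernel time adding up to 50x the kernel time); they are not measurements from the paper or the blog article, and the function name `end_to_end_speedup` is made up for illustration.

```python
def end_to_end_speedup(cpu_ms, kernel_ms, transfer_ms):
    """Speed-up over the CPU once host<->device transfers are counted."""
    return cpu_ms / (kernel_ms + transfer_ms)


# Hypothetical timings in milliseconds (assumptions, not measured data).
cpu_ms = 100_000      # the job takes 100 s on the CPU
kernel_ms = 200       # the GPU kernel alone is 500x faster
transfer_ms = 9_800   # transfer + kernel = 10,000 ms = 50x the kernel time

# Speed-up if we only look at the kernel, ignoring data movement.
kernel_only_speedup = cpu_ms / kernel_ms

# Speed-up the application actually sees.
overall_speedup = end_to_end_speedup(cpu_ms, kernel_ms, transfer_ms)

print(f"kernel only: {kernel_only_speedup:.0f}x")  # kernel only: 500x
print(f"end to end:  {overall_speedup:.0f}x")      # end to end:  10x
```

Under these assumed numbers, a kernel that looks 500x faster delivers only a 10x gain end to end, which is why the data movement around the GPU matters as much as the GPU itself.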

Categories: Big Data
  1. Tayyab
    June 27th, 2012 at 03:28 | #1

    The paper (mentioned in the post) missed one more thing: with CUDA, the GPU can be used only for a fixed number of seconds. After that the system hangs. In later versions of the CUDA environment, they brought the hanging under control, but the system aborts a running program once the time expires. And it is an uncontrolled environment. My master's thesis was on developing an STM for graphics cards using CUDA in 2009. At that time there was no debugger for CUDA. Later they provided a debugger, but it runs in emulation mode, which cannot imitate the concurrent environment of a graphics card.

  2. June 27th, 2012 at 12:03 | #2

    Tayyab, thanks for sharing your experience with us. I think there is still a lot of work to do, especially on the tooling side.

  3. September 17th, 2012 at 11:04 | #3

    It is much more than that. CUDA is a parallel computing platform and programming model that makes using a GPU for general-purpose computing simple and elegant. The developer still programs in the familiar C, C++, Fortran, or an ever-expanding list of supported languages, and incorporates extensions of these languages in the form of a few basic keywords.

  4. January 15th, 2013 at 11:46 | #4

    GPU for Big Data Processing – http://t.co/p3hy9I4U

  5. July 19th, 2016 at 10:56 | #5

    GPU for Big Data Processing | DoubleCloud => Private Cloud + Public Cloud https://t.co/hSVzhveuJ9
