GPU for Big Data Processing

When we talk about data processing, we naturally take the CPU for granted. However, the latest GPUs (Graphics Processing Unit, also known as Visual Processing Unit, or VPU) come with hundreds of cores and can calculate much faster than a CPU on highly parallel work. The question is how practical it is to use GPUs for processing big data.

As its name suggests, the GPU comes from graphics processing. According to Wikipedia, “The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as ‘the world’s first GPU,’ a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second. Rival ATI Technologies coined the term visual processing unit or VPU with the release of the Radeon 9700 in 2002.”


Despite its graphics roots, the GPU has evolved into the general-purpose GPU and plays a role in high performance computing, where its massive computational power is put to work on general-purpose workloads. Three of the five most powerful supercomputers in the world as of 2010 take advantage of GPUs. Given this, no one can deny the potential of GPUs in big data processing.

While searching for related research, I found a very interesting blog article, “Scaling-up GPUs for ‘Big Data’ Analytics – MapReduce and Fat Nodes.” According to the author, “These small – medium size hybrid CPU-GPU clusters will be available at 1/10 the hardware cost and 1/20 the power consumption costs, and, deliver processing speed-ups of up to 500x or more when compared with regular CPU clusters running Hadoop MapReduce.”

The cost is as disruptive as the technology itself: it allows more organizations, especially small and medium businesses (SMBs), to afford big data analytics. So much for the technology and the benefits of using GPUs in big data processing.

From a system point of view, data processing is only one part of the whole process of importing, processing, and exporting data. No matter how fast you can process data with a GPU, you are still limited by the rest. That is what this research paper is all about: “For some applications with large data sets, the memory-transfer overhead combined with the kernel time took longer than 50x the GPU processing time itself. However, the amount of overhead can vary drastically depending on how a GPU kernel will be used in an application, or by a scheduler.”
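To make the point concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the paper or from the blog article quoted above. The function names and all the numbers (a 100-second CPU job, a 500x kernel speed-up, 9.8 seconds of host-to-device and device-to-host copying) are assumptions chosen only to show how transfer overhead eats into a headline speed-up.

```python
def end_to_end_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Speed-up of the whole job once memory transfers are counted.

    cpu_seconds      -- time the job takes on the CPU alone (assumed)
    kernel_speedup   -- how many times faster the GPU kernel is (assumed)
    transfer_seconds -- time spent copying data to and from the GPU (assumed)
    """
    gpu_kernel_seconds = cpu_seconds / kernel_speedup
    return cpu_seconds / (gpu_kernel_seconds + transfer_seconds)


def overhead_ratio(cpu_seconds, kernel_speedup, transfer_seconds):
    """(transfer + kernel time) divided by the kernel time alone."""
    gpu_kernel_seconds = cpu_seconds / kernel_speedup
    return (gpu_kernel_seconds + transfer_seconds) / gpu_kernel_seconds


if __name__ == "__main__":
    # Hypothetical workload: the kernel alone would be 500x faster,
    # but copying the data takes 9.8 seconds.
    cpu, speedup, transfer = 100.0, 500.0, 9.8
    print("kernel-only speed-up:", speedup)
    print("end-to-end speed-up: ", round(end_to_end_speedup(cpu, speedup, transfer), 2))
    print("overhead ratio:      ", round(overhead_ratio(cpu, speedup, transfer), 2))
```

With these made-up inputs the kernel takes 0.2 seconds while the job as a whole takes about 10 seconds, so the total is roughly 50 times the kernel time and the 500x kernel speed-up shrinks to roughly 10x end to end. Real figures depend heavily on the data size, the bus, and how often data has to cross it.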



  1. Tayyab
    Posted June 27, 2012 at 3:28 am | Permalink

    The paper (mentioned in the post) missed one more thing: with CUDA, the GPU can be used only for a fixed number of seconds. After that the system hangs. In later versions of the CUDA environment they got the hanging under control, but the system aborts a running program once the time expires. And it is an uncontrolled environment. My master's thesis was developing an STM for graphics cards using CUDA in 2009. At that time there was no debugger for CUDA. Later they provided a debugger, but it runs in emulation mode, which cannot imitate the concurrent environment of a graphics card.

  2. Posted June 27, 2012 at 12:03 pm | Permalink

    Tayyab, thanks for sharing your experience with us. I think there is still a lot of work to be done, especially on the tooling side.

  3. Posted September 17, 2012 at 11:04 am | Permalink

    It is much more than that. CUDA is a parallel computing platform and programming model that makes using a GPU for general purpose computing simple and elegant. The developer still programs in the familiar C, C++, Fortran, or an ever expanding list of supported languages, and incorporates extensions of these languages in the form of a few basic keywords.





    Me: Steve Jin, VMware vExpert, who authored VMware VI and vSphere SDK (Prentice Hall) and created the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed for their products, internal IT orchestration, and test automation.