Facebook HipHop Compiler for PHP: What’s in It for You?

Yesterday I attended a great seminar at Stanford by Haiping Zhao on HipHop, the open source compiler that converts PHP code to C++. Haiping is the tech lead for the open source HipHop project at Facebook.

As many know, Facebook is a PHP shop, with all the front-end dynamic pages written in PHP. The upside of PHP is that it’s very easy to read, write, and debug, and it is platform independent. The downside is that it’s really slow, probably one of the slowest scripting languages.

Why Is PHP Slow?

Haiping summarized three reasons, which he thinks are common contributors to the slowness of scripting languages in general:

  1. Byte-code interpreter.
  2. Dynamic symbol lookups, including functions, variables, constants, class methods, properties, etc.
  3. Weak typing. The engine has to check the data type stored in a variable’s zval before any operation. Plus, the PHP array is too generic because it has to represent any kind of collection.
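To give a feel for reasons 2 and 3, here is a minimal C++ sketch of my own. DynValue is a hypothetical stand-in for a dynamically typed value (it is not the real zval layout), and call_by_name is a made-up helper that mimics looking a function up by name on every call. The point is only to contrast the run-time checks with the plain static_add that a compiler can emit once the types are known.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical dynamically typed value: a type tag plus a payload.
// This is an illustration only, not the actual zval structure.
struct DynValue {
    enum class Tag { Int, Double } tag;
    union {
        std::int64_t i;
        double d;
    };
};

DynValue make_int(std::int64_t v) {
    DynValue x;
    x.tag = DynValue::Tag::Int;
    x.i = v;
    return x;
}

DynValue make_double(double v) {
    DynValue x;
    x.tag = DynValue::Tag::Double;
    x.d = v;
    return x;
}

// Dynamic add: the tags must be inspected at run time before any arithmetic.
DynValue dyn_add(const DynValue& a, const DynValue& b) {
    if (a.tag == DynValue::Tag::Int && b.tag == DynValue::Tag::Int) {
        return make_int(a.i + b.i);
    }
    double x = (a.tag == DynValue::Tag::Int) ? static_cast<double>(a.i) : a.d;
    double y = (b.tag == DynValue::Tag::Int) ? static_cast<double>(b.i) : b.d;
    return make_double(x + y);
}

// Static add: the types are known at compile time, so no checks are needed.
inline std::int64_t static_add(std::int64_t a, std::int64_t b) {
    return a + b;
}

// Dynamic symbol lookup: the function is found by name in a table on each call.
using Fn = DynValue (*)(const DynValue&, const DynValue&);

DynValue call_by_name(const std::unordered_map<std::string, Fn>& table,
                      const std::string& name,
                      const DynValue& a, const DynValue& b) {
    auto it = table.find(name);
    if (it == table.end()) {
        throw std::runtime_error("undefined function: " + name);
    }
    return it->second(a, b);
}
```

Every dyn_add pays for the tag checks, and every call_by_name pays for a hash lookup, while static_add compiles down to a single addition.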

Why Should Facebook Care?

When Haiping joined Facebook, new servers could not keep up with new users. The server farms had become so big that saving even a few percent could save the company millions of dollars. Like all the big web companies, Facebook does not disclose the number of servers it has; the size of its datacenters is guarded as a secret. One of the professors offered an estimate in his question anyway: 15,000 to 30,000 servers.

If each server costs $1,000 (it should be higher than this; I just made the number up), the total cost would be $15M to $30M. If performance improved by 100%, each server could handle twice the load, so the same traffic would need only half the servers, and the saving would be roughly $7.5M to $15M, not to mention the networking and facility costs. The saving easily offsets the cost of the 8 engineers on the HipHop compiler project.

So it’s the economies of scale that make the project attractive.
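To make the back-of-the-envelope estimate explicit, here is a small C++ sketch. It assumes a performance improvement of p percent means each server handles (1 + p/100) times the load, so a 100% improvement halves the number of servers needed for the same traffic. The function name savings_usd is mine, and the $1,000 per server is the made-up figure from above.

```cpp
// Estimated hardware saving in dollars for a given performance improvement.
// Assumption: an improvement of p percent lets each server handle
// (1 + p/100) times the original load, so fewer servers carry the same traffic.
double savings_usd(double servers, double cost_per_server, double improvement_pct) {
    double speedup = 1.0 + improvement_pct / 100.0;
    double servers_after = servers / speedup;
    return (servers - servers_after) * cost_per_server;
}
```

Under these assumptions, savings_usd(15000, 1000, 100) is $7.5M and savings_usd(30000, 1000, 100) is $15M, before counting networking and facility costs.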

How to Make it Faster?

There may be several approaches to improving performance. They first looked at the implementation of the Zend engine, only to find that the engine already does what it should for performance optimization.

So they had to think of alternatives, which resulted in the HipHop project. Converting the code to C++ avoids the three problems mentioned above. But porting code is not easy work, even more so if you want to generate the code automatically.

But wait, why couldn’t they switch to another language like Java? That might be a good choice in theory. In practice, it’s hard to port a large code base. More importantly, it’s even harder to convince the existing engineering team to change its preferred programming language.

Why C++?

C++ was chosen as the target language because it’s one of the fastest languages. It’s also object oriented, so classes in PHP can easily be converted to classes in C++, which helps the readability of the generated code.

Readability is important because the generated C++ code could be tuned further by hand where it’s truly a bottleneck. But this is not recommended. Most of the time, people still write PHP as before. C++ is only generated in the deployment phase.

The Challenges

One of the biggest challenges in converting script code to compiled code is determining the types of variables beforehand; otherwise you cannot achieve the efficiency of compiled languages. It turns out you can only infer the types, not know them exactly. This may be a permanent challenge.

To assist the type inference, the generator runs the PHP code to see what type a variable is. That type is of course not 100% guaranteed. Haiping has a 90/10 theory (not the famous 80/20), which says that if you can make 90% of the cases faster and the other 10% no worse, it’s a good optimization.

PHP code is very flexible about return types. For example, a function can return either an integer or a Boolean, as follows:
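Here is one way to picture inference from observed types, as a C++ sketch of my own and not HipHop’s actual algorithm. The Type enum and the join and infer functions are hypothetical names. The idea is that observations that agree keep the specific type, and any conflict falls back to the generic Variant.

```cpp
#include <initializer_list>

// Hypothetical set of inferred types; Variant is the generic fallback.
enum class Type { Unknown, Boolean, Integer, Double, String, Variant };

// Combine two observations: Unknown yields the other type, equal types
// are kept, and any disagreement degrades to Variant.
constexpr Type join(Type a, Type b) {
    if (a == Type::Unknown) return b;
    if (b == Type::Unknown) return a;
    return a == b ? a : Type::Variant;
}

// Fold all observed types for one variable into a single inferred type.
Type infer(std::initializer_list<Type> observed) {
    Type result = Type::Unknown;
    for (Type t : observed) {
        result = join(result, t);
    }
    return result;
}
```

In the spirit of the 90/10 theory, a variable that was only ever seen as an integer can get the fast path, while one seen as both an integer and a Boolean stays a Variant and is no worse off than before.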

function foo($condition) {
          // Returns an integer on one path and a Boolean on the other.
          if ($condition) return 10;
          return false;
}

Given this, the Variant type has to be used because it’s the only one generic enough to cover both.

Type inference is hard. How about a coding standard that says $bExisting is a Boolean variable and $nCount is an integer? It would be great to have, but hard to enforce. After all, no engineer likes to follow a new coding convention.

But over time, when people see the value, they might come for advice. It takes time to show the value before that happens.
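To illustrate, here is a minimal C++17 sketch of how such a function might look on the C++ side. It uses std::variant as a stand-in; HipHop has its own Variant class, which I am not reproducing here. The names Variant, foo and foo_int are only for illustration.

```cpp
#include <cstdint>
#include <variant>

// Stand-in for a generic value that may hold a Boolean or an integer.
// HipHop's real Variant class is different; this is only an illustration.
using Variant = std::variant<bool, std::int64_t>;

// Mirrors the PHP function: an integer on one path, a Boolean on the other.
Variant foo(bool condition) {
    if (condition) return std::int64_t{10};
    return false;
}

// If every path returned an integer, a plain typed function would do,
// and callers would not need to check which alternative is held.
std::int64_t foo_int(bool condition) {
    return condition ? 10 : 0;
}
```

Callers of foo have to test the result with std::holds_alternative or std::get before using it, which is exactly the kind of run-time check that type inference tries to eliminate.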

So What?

For most people who run small PHP-based web applications, it’s not a big deal. The savings may not offset the effort of setting this up. For big service providers or companies running PHP server farms, it could mean big savings through a reduced number of servers. This will only become more important as more companies get into cloud computing.

In any case, the project is fun to follow if you are interested in programming languages, especially compilers. For more information, check out the project and the presentation.

    Me: Steve Jin, VMware vExpert who authored the VMware VI and vSphere SDK by Prentice Hall, and created the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed for their products, internal IT orchestration, and test automation.