Posts Tagged ‘scalabilty’

Critical Lessons Learned at Facebook on Scalability and Reliability

November 21st, 2010 1 comment

Facebook.com is no doubt the biggest website: according to an article published half a year ago, it has surpassed Google in Web traffic. Given that scale, the lessons learned there should help others build scalable IT infrastructure. This post is based on the notes I took at the talk by Robert Johnson and Sanjeev Kumar at the LISA 2010 conference. Should there be any mistakes, they are all mine.

According to the speakers, the architecture of Facebook.com is relatively simple: Web servers at the front, databases at the back, and in the middle a caching layer made up of a large number of memcached servers. As you may recall from my previous post, they use PHP extensively.
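To make the three tiers concrete, here is a minimal sketch of a look-aside read, where the Web tier asks the cache first and only goes to the database on a miss. This is my own illustration, not code from the talk: the speakers did not describe the exact caching strategy, and the names `DATABASE`, `CACHE`, `STATS`, `db_get` and `get_profile` are hypothetical, with plain dicts standing in for the database and for a memcached server.

```python
from typing import Dict, Optional

# Hypothetical stand-ins: plain dicts play the roles of the back-end
# database and of one memcached-style cache server.
DATABASE: Dict[str, str] = {"user:1": "Alice", "user:2": "Bob"}
CACHE: Dict[str, str] = {}
STATS: Dict[str, int] = {"db_reads": 0}  # how often the "database" was queried


def db_get(key: str) -> Optional[str]:
    """Simulate a (comparatively expensive) database read."""
    STATS["db_reads"] += 1
    return DATABASE.get(key)


def get_profile(key: str) -> Optional[str]:
    """Look-aside read: try the cache first, fall back to the database."""
    value = CACHE.get(key)
    if value is not None:
        return value          # cache hit: the database is not touched
    value = db_get(key)       # cache miss: read from the database
    if value is not None:
        CACHE[key] = value    # populate the cache for later requests
    return value
```

Calling `get_profile("user:1")` twice reads the database only once; the second call is served from the cache. A real deployment would also need expiry and invalidation on writes, which this sketch leaves out.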

Unlike sites such as email services, whose users can be cleanly mapped to and isolated on different servers, social websites like Facebook face a unique challenge: their users are linked together. An error in one part of the system can easily cascade and bring down the whole site.

Here are several important lessons Facebook learned while building software and operating the site:

What Lessons You Can Learn from Google on Building Infrastructure

November 15th, 2010 No comments

Last week I attended a great talk by Google Fellow Jeffrey Dean at Stanford University. Jeff talked about his first-hand experience building software systems at Google since 1999 and the lessons he learned. The following summary is based solely on my notes and may therefore contain my misunderstandings.

A Brief History

Over the past 10 years or so, the scale of the Google infrastructure has grown exponentially: number of documents, 1,000X; number of queries, 1,000X; per-document index, 3X; update rate (from months to seconds), 50,000X; query latency, 5X; computers and computing power, 1,000X. The underlying infrastructure has gone through 7 major revisions in the last 11 years.

At the conceptual level, the search infrastructure is simple. Web servers sit up front and take search queries. The queries are then passed on to two different types of servers: index servers and doc servers. For an index server, the input is the query string and the output is an array of doc-id and score pairs. For a doc server, the input is a doc-id and query pair and the output is the title and snippet of the doc. Note that the snippet is query dependent, which is why you find your keywords highlighted in the result pages. How to calculate the output from the input quickly and accurately involves a lot of advanced algorithms and was outside the scope of Jeff’s talk.
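The inputs and outputs described above can be sketched in a few lines of Python. This is only a toy illustration of the interfaces, not Google's implementation: the corpus `DOCS`, the term-count scoring, the `*keyword*` highlighting, and the function names `index_server`, `doc_server` and `web_server` are all assumptions of mine.

```python
from typing import Dict, List, Tuple

# Hypothetical toy corpus: doc-id -> (title, body).
DOCS: Dict[int, Tuple[str, str]] = {
    1: ("Scaling memcached", "memcached servers cache hot data in memory"),
    2: ("Search basics", "an index maps each term to the documents that contain it"),
    3: ("Caching notes", "a cache keeps hot data close to the web servers"),
}


def index_server(query: str) -> List[Tuple[int, float]]:
    """Input: query string. Output: (doc-id, score) pairs, best first."""
    terms = query.lower().split()
    hits = []
    for doc_id, (title, body) in DOCS.items():
        words = (title + " " + body).lower().split()
        # Toy score: how many times the query terms occur in the doc.
        score = float(sum(words.count(term) for term in terms))
        if score > 0:
            hits.append((doc_id, score))
    # Highest score first; ties are broken by doc-id for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


def doc_server(doc_id: int, query: str) -> Tuple[str, str]:
    """Input: (doc-id, query). Output: (title, query-dependent snippet)."""
    title, body = DOCS[doc_id]
    terms = set(query.lower().split())
    # Mark query terms so the snippet depends on the query.
    snippet = " ".join(f"*{w}*" if w.lower() in terms else w for w in body.split())
    return title, snippet


def web_server(query: str) -> List[Tuple[str, str]]:
    """Front end: ask the index server for hits, then the doc server for each."""
    return [doc_server(doc_id, query) for doc_id, _score in index_server(query)]
```

For example, `web_server("cache")` first gets the matching doc-ids from `index_server`, then asks `doc_server` for each title and a snippet with the word "cache" marked. The same doc returns a different snippet for a different query, which is the query dependence mentioned above.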

Tips on session management for scaling your server applications to vSphere

January 24th, 2010 2 comments

A few days ago, our business team invited me to a phone call with one of our strategic partners. They had a scalability issue with their server application, and it turned out to be related to session management. I think they are not the only ones to have run into this type of problem, and most likely not the last. So I decided to share it here, and hopefully you can avoid similar problems in your own projects.