This is the last of my notes from the LISA 2010 conference. It was a great talk by Loren Jan Wilson, drawing on his experience with vendors while working at a supercomputer center.
The supercomputer, Intrepid, consists of 40,960 nodes in 40 racks. Each node has a 4-core CPU. Of these nodes, 640 are dedicated to I/O. There is no local storage on any node. The supercomputer is linked to a very large tape library for archiving.
While operating the supercomputer, the speaker ran into issues with high-speed network switches, e.g. 6% of ports died at random, and 15% of quad ports were flaky without ever failing 100%. To complicate matters, the switches offered no logs and no CLI for troubleshooting, only a Web interface.
I believe the trouble the speaker faced is not an isolated case in the industry, and never will be. As long as you have to buy equipment or software from vendors, there will be issues one way or another. A great thing the speaker did was to summarize and share tips on how a customer should work with an IT vendor for a successful IT project.
I find these tips very helpful, and think customers and vendors alike should know about them. They are listed below: Read more...
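To get a feel for the scale, here is a small back-of-the-envelope calculation using only the figures quoted above. The variable names are mine, and the ratio assumes the 640 I/O nodes are counted within the 40,960 total, which is how my notes read.

```python
# Figures as quoted in my notes from the talk.
TOTAL_NODES = 40_960
RACKS = 40
CORES_PER_NODE = 4
IO_NODES = 640

# Derived values (simple arithmetic, not numbers stated by the speaker).
nodes_per_rack = TOTAL_NODES // RACKS        # nodes housed in each rack
total_cores = TOTAL_NODES * CORES_PER_NODE   # CPU cores across the machine
nodes_per_io_node = TOTAL_NODES // IO_NODES  # total nodes per dedicated I/O node

print(f"{nodes_per_rack} nodes per rack")
print(f"{total_cores} cores in total")
print(f"1 I/O node per {nodes_per_io_node} nodes")
```

With no local storage on the nodes, that last ratio hints at how much traffic each I/O node has to carry.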
With the rising trend of the devops movement, I was curious about system administration from a software developer’s perspective. That’s why I sat through Adam Moskowitz’s session “The Path to Senior Sysadmin.” Adam grouped the system administrator’s skills into three categories: hard tech skills, squishy tech skills, and software skills, as detailed in the following. Again, this is based on my notes taken at the LISA 2010 conference. For other posts related to the conference, check here.
Hard Tech Skills
- All the commands for system administration;
- System backup;
- Some programming skills, such as shell scripting, Perl/Python, and C (at least reading it);
- Software engineering knowledge, such as version control and process.
Read more...
This post is based on my notes taken at the talk by John Adams at the LISA 2010 conference. Any mistakes are all mine. Should you be interested in other sites, check out Google, Facebook, LinkedIn.
As one of the leading social Web sites, with 165M users, Twitter demands a huge infrastructure to support its operation. There are 700M searches and 1,000 tweets per second, with the tweet rate going up to almost 4,000 at peak. The number of tweets is not that impressive, but these tweets need to be distributed to numerous followers, who can number several million for a single account.
These days Twitter gets 75% of its traffic from the API and 25% from the Web. The new twitter.com Web interface uses AJAX heavily and acts as an API client to its backend.
As John put it, “nothing works the first time.” His recommendation is to use the best available technology for scaling. You will need to plan and build more than once to get it right. Read more...
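The fan-out point is easier to see with numbers. The sketch below multiplies a tweet rate by a follower count to estimate timeline deliveries per second. The function name and the follower counts are hypothetical values I picked for illustration; only the 1,000 and 4,000 tweets-per-second rates come from the talk.

```python
def deliveries_per_second(tweets_per_second: int, followers_per_tweet: int) -> int:
    """Estimate timeline deliveries: every tweet goes to each follower once."""
    return tweets_per_second * followers_per_tweet

# Rates quoted in the talk.
NORMAL_RATE = 1_000
PEAK_RATE = 4_000

# ASSUMPTION: an illustrative average follower count, not a figure from the talk.
ASSUMED_AVG_FOLLOWERS = 100

normal_load = deliveries_per_second(NORMAL_RATE, ASSUMED_AVG_FOLLOWERS)
peak_load = deliveries_per_second(PEAK_RATE, ASSUMED_AVG_FOLLOWERS)

# ASSUMPTION: one tweet from an account with 2 million followers
# ("several millions" in the talk, exact number not given).
celebrity_burst = deliveries_per_second(1, 2_000_000)

print(normal_load, peak_load, celebrity_burst)
```

Under these assumptions, a single tweet from a very popular account generates more deliveries than all the ordinary traffic in that second, which is why distribution rather than ingestion is the hard part.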
Facebook.com is no doubt the biggest Web site; according to an article published half a year ago, it has surpassed Google in terms of Web traffic. Given its scale, the lessons it has learned should be very helpful to others building scalable IT infrastructures. This post is based on my notes taken at the talk by Robert Johnson and Sanjeev Kumar at the LISA 2010 conference. Should there be any mistakes, they are all mine.
According to the speakers, the architecture of Facebook.com is relatively simple: Web servers in the front, databases at the back. In the middle is a caching layer with a lot of memcached servers. If you recall my previous post, they use PHP extensively.
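To illustrate what a caching layer between Web servers and databases does, here is a minimal sketch of the generic look-aside caching pattern. It is not Facebook's code: the class and method names are hypothetical, and plain Python dicts stand in for both memcached and the database.

```python
class LookAsideCache:
    """Toy look-aside cache; dicts stand in for memcached and the database."""

    def __init__(self, database: dict):
        self._db = database   # stand-in for the database tier
        self._cache = {}      # stand-in for the memcached tier
        self.db_reads = 0     # counts how often we fall through to the database

    def get(self, key):
        # Serve from the cache when possible.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def set(self, key, value):
        # Write to the database, then invalidate the cached copy
        # so the next read fetches the fresh value.
        self._db[key] = value
        self._cache.pop(key, None)


store = LookAsideCache({"user:1": "alice"})
first = store.get("user:1")    # miss: goes to the database
second = store.get("user:1")   # hit: served from the cache
reads_after_two_gets = store.db_reads
store.set("user:1", "alicia")  # update invalidates the cached entry
third = store.get("user:1")    # miss again: picks up the new value
```

The point of the pattern is that repeated reads of the same key stop reaching the database, which is what lets a relatively simple three-tier design absorb heavy read traffic.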
Unlike sites such as email services, whose users map cleanly onto separate, isolated servers, social Web sites like Facebook face a unique challenge: their users are linked together. Errors in one part of the system can easily cascade and bring down the whole site.
Here are several important lessons Facebook learned while building software and operating the site: Read more...
IBM Researcher Kyung Ryu presented a private cloud, RC2, at the LISA 2010 conference. As is typical for an IBM project, the presentation has 20+ co-authors. The following is based on my notes taken from the session, and therefore may contain my misunderstandings.
Having an internal cloud is not a big deal these days. You can find several products on the market. What is truly unique and challenging about RC2 is that it supports very different virtualization platforms, from x86-based hypervisors on X-series servers, to IBM PowerVM on P-series, to the mainframe-based native virtualization on Z-series. Therefore RC2 is really a hybrid private cloud.
The talk focused on system architecture, with several diagrams. I cannot reproduce these diagrams, but will list the key components of the system: Read more...
Late last week I attended the 24th LISA conference, held by USENIX at the San Jose Convention Center. The name LISA stands for Large Installation System Administration. It’s a great conference focusing on technology, training, and professional development for system administrators.
I am not a system administrator, but wanted to know more about system administration in general because of the devops movement. So I attended many technical sessions, covering topics from storage, networking, release engineering, cloud computing, and social network Web site management to career development as a system admin. I will blog about some of the sessions I attended, based on my notes. Read more...