This post is based on my notes taken at the talk by John Adams at LISA 2010 conference. Any mistakes, if any, are all mine. Should you be interested in other sites, check out Google, Facebook, LinkedIn.
As one of the leading social Web site with 165M users, Twitter demands a huge infrastructure support its operation. There are 700M searches and 1,000 tweets per second and can go up to almost 4,000 at peak. The number of tweets is not that impressive, but these tweets need to be distributed to numerous followers which could be several millions after one account.
These days Twitter gets 75% traffic from API and 25% from the Web. The new twitter.com Web interface heavily uses AJAX and acts as API client to its backend.
As John put it, “nothing works the first time.” His recommendation is to use the best available technology for scaling. You will need to plan and build for more than one time to get it right. Read more...
While reading the recent Dec 2010 issue of MSDN magazine, I found an article (Multiparadigmatic .Net, Part 4: Object Orientation) with misunderstandings on object oriented design. I was surprised that the author reached conclusions like, “squares aren’t rectangles,” and “no happy ending here.” The conclusions are based on misunderstandings of object oriented design.
Let me show you what the root problem is and how to get a happy ending. After reading this, you won’t be bothered by “squares aren’t rectangles.”
What’s the problem?
As most people already know, inheritance or generalization (I prefer the latter) is an important feature of OOD. Using it effectively can lead to a good object model and concise codebase. In an inheritance relationship, a subtype must maintain “IS-A” relationship with its super type, for example, a Student type IS-A Person. I think most people are just fine with this. Read more...
VMware vSphere provides comprehensive performance metrics for your needs on performance monitoring and diagnosis. These stats are available through not only vSphere Client but also vSphere APIs. To understand the overall performance management concepts, you want to read this article: Fundamentals of vSphere Performance Management.
Once having the basics, you may wonder what types of stats are exposed. The following table summaries all the 315 performance counters available in vSphere 4.1. As you might have guessed, the information is generated using open source Sphere Java API and then imported into WordPress using WP-Table Reloaded. You can easily sort and search the table.
Update: Carter Shanklin and Luc Dekens have articles on performance counters as well: Read more...
Provide packaged software stack as Platform-as-a-Service (PaaS) platform for running applications
As Known As
We all know the three different types of cloud services from Infrastructure-as-a-Service (IaaS), PaaS, to Software-as-a-Service (SaaS). If you want to leverage PaaS, you have to choose one of the PaaS service providers like Google or Microsoft. Leveraging an external PaaS has its own benefits.
What if you want to keep your applications running in-house but still enjoy the benefits of PaaS? Today you don’t have much choice. Google, for example, does not sell its App Engine as a product that you can install and run on premise. You have to run it on the Google cloud.
Solution Read more...
When a developer learns a new programming language or API, the first thing is probably to try out a HelloWorld sample. As said, real programmers don’t read documents. Although I don’t fully agree on that, it has some truth in it.
In my own experiences, I normally continue with other samples after HelloWorld one. When something is not quite clear, I check out the API reference or read some tutorials. Anyway, I am not telling you how to learn a new language or API, but trying to make a point here on the importance of code samples for the developers. In my opinion, samples are the most effective way to empower your users.
I think you would agree with me, there are too many bad samples. Here are some typical symptoms:
- Too much boilerplate code to a point that the code illustrating the API usage got buried. Typical boilerplate code includes extensive exception handling, GUI, logging, etc. Some samples even have a common library that could confuse your users totally.
- Too many API calls in one sample. You may need several APIs for a use case, but don’t aim one sample for multiple use cases.
- Too much object oriented. Object oriented programming is a best practice for application development. But it could confuse your developers sometimes.
- Dependencies on other APIs. To run the sample, your users need to install other libraries which may or may not need extra configuration or tuning. To understand the sample, users need to understand additional APIs. Extra burden, really!
- Of course, typical bad smells of programming which are not unique for samples. For example, bad naming, unnecessary global variables, using object attributes for passing values between methods, etc.
Now, how you can develop great samples? Besides the best practices writing great applications, you want to follow the following guidelines: Read more...
As a software professional, you may have heard about the source compatibility and binary compatibility. With the Web Services, a new type of compatibility came up. This is what I call wire compatibility. It’s not related to the programming but the XML messages passed on the wire. Since we don’t use XML directly but programming APIs, the wire compatibility surfaces and affects the source and binary compatibility.
Too abstract? You bet. Let’s pick up an example here. Because VMware vSphere API is defined in WSDL, I will use it in the following discussion.
In vSphere 4.1, the method PowerOnMultiVM_Task() gets an additional parameter called option typed as OptionValue array. The following are related parts in the WSDL:
<input message="vim25:PowerOnMultiVM_TaskRequestMsg" />
<output message="vim25:PowerOnMultiVM_TaskResponseMsg" />
<fault name="RuntimeFault" message="vim25:RuntimeFaultFaultMsg"/>
<element name="_this" type="vim25:ManagedObjectReference" />
<element name="vm" type="vim25:ManagedObjectReference" maxOccurs="unbounded" />
<element name="option" type="vim25:OptionValue" minOccurs="0" maxOccurs="unbounded" />
As you can see, the minOccurs of the option element is zero, meaning it’s optional. If you have an application built with 4.0 (no option parameter by then), the SOAP request still works. So it’s compatible on the wire. Read more...
Facebook.com is no doubt the biggest web site surpassing Google in terms of Web traffics in an article published half year ago. Given its scale, the lessons learned would be very helpful for others to build scalable IT infrastructures. This post is based on my notes taken at the talk by Robert Johnson and Sanjeev Kumar at LISA 2010 conference. Should there be any mistakes, they are all mine.
According to the speakers, the architecture of Facebook.com is relatively simple: Web servers in the front, databases at the back. In the middle is a caching layer with a lot of memcached servers. If you recall my previous post, they use PHP extensively.
Unlike other sites, like email sites, whose users are well mapped and isolated to different servers, social Websites like Facebook have unique challenges in that their users are linked together. Errors in one part of a system may cascade easily and bring down the whole site.
Here are several important lessons Facebook learned while building software and operating the site: Read more...
IBM Researcher Kyung Ryu presented a private cloud RC2 at LISA 2010 conference. As a typical IBM project, the presentation has 20+ co-authors. The following is based on my notes taken from the session, therefore may contain my misunderstandings.
Having an internal cloud is not a big deal these days. You can find several products from the market. What is truly unique and challenging for RC2 is that it supports very different virtualization platforms from X86 based hypervisors on X-series servers, to IBM PowerVM on P-series, to the mainframe based native virtualization on Z-series. Therefore RC2 is really a hybrid private cloud.
The talk focused on system architecture with several diagrams. I cannot reproduce these diagrams but would list the key components of the system: Read more...
Last week I attended a great talk by Google Fellow Jeffrey Dean at Stanford University. Jeff talked about his first hand experience on building software systems at Google since 1999 and lessons learned. The following summary is solely based on my notes, therefore may contain my misunderstandings.
A Brief History
During the past 10 years or so, the scale of the Google infrastructure has grown exponentially: # docs 1,000X; #query, 1,000X; per doc index, 3X; update rate from months to seconds, 50,000X; query latency, 5X; computer and computing powers, 1,000X. The underlying infrastructure has experienced 7 major revisions in the last 11 years.
At the concept level, the search infrastructure is simple. It has web servers upfront taking search queries. The queries are then passed on to two different types of servers: index servers and doc servers. For the index server, the input is the query string and the output is an array of doc-id and score pairs. For the doc servers, the input is the doc-id and query pair and the output is the title and snippet of the doc. Note that the snippet of the doc is query dependent so that you can find your keywords highlighted in the result pages. How to quickly and accurately calculate the output based on input involves a lot of advanced algorithms, and is not in the scope of Jeff’s talk. Read more...
Provide a single point of contact for a large-scale system consisting of many virtual machines so that they are viewed as one giant VM from outside
As Known As
When a system becomes big, you need multiple VMs to support the workload. For ease of use reasons, external users don’t want to manage multiple connections to each of the virtual machines. Who wants to remember a list of IP or DNS names for a service? Also, you just cannot expect your users to pick up the least-busy VMs for balanced workloads across your cluster of VMs. And to scale your application when your overall workload increases, you want a seamless way adding new capacities without notifying others.
Finally, if you offer a public service, you don’t want to allocate a public IP address for each of your VMs. These days, public IPs are scarce resources and may cost you money. Read more...
Later last week I attended 24th LISA conference by USENIX at San Jose Convention Center. The name LISA stands for Large Installation System Administration. It’s a great conference focusing on technology, training, and professional development for system administrators.
I am not a system administrator, but wanted to know more about system administration in general because of devops movement. So I attended many technical sessions covering from storage, networking, release engineering, cloud computing, social network website management, to career development as a system admin. I will blog some of these sessions I attended based on my notes. Read more...
Ensure a virtual machine does not carry a permanent state so that it can be easily provisioned, migrated, and managed in the cloud.
Also Known As
Virtualization is the cornerstone for cloud computing, especially at the infrastructure level. With many virtual machines created, managing them becomes a big challenge.
Among these challenges are system provisioning, backup, archiving, and patching different virtual machines. These administrative tasks take lots of CAPEX and OPEX.
We need a better way to architect applications for the cloud.
Making a VM stateless solves a lot of problems. For one thing, you force applications to save data outside of the virtual machine. No longer do you need to back up the virtual machine – only the data. It also makes the system provisioning easier without differentiating the different instances. When the stateless VMs crash for whatever reasons, you don’t lose much. Just add a new virtual machine and voila! Read more...
Separate concerns in large scales computing by leveraging different types of services in the cloud
The history of computing reveals different eras starting from mainframe to client/server to Web computing. With mainframes, computing is contained within the boundary of a mainframe. With client/server and web computing we see the separation of the presentation from the data. With all these computing models, the data is owned and maintained by different applications. The IT staffs who run and maintain the applications are responsible for backing up and maintaining data.
With the rise of cloud computing, I see a new trend that will fundamentally change the game and push productivity to all new levels. I call this “Aspectual Centralization” (AC). This is as important to cloud architecture as Model-View-Controller (MVC) is to software architecture.
With AC, different aspects of an application are extracted out and delegated to centralized services: data services, messaging services, logging services, and so on. Read more...
IBM DeveloperWorks recently published the result of a survey of 2000 IT professionals excluding IBM employees. The key findings are:
- Cloud Computing to overtake on-premise computing. For the question, “how do you rate the potential for cloud computing to overtake on-premise computing as the primary way organizations acquire IT by 2015?” 30.4% said likely, 21.6% most likely, and 13.6% definitely.
- Mobile application development to dominate. 55% of respondents see app development on mobile grows than other platforms in 5 years.
- IT professionals need, but often lack, industry-specific knowledge. 28.3% thought moderately important, 45.6% very important, and 15.9% extremely important. This is not an IT trend per se, but represents the demands for IT professionals.
The first two findings are mentioned and somewhat confirmed by another survey by Forrest and Dr. Dobbs, which is more developer oriented: Read more...
The Cloud Computing Expo takes place in the Santa Clara Convention Center from this Monday to Thursday. This twice-a-year event attracted thousands of attendees. Thanks to the invitation from Jeremy Geelan, I went to conference checking out several sessions and the exhibitions.
I found many familiar companies in the exhibition, from Oracle, Microsoft, VMware, and many other companies. Unlike VMworld, I don’t find many IHVs in the show. The ISVs demoed their products with strong focuses on Cloud. Microsoft for example demoed its Windows Azure family of services; VMware demoed its vCloud Director. I even found IBM booth which was much smaller than I expected. It turned out to be its recently acquired CastIron part.
Here are several companies I found interesting technologies from my trip: Read more...
If you attended last year’s VMworld keynote by Steve Herrod or watched the online broadcasting, you may still recall the code2cloud.com website (see the top banner here). That was a very simple Web application meant for the keynote attendees to submit names and email addresses to win a chance to go to the backstage with the Foreigner band. The website was hosted at Terramark vCloud and continued to run for about one month afterward. Read more...
Provide a mechanism to fast provision virtual machines (VMs) and manage their lifecycles by maintaining a pool of virtual machines.
Virtual machines can be expensive to create. It takes several minutes to create a new virtual machine. Technologies like linked clone and storage offloading can help speed up the process, but it still takes time. And these alternative approaches, in some use cases, do not help when you need instant provisioning.
It’s generally a good practice to pool resources that are expensive to create. In programming, you pool threads and check them out on demand. When it’s done, you check them in back to the pool. This is what most Web Servers do for high performance.
You can leverage the same idea for VM provisioning. You create new virtual machines and put them into a pool. When there is a new request, you just check out one virtual machine from the pool. The following diagram shows how it works. Read more...
Several weeks ago, Christopher Wells sent me an email for permission for translating my blog into Japanese so that more people in Japan can benefit. Today he told me he had translated one of my posts into Japanese and posted it on his blog site. Here is the first paragraph in Japanese version:
オペレーティング･システム（OS）はソフトウェアの一部で、コンピュータのハードウェアを管理し、様々なアプリケーション用の一般サービスを提供しま す。クラウド・コンピューティングの台頭にともなって、OSがまだ関連するのか、また、将来のクラウドにおいてOSがどんな役割を果たすのか疑問に思う人 もいるでしょう。
Now a little quiz: which original article did Chris translate? I will leave it to you to guess out. Hint: here is the link to the full article. Read more...
In my last blog, I wrote about pattern idea from the famous books “Design Pattern” and “A Pattern Language” and how it can be applied to cloud architecture design. Below and in later posts in this series I shall follow the content outline used there to illustrate the cloud architecture patterns.
Provide a standard way to create new virtual machines based on user requirements.
There are enormous combinations of virtual machines with different operating systems, middleware, and applications. And then we have user data to the mix! We need a standard way to create new virtual machines.
General speaking, there are three basic ways to create new virtual machines: Read more...
Most developers may have noticed the asynchronous methods in vSphere API like PowerOnVM_Task method, but not so many know their synchronous peers like PowerOnVM before 4.1. VMware vSphere API Reference doesn’t mention them at all. But you can find them in WSDL(check out the WSDL snippets at the end of this article).
There is an exception however. In VI Perl, these synchronous methods are exposed. There, you can choose which one to use. In vSphere Java API 2.0, these methods are exposed only in the stub layer but not the object layer. You don’t want to use stub methods directly when you can use objects, therefore I don’t talk much about it even in my book. Somehow I came across a question in the forum asking about this. So I think it may be good to share a little history and insight here.
The differences of these twin methods are minimal. They have exactly same parameters but different returns. The methods whose names include _Task suffix have Task returned. When you have the Task return, the operation may not yet be done at the server side. But with the Task object, you can track the progress, and even get the result data objects. Read more...