What You Can Learn from IBM Research on Designing Private Cloud
IBM researcher Kyung Ryu presented RC2, a private cloud, at the LISA 2010 conference. As is typical for an IBM project, the presentation has 20+ co-authors. The following is based on my notes from the session and may therefore contain misunderstandings on my part.
Having an internal cloud is not a big deal these days. You can find several products on the market. What is truly unique and challenging about RC2 is that it supports very different virtualization platforms, from x86-based hypervisors on X-series servers, to IBM PowerVM on P-series, to the mainframe-based native virtualization on Z-series. Therefore RC2 is really a hybrid private cloud.
The talk focused on the system architecture and included several diagrams. I cannot reproduce the diagrams here, but I can list the key components of the system:
- Cloud Dispatcher. As its name suggests, it takes requests from the front end and dispatches them to the relevant servers at the back end. It maintains two queues: one for synchronous processing and the other for asynchronous processing. The latter is needed for requests that take a long time, for example, creating a new virtual machine.
The dispatcher also serves as a gatekeeper. Based on available capacity, it can admit or reject requests. The system can handle only about 256 concurrent virtual machine creations, which is good enough unless you are a service provider.
- Image Manager. It manages the virtual machine image repository. It exposes REST APIs for operations like checkin / checkout / publish / list / deprecate / unpublish / add, etc. These APIs can be called by the Instance Manager.
The Image Manager leverages the Mirage image library, which maintains the parent/child relationships between images. According to Dr. Ryu, the library may have been open sourced.
- Instance Manager. It creates new instances upon request. The detailed steps include:
- Reserve system resources, such as host and IP address
- Register with TPM (Tivoli Provisioning Manager?)
- Clone virtual machine image
- Copy SSH keys and fix up image
- Set up the actuator engine for first boot
- Register VM with hypervisor
- Start VM
- Wait while pinging the VM
- Report back
- Security Manager. It manages the security aspects of RC2.
- User Manager. It manages users and authentication.
- Chargeback. It listens to the event manager for events such as VM start and VM destroy. Based on these events, it uses the BIRT report engine to generate reports. After they started applying chargeback in blue dollars (IBM's internal currency), the number of VM instances dropped by more than half overnight.
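Of the components above, the Cloud Dispatcher is the easiest to sketch in code. What follows is my own minimal illustration of the two-queue routing and the gatekeeper behavior, not RC2's implementation. The class, method, and operation names are hypothetical, and the default limit of 256 is simply the figure I noted from the talk.

```python
import queue
from dataclasses import dataclass, field

# Assumption: the ~256 concurrent VM creations mentioned in the talk, used as a default cap.
MAX_CONCURRENT_CREATES = 256

# Assumption: hypothetical operation names; the talk did not list the real ones.
LONG_RUNNING = {"create_vm"}


@dataclass
class Dispatcher:
    max_creates: int = MAX_CONCURRENT_CREATES
    sync_queue: queue.Queue = field(default_factory=queue.Queue)
    async_queue: queue.Queue = field(default_factory=queue.Queue)
    creates_in_flight: int = 0

    def submit(self, operation: str, payload: dict) -> str:
        """Admit or reject a request, then route it to the matching queue."""
        if operation in LONG_RUNNING:
            # Gatekeeper: refuse new long-running work once the cap is reached.
            if self.creates_in_flight >= self.max_creates:
                return "rejected"
            self.creates_in_flight += 1
            self.async_queue.put((operation, payload))
            return "accepted-async"
        # Short requests go to the synchronous queue without a capacity check.
        self.sync_queue.put((operation, payload))
        return "accepted-sync"

    def complete(self, operation: str) -> None:
        """Called by a backend worker when a long-running request finishes."""
        if operation in LONG_RUNNING and self.creates_in_flight > 0:
            self.creates_in_flight -= 1
```

With `Dispatcher(max_creates=2)`, the third `submit("create_vm", ...)` is rejected until a worker calls `complete("create_vm")`, while a quick request such as a hypothetical `"list_images"` is always placed on the synchronous queue.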
These components communicate with each other through REST APIs. Hypervisor-specific differences are isolated, so support for a new hypervisor can be added without major changes.
Besides these computing components, RC2 also has SAN storage that holds both virtual machine images and virtual machine instances.
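The talk did not show how the hypervisor differences are isolated, so the following is only my guess at the general shape: a small driver interface that the platform-neutral instance creation code calls. Every name here (`HypervisorDriver`, `FakeKvmDriver`, `create_instance`) is hypothetical, and the sketch covers only the clone, register, and start steps from the Instance Manager list; resource reservation, TPM registration, SSH key copying, and the ping wait are left out.

```python
from abc import ABC, abstractmethod


class HypervisorDriver(ABC):
    """Hypothetical interface; RC2's real abstraction was not shown in the talk."""

    @abstractmethod
    def clone_image(self, image_id: str, vm_name: str) -> str:
        """Clone an image and return an identifier for the new disk."""

    @abstractmethod
    def register_vm(self, vm_name: str, disk: str) -> None:
        """Make the hypervisor aware of the new VM."""

    @abstractmethod
    def start_vm(self, vm_name: str) -> None:
        """Boot the VM."""


class FakeKvmDriver(HypervisorDriver):
    """Stand-in that only records calls; a real driver would talk to a hypervisor."""

    def __init__(self) -> None:
        self.calls = []

    def clone_image(self, image_id: str, vm_name: str) -> str:
        self.calls.append("clone")
        return f"{vm_name}-from-{image_id}.disk"

    def register_vm(self, vm_name: str, disk: str) -> None:
        self.calls.append("register")

    def start_vm(self, vm_name: str) -> None:
        self.calls.append("start")


def create_instance(driver: HypervisorDriver, image_id: str, vm_name: str) -> dict:
    """Platform-neutral flow: only the driver knows which hypervisor is underneath."""
    disk = driver.clone_image(image_id, vm_name)
    driver.register_vm(vm_name, disk)
    driver.start_vm(vm_name)
    return {"vm": vm_name, "disk": disk, "state": "started"}
```

Under this assumption, supporting another platform means writing one more `HypervisorDriver` subclass, while `create_instance` and everything that calls it stay unchanged.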
Besides the architecture, Dr. Ryu shared an interesting story about image conversion. The VM images were mostly Xen-based in the beginning. After XenSource was sold to Citrix, they converted 600+ images to KVM format. The process went smoothly thanks to the Mirage image library.
RC2 serves not only folks at IBM Research but also people from other divisions, and it has reached production quality. Next step? I think IBM should open source RC2 and claim its leadership position in cloud computing. What do you think?