As cloud computing gains momentum, more and more mega data centers are being built or planned. You can find cool videos showing how companies like Google and Microsoft build and run their state-of-the-art data centers.
In these data centers, servers, storage, and switches are packed and wired inside containers at the factory before being shipped to a data center. After power, networking, and cooling are hooked up, a container of servers is ready to go. These advances have resulted in higher facility efficiency and greater hardware mobility: you can relocate a container of computers overnight.
Managing and maintaining these mega data centers is always challenging, and involves not only management tools but also operational processes and best practices.
On the software side, you need provisioning, monitoring, automation, and ticketing systems, among others, for the three basic resources in a data center: compute, storage, and networking. This is where the biggest gains in operational efficiency can be made. I think programming data centers is the way to go.
Virtualization as Cornerstone
Virtualization has been playing a pivotal role in modernizing data centers because it brings abstraction and flexibility by decoupling compute from hardware. As a result, the provisioning and lifecycle of a virtual machine are fully programmable.
Similar things are happening to storage and networking, partly driven by virtualization. For example, storage is now abstracted through virtual appliances that speak standard protocols, and network services are packaged into virtual appliances as well. All of this allows them to be deployed on demand and at scale.
OpenFlow, a recent networking technology, pushes virtualization deeper into the network fabric by letting an external controller program the forwarding tables of switches. It has great potential to revolutionize data centers in the coming years.
Moving beyond virtualization is, of course, cloud computing, which mainly serves IT users. For IT operators, low-level management work such as configuration and monitoring is still there; they are paid to do this work so that IT users don’t have to. Programmable data centers are targeted at IT operators.
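To make “fully programmable” concrete, here is a minimal Python sketch of a VM lifecycle modeled as a state machine that scripts drive through method calls. The states, the transition table, and the names `VirtualMachine` and `provision` are hypothetical simplifications for illustration, not the API of any particular hypervisor.

```python
from enum import Enum
from typing import Dict, Set


class VMState(Enum):
    DEFINED = "defined"
    RUNNING = "running"
    STOPPED = "stopped"
    DELETED = "deleted"


# Hypothetical, simplified set of allowed lifecycle transitions.
TRANSITIONS: Dict[VMState, Set[VMState]] = {
    VMState.DEFINED: {VMState.RUNNING, VMState.DELETED},
    VMState.RUNNING: {VMState.STOPPED},
    VMState.STOPPED: {VMState.RUNNING, VMState.DELETED},
    VMState.DELETED: set(),
}


class VirtualMachine:
    """Hypothetical VM whose whole lifecycle is driven by method calls."""

    def __init__(self, name: str, vcpus: int, memory_gb: int) -> None:
        self.name = name
        self.vcpus = vcpus
        self.memory_gb = memory_gb
        self.state = VMState.DEFINED
        self.history = [VMState.DEFINED]  # audit trail of every state entered

    def _move_to(self, target: VMState) -> None:
        # Reject transitions the lifecycle does not allow.
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.value} to {target.value}"
            )
        self.state = target
        self.history.append(target)

    def start(self) -> None:
        self._move_to(VMState.RUNNING)

    def stop(self) -> None:
        self._move_to(VMState.STOPPED)

    def delete(self) -> None:
        self._move_to(VMState.DELETED)


def provision(name: str, vcpus: int, memory_gb: int) -> VirtualMachine:
    """Define a VM and start it in one scripted step."""
    vm = VirtualMachine(name, vcpus, memory_gb)
    vm.start()
    return vm
```

For example, `vm = provision("web-01", 2, 4)` followed by `vm.stop()` and `vm.delete()` walks the VM through its whole life from a script. Because illegal transitions (such as deleting a running VM) raise an error, automation can fail fast instead of leaving a VM in an unknown state.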
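The idea behind OpenFlow-style forwarding can be illustrated with a toy match/action flow table: a controller installs prioritized entries into a switch, and each packet is handled by the highest-priority entry whose fields all match. This is a simplified Python model under stated assumptions (header fields as strings, a missing field acting as a wildcard, and a table miss sending the packet to the controller); it is not the OpenFlow wire protocol or the API of any real controller.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowEntry:
    priority: int
    match: Dict[str, str]  # header field -> required value; absent field = wildcard
    actions: List[str]     # e.g. ["output:3"] or ["drop"]


@dataclass
class FlowTable:
    """Toy model of a switch flow table programmed by a (hypothetical) controller."""

    entries: List[FlowEntry] = field(default_factory=list)

    def install(self, entry: FlowEntry) -> None:
        # Keep entries ordered from highest to lowest priority.
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: Dict[str, str]) -> List[str]:
        # The first (highest-priority) entry whose fields all match wins.
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        # Assumed table-miss behavior: hand the packet to the controller.
        return ["controller"]
```

For example, after installing `FlowEntry(10, {"ip_dst": "10.0.0.5"}, ["output:3"])` and a more specific `FlowEntry(20, {"ip_dst": "10.0.0.5", "tcp_dst": "22"}, ["drop"])`, web traffic to 10.0.0.5 goes out port 3 while SSH to the same host is dropped, all without touching a cable.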
To build programmable data centers, the following are needed:
- Abstraction of virtual resources from their physical counterparts. Physical resources are still needed at the very bottom, but the abstracted layers above them are so flexible and fluid that you can dynamically repurpose them among compute, storage, and networking on demand.
- A set of programming interfaces to the underlying infrastructure, be it a command line or APIs. These interfaces are not only for issuing commands but also for querying system status, enabling two-way communication. Ideally, a data center would be 100% virtualized, but this may take time, and in some uncommon use cases it may never happen. In those cases, programming interfaces for managing the physical resources are still needed.
I can see many benefits to making data centers programmable. The most obvious is that you can automate everything. More specifically:
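A minimal Python sketch of such a two-way interface follows, assuming a simple in-memory inventory. The `DataCenterAPI` class, its methods, and its status strings are hypothetical; the point is that commands (`allocate`, `release`) and queries (`status`, `find`) sit behind one interface, and that physical resources can be registered alongside virtual ones when they cannot be virtualized.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resource:
    name: str
    kind: str           # "compute", "storage", or "network"
    physical: bool      # True for hardware that is not virtualized
    status: str = "available"
    owner: str = ""


class DataCenterAPI:
    """Hypothetical interface: commands change state, queries report it."""

    def __init__(self) -> None:
        self._inventory: Dict[str, Resource] = {}

    # ---- Commands ----
    def register(self, resource: Resource) -> None:
        self._inventory[resource.name] = resource

    def allocate(self, name: str, owner: str) -> None:
        res = self._inventory[name]
        if res.status != "available":
            raise RuntimeError(f"{name} is {res.status}")
        res.status, res.owner = "allocated", owner

    def release(self, name: str) -> None:
        res = self._inventory[name]
        res.status, res.owner = "available", ""

    # ---- Queries ----
    def status(self, name: str) -> str:
        return self._inventory[name].status

    def find(self, kind: str, status: str = "available") -> List[str]:
        # Sorted names of resources of the given kind and status.
        return sorted(
            r.name for r in self._inventory.values()
            if r.kind == kind and r.status == status
        )
```

For example, after registering a virtual machine `vm-01` and a bare-metal blade `blade-07` and allocating `vm-01` to a tenant, `find("compute")` returns only `["blade-07"]`, and `status("vm-01")` reports `"allocated"`. A script can therefore check state before acting, which is the two-way communication described above.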
- Unify resources across compute, storage, and networking so that they can be shared across the traditional boundaries.
- Create multiple virtual data centers within a physical data center. With programmability, this can be done in minutes or hours. Unlike sharing higher-level resources, the boundaries are strictly enforced by the low-level system, which makes this approach more secure for multi-tenancy.
- Re-allocate system resources within a data center. You don’t need to move a physical machine, re-wire it to a different switch, or attach it to different storage. All you need is a few commands or scripts.
- Isolate problematic components in a data center. What if a network switch breaks? No problem: you just “re-wire” around it with a couple of commands.
Given these benefits, I believe programmable data centers are the next big thing for IT infrastructure and cloud computing.
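The re-allocation and isolation scenarios above can be sketched as operations on a logical mapping from hosts to switches. In this hypothetical `Fabric` class, `connect` re-points a host’s uplink without touching cables, and `isolate` takes a failed switch out of service and moves each of its hosts to the least-loaded healthy switch with a free port. The class, its port-capacity model, and the least-loaded placement rule are illustrative assumptions; a real fabric would apply such changes through its own network control plane.

```python
from typing import Dict, List, Set


class Fabric:
    """Hypothetical logical-to-physical mapping of host uplinks to switches."""

    def __init__(self, switch_ports: Dict[str, int]) -> None:
        self.capacity = dict(switch_ports)   # switch -> number of ports
        self.failed: Set[str] = set()        # switches taken out of service
        self.uplink: Dict[str, str] = {}     # host -> switch it is attached to

    def _load(self, switch: str) -> int:
        # Number of hosts currently attached to this switch.
        return sum(1 for s in self.uplink.values() if s == switch)

    def connect(self, host: str, switch: str) -> None:
        """Attach (or re-attach) a host to a switch by updating the mapping."""
        if switch in self.failed or self._load(switch) >= self.capacity[switch]:
            raise RuntimeError(f"cannot attach {host} to {switch}")
        self.uplink[host] = switch

    def isolate(self, bad_switch: str) -> List[str]:
        """Take a switch out of service and re-wire its hosts elsewhere."""
        self.failed.add(bad_switch)
        moved = []
        # Snapshot the affected hosts first, since the mapping changes below.
        for host in sorted(h for h, s in self.uplink.items() if s == bad_switch):
            # Least-loaded healthy switch with a free port; ties go to the
            # switch listed first in the capacity map.
            target = min(
                (s for s in self.capacity
                 if s not in self.failed and self._load(s) < self.capacity[s]),
                key=self._load,
                default=None,
            )
            if target is None:
                raise RuntimeError(f"no spare capacity for {host}")
            self.uplink[host] = target
            moved.append(host)
        return moved
```

For example, with switches `{"sw1": 4, "sw2": 4, "sw3": 4}`, hosts `h1` and `h2` on `sw1`, and `h3` on `sw2`, calling `isolate("sw1")` moves `h1` to the empty `sw3` and then `h2` to `sw2`, spreading the displaced hosts across the remaining switches.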