A few weeks ago I had a great conversation with Vanessa Alvarez (@VanessaAlvarez1), an analyst with Forrester Research. Among other topics, we discussed datacenter automation, which interests us both. After Vanessa tweeted about her automation dream, several follow-up tweets came up.
In general, I think automation is a vague word in the IT world: it means different things to different people. This is especially true when we talk about automation together with integration. This article tries to define automation from my own understanding and perspective. Please feel free to share your thoughts in the comments.
At a high level, automation is the opposite of a manual process. Automation does not mean the work goes away, but that a machine (a computer) does it for you. There are two major types of automation: active and reactive.
Active automation is initiated by a person who replaces a repeated manual process with scripts or GUIs. A manual process may consist of many steps: interacting with a GUI, monitoring events, reading logs, and so on. It’s OK to go through a manual process once in a while, but definitely not to repeat it over and over.
For example, you can power on a virtual machine in the vSphere Client with one click. How about powering on 10 virtual machines with 10 clicks? Maybe. How about hundreds of virtual machines with hundreds of clicks? Probably not. How about thousands of virtual machines with thousands of clicks? Definitely not. In these cases, you want a script with a simple loop that does it all for you. Your desire for automation grows with the number of repetitions.
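To make the "simple loop" concrete, here is a minimal sketch in Python. It is not tied to any real vSphere API: `fake_power_on`, the VM names, and the simulated failure are all made up for illustration, and in practice you would pass in whatever call your management toolkit provides.

```python
from typing import Callable, Dict, Iterable


def power_on_all(vm_names: Iterable[str],
                 power_on: Callable[[str], None]) -> Dict[str, str]:
    """Call power_on for every VM name and record the outcome per VM."""
    results: Dict[str, str] = {}
    for name in vm_names:
        try:
            power_on(name)
            results[name] = "ok"
        except Exception as exc:
            # Keep going: one bad VM should not stop the whole batch.
            results[name] = f"failed: {exc}"
    return results


# Hypothetical stand-in for a real management API call (an assumption).
powered_on = set()


def fake_power_on(name: str) -> None:
    if name == "vm-0007":  # simulate a single failure
        raise RuntimeError("host unavailable")
    powered_on.add(name)


vm_names = [f"vm-{i:04d}" for i in range(1, 1001)]  # a thousand VMs, zero clicks
results = power_on_all(vm_names, fake_power_on)
failed = [name for name, status in results.items() if status != "ok"]
print(f"{len(results) - len(failed)} powered on, {len(failed)} failed")
```

The loop itself is trivial. What the script adds over clicking is that the effort stays the same whether the list holds ten names or ten thousand, and failures are collected instead of interrupting the run.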
Alternatively, you can have a GUI that allows you to power on hundreds of virtual machines with one click. But then the GUI might be so cluttered with other possible actions that it becomes practically unusable. To make a GUI a practical alternative, you usually have to limit it to one particular task, such as VM provisioning. When that happens, you may call it something other than automation.
Active automation is not limited to simple manual steps. It also covers complicated processes, like automatically configuring and setting up a training lab, which require some extra intelligence in the scripts. If there are too many variations and configurations, you may come up with a GUI tool to manage the automation process. Once you have the GUI, again, it may be called anything but automation, say, a lab manager.
Even if all you need to do is run scripts or tools, you still need to decide which script to run, under what conditions, and when. How about fully automating that as well?
Here comes the second category of automation: reactive automation, which responds to events, either internal or external. You will need a platform that closely monitors events in a system and runs the appropriate scripts accordingly.
Internal events are easy to understand: they are things happening inside a system boundary. For a data center, it could be the failure of a hard disk, the crash of a virtual machine, and so on. You can hook up scripts to these events.
External events can be timer events or events coming from outside systems. Timer events are very useful for system monitoring and preventive maintenance. For example, you can set up a timer that fires once a day to check security status.
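Hooking scripts up to events can be sketched as a tiny dispatcher. This is only an illustration of the idea, not the design of any particular platform: the `EventBus` class, the event names (`vm.crashed`, `disk.failed`, `timer.daily`), and the handlers are all hypothetical, and the daily timer is emitted by hand here where a real system would rely on a scheduler.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], str]


class EventBus:
    """Maps an event type to the handlers (scripts) registered for it."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: Dict[str, str]) -> List[str]:
        # Run every handler registered for this event type, in order.
        return [handler(payload) for handler in self._handlers[event_type]]


# Hypothetical handlers; each just describes the action it would take.
def restart_vm(event: Dict[str, str]) -> str:
    return f"restart {event['vm']}"


def flag_disk(event: Dict[str, str]) -> str:
    return f"mark {event['disk']} for replacement"


def security_check(event: Dict[str, str]) -> str:
    return f"run security check for {event['date']}"


bus = EventBus()
bus.on("vm.crashed", restart_vm)       # internal event
bus.on("disk.failed", flag_disk)       # internal event
bus.on("timer.daily", security_check)  # external (timer) event

actions: List[str] = []
actions += bus.emit("vm.crashed", {"vm": "vm-0042"})
actions += bus.emit("timer.daily", {"date": "2011-01-01"})
actions += bus.emit("fan.noise", {})   # nothing registered, so nothing happens
print(actions)
```

Note the last call: an event nobody registered for is silently ignored, which is exactly where the limits discussed later start to show.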
Writing scripts is intimidating to some, if not most, people. An alternative is therefore to specify policies. A policy is basically a set of rules specifying what to do under what conditions. These rules are higher level than scripts, which makes them easier to understand and author. You can also design a GUI policy editor to further flatten the learning curve.
No matter how you design a policy, under the hood there is still either an execution engine or scripts. If you are familiar with domain-specific languages (DSLs), it’s fair to think of a policy as a type of DSL. You can design a DSL to write automation rules.
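As a rough illustration of policies sitting on top of an execution engine, the sketch below expresses rules as plain data and evaluates them against observed facts. The rule format, the field names (`cpu_percent`, `disk_free_gb`, `state`), and the action names are assumptions invented for this example; a real policy language would be richer.

```python
import operator
from typing import Any, Dict, List

# The only comparison operators this toy policy format supports.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

# Each policy reads as: when <field> <op> <value>, then <action>.
POLICIES: List[Dict[str, Any]] = [
    {"when": ("cpu_percent", ">", 90), "then": "migrate_vm"},
    {"when": ("disk_free_gb", "<", 10), "then": "expand_datastore"},
    {"when": ("state", "==", "crashed"), "then": "restart_vm"},
]


def evaluate(policies: List[Dict[str, Any]], facts: Dict[str, Any]) -> List[str]:
    """Return the actions whose condition holds for the observed facts."""
    actions = []
    for policy in policies:
        field, op, value = policy["when"]
        # A fact that was never observed cannot trigger a rule.
        if field in facts and OPS[op](facts[field], value):
            actions.append(policy["then"])
    return actions


print(evaluate(POLICIES, {"cpu_percent": 95, "disk_free_gb": 50, "state": "running"}))
```

The person authoring `POLICIES` never touches `evaluate`, and that separation is the point: the rules stay readable while the engine does the scripting.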
Although you can automate a fair amount of work, you cannot automate everything all the time. Automation is limited by several factors.
The first limit comes from observability. The idea is simple: if you cannot sense or measure something, you cannot do much about it. The same is true for automation. It’s critical for an automation platform to “see” everything in the system and to have a complete event system. Some events may come from the computers themselves, and some from non-computer systems like cooling, depending on your system boundary, which is discussed below.
The second limit comes from programmability. The actions to be taken have to be programmable one way or another; otherwise automation cannot change much. You can have the automation platform alert a human being to make changes once in a while, but that is neither efficient nor compatible with full automation.
The third limit is the system boundary. Every system, including an automation system, has a boundary. You cannot do much about things happening outside the system. For example, your automation system may detect a hard disk failure, and then what? For more to happen, for example placing an order for new disks, you need APIs to external systems. In general, you should not expect to automate things outside the system you are focusing on.
Besides these technical limits, there can be other limits of an economic or political nature. For example, you may be able to automate a process, but at the current scale of the system the savings may not offset the cost of new software and training.
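The economic limit is easy to check with back-of-the-envelope arithmetic. The sketch below uses purely hypothetical numbers (minutes saved per run, hourly rate, yearly cost of software and training) just to show how the same automation can lose money at small scale and pay off at large scale.

```python
def net_saving(runs_per_year: int,
               minutes_saved_per_run: float,
               hourly_rate: float,
               yearly_automation_cost: float) -> float:
    """Labor cost saved per year minus the yearly cost of automating."""
    labor_saved = runs_per_year * minutes_saved_per_run * hourly_rate / 60
    return labor_saved - yearly_automation_cost


# Hypothetical figures: 10 minutes saved per run, $60/hour,
# $5,000/year for new software and training.
small_shop = net_saving(50, 10, 60.0, 5000.0)     # process runs 50 times a year
large_shop = net_saving(5000, 10, 60.0, 5000.0)   # process runs 5,000 times a year

print(f"small shop: {small_shop:+.0f}, large shop: {large_shop:+.0f}")
```

With these made-up inputs the small shop ends up in the red and the large one well ahead, which is why scale matters when deciding what to automate.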
I will discuss other subtopics, such as automation APIs, frameworks, and engines, as well as integration, in future articles. Stay tuned.