Fundamentals of vSphere Performance Management

Performance monitoring is a critical aspect of vSphere administration. This article introduces the basic concepts and terminology of vSphere performance management, such as performance counters, performance metrics, and real-time versus historical statistics. Much of the content is based on my book VMware VI and vSphere SDK, published by Prentice Hall.

Once you understand these basics, the related tools and APIs should be relatively easy to use. If you are already familiar with performance monitoring in the vSphere Client or with esxtop, that helps as well.

Performance Counter

A performance counter is a unit of information that can be collected about a managed entity. The PerfCounterInfo data object, shown in Figure 1, represents a performance counter. Its key property is an integer that uniquely identifies a performance counter, like the primary key of a table in a SQL database, and nothing more. There is no guarantee that a performance counter has a fixed key. In fact, the same performance counter can have different keys in ESX and VirtualCenter. Even for the same type of server, the key could change from version to version. Do not use it outside the context of the server you connect to.

Figure 1 PerfCounterInfo data object

A performance counter can be represented by the following dotted string notation:

group.counter.rollup

One example of such an expression is:

disk.usage.average

This is the performance counter for the average usage of a disk.
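
To make the dotted notation concrete, here is a minimal sketch that splits a counter name into its three parts. CounterName is a hypothetical helper written for this article, not a class from the vSphere API.

```java
// Hypothetical helper for illustration; not part of the vSphere API.
final class CounterName {
    final String group;
    final String counter;
    final String rollup;

    CounterName(String group, String counter, String rollup) {
        this.group = group;
        this.counter = counter;
        this.rollup = rollup;
    }

    // Splits a name such as "disk.usage.average" into group, counter, and rollup.
    static CounterName parse(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("expected group.counter.rollup but got: " + dotted);
        }
        return new CounterName(parts[0], parts[1], parts[2]);
    }

    @Override
    public String toString() {
        return group + "." + counter + "." + rollup;
    }

    public static void main(String[] args) {
        CounterName name = CounterName.parse("disk.usage.average");
        System.out.println("group=" + name.group + " counter=" + name.counter + " rollup=" + name.rollup);
    }
}
```

Because the integer key is not stable across servers, a dotted name like this is usually the safer thing to keep in your own configuration files.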

In VI SDK 2.5, there are seven pre-defined groups of performance counters: CPU, ResCpu, Memory, Network, Disk, System, and ClusterServices. Inside different groups, there are different counters. For example, the system group has uptime, resourceCpuUsage, and heartbeat counters.

Rollup refers to the process of aggregating statistics so that they can be used at a later time. There are six rollup types in total, as defined in the PerfSummaryType enumeration type: average, latest, maximum, minimum, none, and summation. Each rollup type represents a different mathematical aspect of the same performance data. You can choose the rollups based on your interests. For example, if you are developing a charge-back solution, you might be more interested in summation than in any other type.
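
As a rough illustration of what the rollup types mean mathematically, the sketch below applies five of them to the same handful of samples. RollupSketch and the sample values are made up for this example; how the server computes rollups internally is not covered here.

```java
import java.util.Arrays;

// Illustrative only: applies each rollup type to the same raw samples.
final class RollupSketch {
    private static void requireSamples(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
    }

    static double average(double[] samples) {
        requireSamples(samples);
        return Arrays.stream(samples).sum() / samples.length;
    }

    // The most recent sample, assuming the array is in time order.
    static double latest(double[] samples) {
        requireSamples(samples);
        return samples[samples.length - 1];
    }

    static double maximum(double[] samples) {
        requireSamples(samples);
        return Arrays.stream(samples).max().getAsDouble();
    }

    static double minimum(double[] samples) {
        requireSamples(samples);
        return Arrays.stream(samples).min().getAsDouble();
    }

    static double summation(double[] samples) {
        requireSamples(samples);
        return Arrays.stream(samples).sum();
    }

    public static void main(String[] args) {
        double[] samples = {10, 30, 20, 40}; // made-up values in time order
        System.out.println("average   = " + average(samples));
        System.out.println("latest    = " + latest(samples));
        System.out.println("maximum   = " + maximum(samples));
        System.out.println("minimum   = " + minimum(samples));
        System.out.println("summation = " + summation(samples));
    }
}
```

For a charge-back style report, summation is the one that accumulates usage over the period, which is why it tends to matter most there.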

Performance counters are not a simple permutation of these three dimensions (group, counter, and rollup type). Some counters do not have all the rollup types. For example, system.uptime has only the summation type and none of the other five rollup types.

Moreover, a performance counter also contains other information such as the unit, the type of statistics, a description, and the level. The available units are listed in the PerformanceManagerUnit enumeration type, and the types of statistics are listed in the PerfStatsType enumeration type. Both enumeration types are included in Figure 1.

The level of a performance counter is an integer valued from 1 to 4, indicating its importance. The lower the level, the more important it is, the more likely it is collected, and the longer it is kept in the VirtualCenter database.

Here is a list of the four levels and the counters each one includes:

  • Level 1: includes basic metrics: average usage for CPU, memory, disk, and network; system uptime, system heartbeat, and DRS metrics. It does not include statistics for any device.
  • Level 2: includes all counters with rollup types of average, summation, and latest for CPU, memory, disk, and network; system uptime, system heartbeat, and DRS metrics. It does not include statistics for any device either.
  • Level 3: includes all metrics (including device metrics) for all counter groups, except those with the maximum and minimum rollup types.
  • Level 4: includes all metrics supported by VirtualCenter, including maximum and minimum rollup types.

Invoking the queryPerfCounterByLevel() method gets you a list of the performance counters at a specific level. On an ESX server, the level is not set for performance counters, and queryPerfCounterByLevel() is not supported.
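
The level rule can be sketched as a simple filter: a counter is collected when its level is less than or equal to the configured statistics level. LevelFilterSketch is a hypothetical class, and the level assignments in sampleLevels() are placeholders chosen to match the list above, not values read from a real server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java model of level-based collection.
final class LevelFilterSketch {
    // Placeholder level assignments for a few counters (assumed, not queried).
    static Map<String, Integer> sampleLevels() {
        Map<String, Integer> levels = new LinkedHashMap<>();
        levels.put("cpu.usage.average", 1);
        levels.put("system.uptime.summation", 1);
        levels.put("cpu.usage.maximum", 4);
        levels.put("cpu.usage.minimum", 4);
        return levels;
    }

    // Returns the counters collected when statistics are kept at statsLevel.
    static List<String> collectedAt(Map<String, Integer> counterLevels, int statsLevel) {
        List<String> collected = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counterLevels.entrySet()) {
            if (entry.getValue() <= statsLevel) {
                collected.add(entry.getKey());
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        for (int level = 1; level <= 4; level++) {
            System.out.println("level " + level + ": " + collectedAt(sampleLevels(), level));
        }
    }
}
```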

All performance counters have their own meanings. When used effectively, they can provide good insight into system performance. For example, when used CPU time approximates ready time, it may signal contention and possible overcommitment due to workload variability. The vSphere API does not help you interpret the performance statistics; check out the VMware technical notes on performance for more details.

Performance Metric

A performance metric represents the actual information being collected. A counter defines only the type of performance statistic and does not take into account the target device instances. There might be multiple instances of a device for which the same performance counter can be used. Each combination of performance counter and device instance is a performance metric. The relationship between a performance counter and a performance metric is very much like that between a class and an object instance in object-oriented programming.

Let us take a look at a quick example. The cpu.usage.average is a performance counter for average CPU utilization. When the counter is collected on CPU No. 1 of a host, a performance metric is formed.

The performance metric is represented by the PerfMetricId data object, which consists of two parts:

  • counterId: The integer that identifies the performance counter.
  • instance: The name of the instance, such as “vmnic1” or “vmhba0:0:0”.
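
The class-versus-instance analogy can be sketched in plain Java. MetricSketch and its nested MetricId are hypothetical stand-ins that mirror the two parts listed above; the counter keys (2 and 11) are invented to show that the same counter may carry different keys on different servers, which is why the key is looked up by name per server.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a stand-in for the counter/metric relationship.
final class MetricSketch {
    // Mirrors the two parts of a performance metric: counter key plus instance name.
    static final class MetricId {
        final int counterId;
        final String instance;

        MetricId(int counterId, String instance) {
            this.counterId = counterId;
            this.instance = instance;
        }

        @Override
        public String toString() {
            return "MetricId(counterId=" + counterId + ", instance=" + instance + ")";
        }
    }

    // Resolves the counter key for the server at hand, then pairs it with an instance.
    static MetricId metricFor(Map<String, Integer> keysByName, String counterName, String instance) {
        Integer key = keysByName.get(counterName);
        if (key == null) {
            throw new IllegalArgumentException("unknown counter on this server: " + counterName);
        }
        return new MetricId(key, instance);
    }

    public static void main(String[] args) {
        // Invented keys: the same counter name maps to different keys per server.
        Map<String, Integer> esxKeys = new HashMap<>();
        esxKeys.put("cpu.usage.average", 2);
        Map<String, Integer> vcKeys = new HashMap<>();
        vcKeys.put("cpu.usage.average", 11);

        System.out.println("ESX:           " + metricFor(esxKeys, "cpu.usage.average", "1"));
        System.out.println("VirtualCenter: " + metricFor(vcKeys, "cpu.usage.average", "1"));
    }
}
```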


Once it is clear which aspect of a device to collect performance data for, you need to decide the interval at which the performance data is collected and stored. The interval has to be longer than the sampling interval, which can be found as refreshRate in the PerfProviderSummary data object returned by the queryPerfProviderSummary() method and is normally 20 seconds.

Given the constraints of storage, you don’t want to keep all the sampled statistics as collected, especially as they get older. More recent statistics are normally stored at a finer granularity. As the data gets older, it is combined into longer intervals.

PerfInterval is the data object that represents a historical interval as shown in Figure 2.

Figure 2 The PerfInterval data object

Historical intervals are identified by an interval ID, the number of seconds for which the performance statistics are calculated. For example, for the 30 minute interval, the interval ID is 1800 (60×30).

Each configured interval has a name, e.g. “PastDay”, “PastMonth”, provided by users. The name does not affect system behavior. The configuration of historical intervals in vCenter specifies the scheme which is used to aggregate performance statistics data in vCenter.

Table 1 lists the default configuration of historical intervals on the VC server, as documented in the VI SDK API reference. The default levels are subject to change, and you can modify them in the VI Client connected to VirtualCenter by clicking Administration -> VirtualCenter Management Server Configuration. Select Statistics from the list on the left, and change the settings in the panel on the right. Right underneath the configuration is the database size section, which shows how much database space the changed settings would use.

Table 1 The default historical intervals defined in VC server

Name        Sampling Period (seconds)   Length (seconds)       Level   Enabled
PastDay     300 (5 min)                 86400 (1 day)          4       TRUE
PastWeek    1800 (30 min)               604800 (1 week)        4       TRUE
PastMonth   7200 (2 hour)               2592000 (30 days)      2       TRUE
PastYear    86400 (1 day)               31536000 (365 days)    2       TRUE

Under the default settings, a vCenter server keeps all performance counters (up to level 4) at a 5-minute interval for the past day and at a 30-minute interval for the past week. After one week, only counters up to level 2 are stored, at a 2-hour interval for the past month and at a 1-day interval for the past year. All performance data older than one year is removed from the vCenter database.
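
A quick bit of arithmetic on Table 1 shows how many data points each interval keeps per metric: the length divided by the sampling period. IntervalSketch is a hypothetical helper; the numbers are the defaults from Table 1.

```java
// Illustrative arithmetic on the default intervals from Table 1.
final class IntervalSketch {
    // Number of data points kept per metric for one historical interval.
    static long samplesRetained(long samplingPeriodSec, long lengthSec) {
        if (samplingPeriodSec <= 0) {
            throw new IllegalArgumentException("sampling period must be positive");
        }
        return lengthSec / samplingPeriodSec;
    }

    public static void main(String[] args) {
        System.out.println("PastDay:   " + samplesRetained(300, 86400));      // 288
        System.out.println("PastWeek:  " + samplesRetained(1800, 604800));    // 336
        System.out.println("PastMonth: " + samplesRetained(7200, 2592000));   // 360
        System.out.println("PastYear:  " + samplesRetained(86400, 31536000)); // 365
    }
}
```

Each interval ends up holding a few hundred points per metric, which is how the coarser intervals keep a longer history without growing the database much.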

In ESX, there is only one historical interval, “PastDay”, similar to the one in VirtualCenter except that it has a length of 129600 (1.5 days) and its level is not set. Since ESX is the source of many performance statistics for VirtualCenter, the longer history helps guard against performance data loss caused by various issues.

As of SDK 2.5, you should neither create a new performance interval nor delete an existing one. You can change the existing intervals to some extent, but the rules are a little complicated. In general, you should avoid changing anything about the intervals other than their levels.

Real-time Versus Historical Performance Statistics

There are two categories of performance data in the system. One is the raw performance samples collected at a fairly fast pace, for example one new sample every 20 seconds. The interval can be found using the queryPerfProviderSummary() method and could vary from managed entity to managed entity. You cannot retrieve performance data more frequently than the real-time samples.

Given the fast pace, a one-hour time window limits the total number of samples. When a new sample comes in, the oldest one is removed.
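
The one-hour window behaves like a fixed-size queue. With a 20-second refresh rate it holds 3600 / 20 = 180 samples per metric. RealTimeWindow below is a hypothetical model of that behavior, not how the server is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model of the real-time sample window.
final class RealTimeWindow {
    private final int capacity;
    private final Deque<Double> samples = new ArrayDeque<>();

    RealTimeWindow(int windowSec, int refreshRateSec) {
        if (refreshRateSec <= 0 || windowSec < refreshRateSec) {
            throw new IllegalArgumentException("window must cover at least one sample");
        }
        this.capacity = windowSec / refreshRateSec;
    }

    // Adds a new sample, evicting the oldest one once the window is full.
    void add(double value) {
        if (samples.size() == capacity) {
            samples.removeFirst();
        }
        samples.addLast(value);
    }

    int capacity() {
        return capacity;
    }

    int size() {
        return samples.size();
    }

    // Oldest sample still in the window; throws if the window is empty.
    double oldest() {
        return samples.getFirst();
    }

    public static void main(String[] args) {
        RealTimeWindow window = new RealTimeWindow(3600, 20);
        for (int i = 0; i < 200; i++) {
            window.add(i); // sample value doubles as its sequence number
        }
        System.out.println("capacity=" + window.capacity()
                + " size=" + window.size() + " oldest=" + window.oldest());
    }
}
```

This would also explain why asking for more real-time samples than the window holds still returns only about an hour of data.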

The real-time samples are processed on a regular basis to generate the historical performance statistics at the different intervals defined in PerfInterval. On ESX, only 5-minute interval statistics are supported, while on VirtualCenter four different intervals, as listed in Table 1, are pre-configured.
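
One way to picture this processing is to group the 20-second samples into 5-minute buckets (300 / 20 = 15 samples each) and roll each bucket up. The sketch below uses the average rollup and drops an incomplete trailing bucket; both choices are assumptions for illustration, and HistoricalRollup is a hypothetical class rather than the server's actual algorithm.

```java
// Illustrative only: averages real-time samples into longer historical intervals.
final class HistoricalRollup {
    static double[] averagePerInterval(double[] realTime, int refreshRateSec, int intervalSec) {
        if (refreshRateSec <= 0 || intervalSec < refreshRateSec) {
            throw new IllegalArgumentException("interval must cover at least one sample");
        }
        int samplesPerBucket = intervalSec / refreshRateSec; // 15 for 300s over 20s
        int buckets = realTime.length / samplesPerBucket;    // incomplete tail is dropped
        double[] result = new double[buckets];
        for (int b = 0; b < buckets; b++) {
            double sum = 0;
            for (int i = 0; i < samplesPerBucket; i++) {
                sum += realTime[b * samplesPerBucket + i];
            }
            result[b] = sum / samplesPerBucket;
        }
        return result;
    }

    public static void main(String[] args) {
        // Ten minutes of made-up samples: 10.0 for the first 5 minutes, then 20.0.
        double[] realTime = new double[30];
        for (int i = 0; i < realTime.length; i++) {
            realTime[i] = i < 15 ? 10.0 : 20.0;
        }
        for (double v : averagePerInterval(realTime, 20, 300)) {
            System.out.println(v);
        }
    }
}
```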

Both the historical statistics and the real-time samples can be retrieved using the same interfaces but with different combinations of arguments. You can check out the samples using these APIs in the vSphere Java API’s code repository.


  1. Damien
    Posted November 23, 2010 at 10:42 am | Permalink

    Hi Steve,

    Following your requirement, i still have some questions about the PerformanceManager :
    – Does it seen as an instance of vCenter, and request the vCenter DB as GUI do, or is it a different instance and have its own policy to get the metrics ?
    I mean, if you have 3 instances of PerformanceManager in 3 different Java client, would they ask the vCenter DB or get the metrics by asking ESX and so on ?

    I hope you will understand what i mean, because my english is not that good.



  2. Posted November 23, 2010 at 12:17 pm | Permalink

    It’s the same vCenter instance as you see with the GUI. Even you have many clients, there will be one PerformanceManager in vCenter. BTW, your English is pretty good and I don’t have problem to understand you at all.


  3. Damien
    Posted November 26, 2010 at 4:14 am | Permalink

    Hi Steve,
    Another question, how data are represented ? ie, you have 2 measure points T1 and T2 (T2=T1+20s) and the period is P=20s :
    For real time, is it instant data, collected at T1 (then T2), or is it an average on P ?
    Are historical data calculated by an average on a period too ?


  4. Feng
    Posted September 24, 2012 at 10:22 pm | Permalink

    Hi Steve,
    1.In ESX, there is only one historical interval “PastDay”, having a length of 129600 (1.5 days)
    2.Given the fast pace, there is a one hour time window to limit the total number of samples.
    3.On ESX, only 5 minute interval statistics is supported.
    Above three points, I still very confused.I did a test ,no matter how big I set the sample “qSpec.setMaxSample(maxSample);”,it just show one hour’s data.So I think the the second point is about this,what about the other 2.Thanks

  5. Feng
    Posted September 24, 2012 at 10:25 pm | Permalink

    added,It just aim to ESX.

  6. tom
    Posted February 17, 2014 at 3:21 pm | Permalink

    When you invoke queryPerf(), you pass in an array of PerfQuerySpec, which in turn contains an array of PerfMetricId. The results that come back from queryPerf() are an array of PerfEntityMetricBase.

    My question is, how can you correlate the return value PerfEntityMetricBase back to the PerfMetricId you supply? The reason I ask is, I would prefer to bundle calls to queryPerf() in batch rather than ask for each metric one at a time.

    Is this possible? I don’t see a way to do retrieve the PerfMetricId that a PerfEntityMetricBase corresponds to.


  7. Posted February 19, 2014 at 12:14 pm | Permalink

    In the returned object PerfMetricSeries, you can see PerfMetricId object, with which you can correlate back to the one in your spec parameter for queryPerf.


  8. Jonathan
    Posted January 15, 2015 at 7:50 am | Permalink

    Hi Steve,

    I have a few questions:
    – is it possible to retrieve the realtime performance data (sampled every 20 sec) in a one or five minute interval, and get all realtime values with timestamps from this interval, so that i can write all the values to a csv file without requesting the realtime every 20 sec from the vCenter-Server?
    If this will work, how can I do it?

    – A question to the event handling with the SDK. Is it possible send a message to a monitoring software when a fault event is thrown in the vCenter? A message which is communicated through the webservice of the SDK and not via SNMP-Trap?


  9. Posted January 15, 2015 at 4:43 pm | Permalink

    Hi Jonathan,

    1) I think it’s doable. Check my book for the samples there.
    2) Yes. You have to write code to monitor the events you are interested in.


    Me: Steve Jin, VMware vExpert who authored VMware VI and vSphere SDK, published by Prentice Hall, and created the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed for their products, internal IT orchestration, and test automation.