Which measures contribute to host health?

Individual Host pages show problem history, event history, and related processes for each host. To assess health, the following performance metrics are captured for each host and presented on each Host page:

  • CPU
  • Memory
  • Disk (storage health)
  • NIC (network health)

What’s factored into host CPU health?

CPU usage is the primary measurement used to calculate CPU health. This is the percentage of time that the CPU was busy processing (i.e., not idle). This percentage is computed over all available CPU cores and scaled to a range of 0–100%.

The same calculation is used for total CPU usage of a system and for CPU usage per process group. As a result, a process group that consists of one single-threaded process on a 4-core system reaches its maximum CPU usage at 25%.
The CPU usage metric is used to detect high CPU usage when host incidents are generated.
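
The scaling described above can be illustrated with a minimal sketch. It assumes that busy time is sampled per core over a fixed interval; the function and parameter names are hypothetical and are not part of any Dynatrace API.

```python
def cpu_usage_percent(busy_seconds_per_core, interval_seconds):
    """Return CPU usage over all cores, scaled to a range of 0-100%.

    busy_seconds_per_core: hypothetical input, the time each core spent
    busy (not idle) during the interval.
    """
    core_count = len(busy_seconds_per_core)
    if core_count == 0 or interval_seconds <= 0:
        raise ValueError("need at least one core and a positive interval")
    # Total capacity is the interval multiplied by the number of cores.
    total_capacity = core_count * interval_seconds
    return 100.0 * sum(busy_seconds_per_core) / total_capacity


# A single-threaded process that saturates one of four cores for 10 seconds.
single_threaded = cpu_usage_percent([10.0, 0.0, 0.0, 0.0], 10.0)
print(single_threaded)
```

With one core fully busy and three cores idle, the result is 25%, which matches the single-threaded example above.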

Additional performance measurements for virtualized hosts

Virtualized hosts show additional measurements related to CPU performance. These values are important to overall virtual machine health.

CPU usage

Actively used CPU of the host as a percentage of total available CPU.

CPU Ready time

CPU Ready time is the percentage of time that the virtual machine was ready to run but could not be scheduled on a physical CPU.

CPU Ready time should remain below 10%. A CPU Ready time measurement of over 10% indicates that your virtual machines are competing for available resources and a virtual machine is unable to execute all of its tasks. Such contention can lead to a drop in application performance.

For more information on this, see How does virtual machine migration affect performance?
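
As a rough illustration of the 10% guideline, the following sketch converts a ready-time sample into a percentage and compares it with the threshold. It assumes that ready time is available in milliseconds for a sampling interval; all names are hypothetical.

```python
# Guideline from the text: CPU Ready time should remain below 10%.
CPU_READY_THRESHOLD_PERCENT = 10.0


def cpu_ready_percent(ready_ms, interval_ms):
    """Percentage of the interval that the VM was ready but not scheduled."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * ready_ms / interval_ms


def is_contended(ready_ms, interval_ms):
    """True when CPU Ready time is over the 10% guideline."""
    return cpu_ready_percent(ready_ms, interval_ms) > CPU_READY_THRESHOLD_PERCENT


# Hypothetical sample: 3 seconds of ready time in a 20-second interval.
print(cpu_ready_percent(3000, 20000), is_contended(3000, 20000))
```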

Physical CPU

The amount of actively used virtual CPU as a percentage of total available CPU.

This is the host view of CPU usage, not the guest operating system view. It is the average CPU utilization over all available virtual CPUs on the virtual machine. For example, if a virtual machine with one virtual CPU is running on a host that has four physical CPUs and the CPU usage is 100%, then you know that the virtual machine is utilizing 100% of one physical CPU’s available resources.

What’s included in host Memory health?

Host pages include two memory-related metrics for your hosts: Memory used and Page faults. Both measurements, along with other factors, are used to correlate and calculate host high memory incidents.

Memory used

Percentage of total RAM used by processes. RAM used by system caches and buffers is not included in this metric.
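
One way to read this definition is sketched below. It assumes that the operating system reports total, free, buffer, and cache memory separately (as Linux does, for example); the exact formula used by the product is not stated here, and the parameter names are hypothetical.

```python
def memory_used_percent(total_kb, free_kb, buffers_kb, cached_kb):
    """Percentage of total RAM used by processes.

    Memory held by system caches and buffers is subtracted, so it does
    not count as used. This is an assumed formula for illustration.
    """
    if total_kb <= 0:
        raise ValueError("total memory must be positive")
    used_by_processes_kb = total_kb - free_kb - buffers_kb - cached_kb
    return 100.0 * used_by_processes_kb / total_kb


# Hypothetical host: 16,000 kB total, 2,000 free, 1,000 buffers, 5,000 cache.
print(memory_used_percent(16000, 2000, 1000, 5000))
```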

Page faults

Number of major page faults per second. Major page faults involve loading a memory page from disk and thus add disk latency to the interrupted program’s execution.

Additional performance measurements for virtualized hosts

Virtualized hosts show additional measurements related to virtual machine memory use. These metrics, along with other measurements, are used to detect memory saturation incidents.

Memory compressed

Dynatrace shows the rate of memory compression or decompression.

Virtual machine management platforms use memory compression to reduce memory usage. Memory compression saves memory but requires additional CPU cycles. Memory content that was previously compressed must be decompressed before it can be used by the virtual machine.

Memory swapped

Rate at which memory is swapped from disk into active memory, or from active memory to disk.

What’s included in host Disk health?

Throughput

The total number of bytes read from and written to the disk per second.

IOPS

I/O (input/output) operations per second. Operations are counted after operations addressing adjacent disk sectors are merged.

Disk latency

The time from I/O request submission to I/O request completion, reported as the average delay of disk read and write operations in milliseconds.
This metric is used to detect host slow disk incidents.
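
The three metrics above (throughput, IOPS, and latency) can all be derived from two samples of cumulative disk counters. The sketch below assumes such counters are available; the `DiskSample` fields and the `disk_metrics` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Hypothetical cumulative disk counters at one point in time."""
    bytes_read: int
    bytes_written: int
    operations: int     # completed read and write operations, after merging
    io_time_ms: float   # total time spent on those operations


def disk_metrics(prev, curr, interval_seconds):
    """Derive per-second rates and average latency between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    ops = curr.operations - prev.operations
    bytes_moved = (curr.bytes_read - prev.bytes_read) + (
        curr.bytes_written - prev.bytes_written
    )
    io_ms = curr.io_time_ms - prev.io_time_ms
    return {
        "throughput_bytes_per_s": bytes_moved / interval_seconds,
        "iops": ops / interval_seconds,
        # Average delay per operation; 0.0 when the disk was idle.
        "latency_ms": io_ms / ops if ops else 0.0,
    }


before = DiskSample(bytes_read=0, bytes_written=0, operations=0, io_time_ms=0.0)
after = DiskSample(bytes_read=4000, bytes_written=6000, operations=100, io_time_ms=250.0)
print(disk_metrics(before, after, interval_seconds=10))
```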

Disk space usage

This value tells you how much disk space is available.

What’s included in host NIC health?

Traffic

The average rate at which data was transmitted during the interval.

Packets

The number of received and sent packets over the host network interface during the interval.

Quality

An assessment based on the number of dropped packets and errors.

Connectivity

Percentage of properly established TCP connections compared to TCP connections that were refused or timed out.

Note: The Connectivity measure can be used as an indicator of whether there is network traffic on a host. However, 0% connectivity doesn’t necessarily indicate a problem with the host. If no TCP errors are present, it may simply mean that no users attempted to connect to the host process during the selected time frame.
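
The Connectivity percentage and the 0% case described in the note can be sketched as follows. This assumes that counts of established, refused, and timed-out TCP connections are available for the selected time frame; the exact formula used by the product is not stated here, and the function name is hypothetical.

```python
def connectivity_percent(established, refused, timed_out):
    """Share of TCP connection attempts that were properly established."""
    attempts = established + refused + timed_out
    if attempts == 0:
        # No connection attempts at all: report 0%. As the note explains,
        # this is not necessarily a sign of a problem.
        return 0.0
    return 100.0 * established / attempts


print(connectivity_percent(established=90, refused=5, timed_out=5))
print(connectivity_percent(established=0, refused=0, timed_out=0))
```

Distinguishing "0% because every attempt failed" from "0% because nobody connected" requires looking at the refused and timed-out counts as well, which is why the note points to TCP errors.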